nicolerg opened 1 year ago
As far as I can tell, here is a comprehensive list of all external RData/txt files that are used but not documented:
/Volumes/SSD500GB/gtex-pipeline/expression_genotypes/GTEx_v8_noChr_noRSID.bim
/Volumes/SSD500GB/gtex-pipeline/GTEx_Analysis_v8_eQTL_covariates/
/Volumes/SSD500GB/gtex-pipeline/GTEx_Analysis_v8_eQTL_expression_matrices/
/Volumes/SSD500GB/gtex-pipeline/log2-normalized-expression/log2-normalized-expression_*.expression.bed.gz
~/data/smontgom/41467_2021_23579_MOESM6_ESM.csv
~/data/smontgom/est_gcor_mat.RData
~/data/smontgom/GENES_HUMAN.txt
~/data/smontgom/GENES_RAT.txt
~/data/smontgom/GTEx_Analysis_v8_sbgenes/signif.sbgenes.txt
~/data/smontgom/GTEx_v8_ExpressionScores/tissues/
~/data/smontgom/gwas_metadata.csv
~/data/smontgom/imputed_gwas_hg38_1.1/
~/data/smontgom/meta_analysis_results.RData
~/data/smontgom/old_dea_deseq_20201121/*_training-dea_20201121.RData
~/data/smontgom/open-targets_tissue-x-disease_*
~/data/smontgom/opentargets/associationByOverallDirect.csv
~/data/smontgom/opentargets/associationByOverallIndirect.csv
~/data/smontgom/PANTHER17_human_rat_ref_genome_orthologs.tsv
~/data/smontgom/relative_expression_motrpac_gtex
~/data/smontgom/RGD_ORTHOLOGS_20201001.txt
~/data/smontgom/RSID_POS_MAP_*.txt
~/data/smontgom/zcor_transcriptome_pass1b.tsv
~/repos/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.*.bim
~/repos/ldsc/baseline/baseline.*.annot
~/repos/ldsc/custom_genesets/cluster_*.chr_*.annot
~/repos/ldsc/ENSG_coord.txt
I think the easiest way to address this would be to add a section to the README that briefly describes where each file comes from, instead of documenting it every time the file is read in the scripts. A few of the RData files should probably be published on Zenodo too. Or, if it's easy, copy all of these files into a single folder, publish the whole folder on Zenodo, and then change the paths in the scripts to match the directory structure of the Zenodo archive.
From the README, it sounds like you were already planning on doing this, but hopefully this is helpful!
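For the "copy everything into one folder" option, a minimal Python sketch might look like the following. It assumes every file lives under one of the three local roots in the list above. The `ROOTS` mapping, the archive folder names (`smontgom`, `ldsc`, `gtex-pipeline`), and the `stage_files`/`archive_path` helpers are all hypothetical. The returned old-to-new mapping is what you would use when updating the hard-coded paths in the scripts.

```python
import glob
import os
import shutil
from pathlib import Path

# Hypothetical mapping from each local root used in the scripts to a
# top-level folder in the archive; adjust the names as needed.
ROOTS = {
    "~/data/smontgom": "smontgom",
    "~/repos/ldsc": "ldsc",
    "/Volumes/SSD500GB/gtex-pipeline": "gtex-pipeline",
}


def archive_path(path, roots):
    """Map a local path to its relative location inside the archive."""
    p = Path(os.path.expanduser(path))
    for root, name in roots.items():
        try:
            return Path(name) / p.relative_to(Path(os.path.expanduser(root)))
        except ValueError:  # path is not under this root; try the next one
            continue
    raise ValueError(f"{path} is not under any known root")


def stage_files(patterns, roots, dest):
    """Copy every file or directory matching `patterns` into `dest`,
    preserving the layout from `archive_path`. Returns {old: new}."""
    dest = Path(dest)
    mapping = {}
    for pattern in patterns:
        matches = glob.glob(os.path.expanduser(pattern))
        if not matches:
            print(f"warning: no match for {pattern}")
        for src in matches:
            rel = archive_path(src, roots)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            if os.path.isdir(src):
                shutil.copytree(src, target, dirs_exist_ok=True)
            else:
                shutil.copy2(src, target)
            mapping[src] = str(rel)
    return mapping
```

For example, `stage_files(["~/data/smontgom/GENES_HUMAN.txt", "~/repos/ldsc/baseline/baseline.*.annot"], ROOTS, "zenodo_data")` would copy those files into `zenodo_data/smontgom/` and `zenodo_data/ldsc/baseline/` and print a warning for any pattern that matches nothing. That warning doubles as a check that the list is complete.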
Can important RData objects be published, e.g. on Zenodo, to enable reproducibility? Smaller text files can be added directly to this repository. I'll add lines as I see them.
At a minimum, the script used to generate each RData file should be included, with a comment naming that script wherever the file is loaded; for external files and results, a comment saying where they can be found should be included.
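To find loads that still lack such a comment, a rough Python check could scan the R scripts. This is only a heuristic sketch: it assumes a read is "documented" when the line directly above it is an R comment, and the list of read functions in `READ_CALL` plus the helper names `undocumented_reads`/`scan_repo` are illustrative assumptions, not existing tooling in this repo.

```python
import re
from pathlib import Path

# Heuristic (an assumption, not a project convention): a read call counts as
# documented if the line immediately above it is an R comment starting with '#'.
READ_CALL = re.compile(r"\b(load|readRDS|read\.csv|read\.table|fread|read_tsv)\s*\(")


def undocumented_reads(source):
    """Return (line_number, line) pairs for read calls with no comment directly above."""
    lines = source.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if READ_CALL.search(line) and not line.lstrip().startswith("#"):
            prev = lines[i - 1].strip() if i > 0 else ""
            if not prev.startswith("#"):
                hits.append((i + 1, line.strip()))
    return hits


def scan_repo(root="."):
    """Print path:line for every undocumented read in the .R files under root."""
    for path in sorted(Path(root).rglob("*.R")):
        for lineno, line in undocumented_reads(path.read_text()):
            print(f"{path}:{lineno}: {line}")
```

Running `scan_repo()` from the repository root would list each call site that needs a provenance comment. It will miss reads spread over several lines or done through wrapper functions, so treat its output as a starting point rather than a complete audit.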