NikVetr / MoTrPAC_Complex_Traits

code for paper "the impact of exercise on gene regulation in association with complex trait genetics"

publish RData objects #2

Open nicolerg opened 1 year ago

nicolerg commented 1 year ago

Can important RData objects be published, e.g. on Zenodo, to enable reproducibility? Smaller text files can be added directly to this repository. I'll list files here as I come across them.

At a minimum, include the script that generates each RData file, and add a comment naming that script wherever the file is loaded. For external files/results, add a comment saying where they can be found.
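One way to find the places that still need such a comment is a quick scan of the R scripts. This is only a rough sketch, not part of the repository: the function name `find_undocumented_loads` is hypothetical, it only looks for `load()` and `readRDS()` calls, and it treats any `#` on the same line or on the nearest preceding non-blank line as documentation, so it will miss or mis-flag edge cases.

```python
import re

# Heuristic (assumption): only R's load() and readRDS() calls are checked.
LOAD_CALL = re.compile(r"\b(?:load|readRDS)\s*\(")


def find_undocumented_loads(script_text):
    """Return (line_number, line) pairs for load()/readRDS() calls that have
    no comment on the same line or on the nearest preceding non-blank line."""
    lines = script_text.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        match = LOAD_CALL.search(stripped)
        if stripped.startswith("#") or match is None:
            continue
        # A '#' after the call is taken to be a trailing comment.
        has_inline_comment = "#" in stripped[match.end():]
        # Nearest non-blank line above the call, or "" if there is none.
        previous = next((l.strip() for l in reversed(lines[:i]) if l.strip()), "")
        if not has_inline_comment and not previous.startswith("#"):
            flagged.append((i + 1, stripped))
    return flagged


# Made-up script text, purely for illustration.
example = (
    'x <- 1\n'
    'load("a.RData")\n'
    '# generated by make_b.R\n'
    'load("b.RData")\n'
    'readRDS("c.rds") # downloaded from Zenodo\n'
)
print(find_undocumented_loads(example))  # [(2, 'load("a.RData")')]
```

Running this over each script's text would give a to-do list of load calls that still lack a provenance comment.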

nicolerg commented 1 year ago

As far as I can tell, this is the full list of external RData/txt files that are used but not documented:

/Volumes/SSD500GB/gtex-pipeline/expression_genotypes/GTEx_v8_noChr_noRSID.bim
/Volumes/SSD500GB/gtex-pipeline/GTEx_Analysis_v8_eQTL_covariates/
/Volumes/SSD500GB/gtex-pipeline/GTEx_Analysis_v8_eQTL_expression_matrices/
/Volumes/SSD500GB/gtex-pipeline/log2-normalized-expression/log2-normalized-expression_*.expression.bed.gz
~/data/smontgom/41467_2021_23579_MOESM6_ESM.csv
~/data/smontgom/est_gcor_mat.RData
~/data/smontgom/GENES_HUMAN.txt
~/data/smontgom/GENES_RAT.txt
~/data/smontgom/GTEx_Analysis_v8_sbgenes/signif.sbgenes.txt
~/data/smontgom/GTEx_v8_ExpressionScores/tissues/
~/data/smontgom/gwas_metadata.csv
~/data/smontgom/imputed_gwas_hg38_1.1/
~/data/smontgom/meta_analysis_results.RData
~/data/smontgom/old_dea_deseq_20201121/*_training-dea_20201121.RData
~/data/smontgom/open-targets_tissue-x-disease_*
~/data/smontgom/opentargets/associationByOverallDirect.csv
~/data/smontgom/opentargets/associationByOverallIndirect.csv
~/data/smontgom/PANTHER17_human_rat_ref_genome_orthologs.tsv
~/data/smontgom/relative_expression_motrpac_gtex
~/data/smontgom/RGD_ORTHOLOGS_20201001.txt
~/data/smontgom/RSID_POS_MAP_*.txt
~/data/smontgom/zcor_transcriptome_pass1b.tsv
~/repos/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.*.bim
~/repos/ldsc/baseline/baseline.*.annot
~/repos/ldsc/custom_genesets/cluster_*.chr_*.annot
~/repos/ldsc/ENSG_coord.txt

I think the easiest way to address this would be to add a section to the README that briefly describes where each file comes from (instead of adding a comment every time a file is read in the scripts). A few of the RData files should probably be published on Zenodo too. Or, if it's easy, copy all of these files into one folder, publish the whole folder on Zenodo, and then change the paths in the scripts to match the directory structure on Zenodo.
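If the copy-everything route is taken, the staging step could look roughly like the sketch below. It is an illustration under assumptions, not a tested workflow for this repository: `stage_for_archive` is a hypothetical helper, it mirrors each file's absolute path under the staging folder (so the staged layout may be deeper than you want), and it needs Python 3.8+ for `dirs_exist_ok`. The returned mapping is what you would use to update the paths in the scripts.

```python
import glob
import os
import shutil


def stage_for_archive(patterns, staging_dir):
    """Copy everything matching the given path patterns into staging_dir.

    Returns (mapping, missing):
      mapping -- {original_path: staged_path} for each matched file or directory
      missing -- patterns that matched nothing on this machine
    """
    mapping, missing = {}, []
    for pattern in patterns:
        matches = sorted(glob.glob(os.path.expanduser(pattern)))
        if not matches:
            missing.append(pattern)
            continue
        for src in matches:
            # Mirror the absolute path (minus drive and leading separator)
            # under staging_dir so files with the same name cannot collide.
            _, tail = os.path.splitdrive(os.path.abspath(src))
            dst = os.path.join(staging_dir, tail.lstrip("\\/"))
            if os.path.isdir(src):
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
            mapping[src] = dst
    return mapping, missing
```

For example, calling `stage_for_archive(["~/data/smontgom/gwas_metadata.csv", "~/repos/ldsc/baseline/baseline.*.annot"], "zenodo_upload")` (the folder name `zenodo_upload` is made up) would copy whatever exists locally and report the patterns that matched nothing, which also doubles as a check on the list above.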

nicolerg commented 1 year ago

From the README, it sounds like you were already planning on doing this, but hopefully this is helpful!