NikVetr / MoTrPAC_Complex_Traits

code for paper "the impact of exercise on gene regulation in association with complex trait genetics"

The impact of exercise on gene regulation in association with complex trait genetics

This is the GitHub repository associated with the publication:

Vetr, Nikolai G., Nicole R. Gay, MoTrPAC Study Group, and Stephen B. Montgomery. "The Impact of Exercise on Gene Regulation in Association with Complex Trait Genetics." Nature Communications 15, no. 1 (May 1, 2024). https://doi.org/10.1038/s41467-024-45966-w

It primarily hosts scripts written in the R programming language, some of which interface with several other languages. These scripts were used to carry out all analyses and generate all of the paper figures. Upon publication, it will also host the LaTeX files used to generate the final draft version of the paper. Figure-generating scripts are named in figure order under /scripts/figures/fig*_*. Analysis scripts are contained in /scripts/analyses/analysis_* and should be run first, because they generate the larger intermediate files that the figure scripts rely on. Software and data dependencies are listed below. Smaller data dependencies are also provided in the /data/internal/ folder, where possible.
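The run order described above (analysis scripts before figure scripts) can be sketched as a dry run that only prints the commands it would execute. This is a minimal sketch, not part of the repository: `plan_runs` is a hypothetical helper, it assumes each script can be launched with `Rscript`, and the ordering within each group (version sort by file name) is an assumption rather than a documented requirement. Note that `sort -V` is an extension and not part of POSIX.

```shell
# Hypothetical helper: print Rscript commands in the order
# "analysis scripts first, then figure scripts".
# Assumption: every matched file can be run with Rscript.
plan_runs() {
  scripts_dir=$1
  {
    # Analysis scripts generate the larger intermediate files.
    find "$scripts_dir/analyses" -type f -name 'analysis_*' | sort -V
    # Figure scripts; version sort places fig2_* before fig10_*.
    find "$scripts_dir/figures" -type f -name 'fig*_*' | sort -V
  } | while IFS= read -r f; do
    # Dry run: print the command instead of executing it.
    printf 'Rscript %s\n' "$f"
  done
}
```

Calling `plan_runs /Path/To/Your/Directory/MoTrPAC_Complex_Traits/scripts` prints one command per script, which makes it easy to review the plan before running anything.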

This repository contains paper main and supplementary figures (/figures/), MCMC output (/data/MCMC_output), and supplemental files (/supplemental_files/) generated by these scripts.

In the interests of reproducibility, we've endeavored to make these scripts function as end-to-end pipelines, capable of taking users from public data downloads to final figures and paper, at least on the assumption that all dependencies have been properly installed. However, just because everything runs from a fresh install on my machine (16" 2019 MBP, macOS 10.15.7) does not mean it will run seamlessly on yours. Please contact me at nikgvetr@stanford.edu if you encounter difficulties and I'll do my best to help.

GitHub limits how much storage can be devoted to each repository, and the files used in this paper total >0.5TB. I have therefore separated /data/internal/ (hosted here) from /data/external/ (not hosted here), and provide information below on where to obtain the larger files from their original sources.

I personally stored this repository on an external drive and, for simplicity, hard-coded all paths to /Volumes/2TB_External/MoTrPAC_Complex_Traits/. As you are unlikely to store these files on my external drive, please make sure to modify the included scripts prior to use. One way to perform this modification uses find and sed. In a terminal, navigate to the scripts subdirectory of wherever you've cloned this repository:

cd /Path/To/Your/Directory/MoTrPAC_Complex_Traits/scripts/

Then run the following command, taking care to escape forward slashes appropriately:

find . -type f -exec sh -c 'LANG=C sed -i "" "s/\/Volumes\/2TB_External\//\/Path\/To\/Your\/Directory\//g" "$0"' {} \;

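This should replace all instances of /Volumes/2TB_External/ with the appropriate location on your machine. Note that sed -i "" is the macOS (BSD) form; with GNU sed on Linux, use sed -i without the empty string.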
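As an alternative to the command above, the same replacement can be sketched in a form that avoids escaping slashes and works with both BSD and GNU sed. This is a minimal sketch under stated assumptions: `replace_prefix` is a hypothetical helper, not part of the repository, and it assumes neither path contains `|`, `&`, or regular-expression metacharacters (true for /Volumes/2TB_External/, but check your own path).

```shell
# Hypothetical helper: rewrite a hard-coded path prefix in every file under a directory.
#   $1 = directory to search, $2 = old prefix, $3 = new prefix
# Assumption: the prefixes contain no "|", "&", or regex metacharacters.
replace_prefix() {
  dir=$1
  old=$2
  new=$3
  # grep -rlF lists only the files that contain the old prefix as a fixed string.
  grep -rlF -- "$old" "$dir" | while IFS= read -r f; do
    # "|" as the sed delimiter avoids escaping forward slashes.
    # "-i.bak" is accepted by both BSD (macOS) and GNU sed; the backup is removed on success.
    LC_ALL=C sed -i.bak "s|$old|$new|g" "$f" && rm -f "$f.bak"
  done
}
```

From the scripts subdirectory, a call would look like `replace_prefix . "/Volumes/2TB_External/" "/Path/To/Your/Directory/"`. Running `grep -rlF "/Volumes/2TB_External/" .` beforehand lists the files that would be changed.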

Dependencies

R (4.0.4) packages used in these scripts include:

External software and command-line tools

Please follow the installation instructions at the above links to install this software. Please clone all GitHub repos into /data/external/, or else modify these scripts accordingly.

Datasets

The following external files are used by these scripts. Please download them and place them in /data/external/.

As a reminder, scripts for generating particular figures will sometimes rely on files generated previously by analysis scripts. Unfortunately, the sizes of these files sometimes exceeded maximum storage allowances on open repositories such as Zenodo. I've tried to organize the scripts so that those intermediate files are generated dynamically during execution, but if something is not cooperating, please do not hesitate to reach out to me at nikgvetr@stanford.edu for assistance!