AndersenLab / cegwas2-nf

GWA mapping with C. elegans
MIT License

cegwas2-nf

Overview of the workflow

Required software for running on QUEST

  1. nextflow-v20.0+

Either update Nextflow to the newest version, or load a conda environment that provides Nextflow v20 with the following commands:

module load python/anaconda3.6
source activate /projects/b1059/software/conda_envs/nf20_env
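Because the pipeline needs Nextflow v20 or newer, it can help to check the installed version before launching a run. The sketch below is a minimal helper, not part of the pipeline; the function name `version_at_least` is hypothetical, and it assumes a `sort` that supports version sorting with `-V` (GNU coreutils).

```bash
# Hypothetical helper: succeed when version "$2" is at least the minimum "$1".
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
  local minimum="$1" actual="$2"
  # After a version sort, the minimum comes first exactly when actual >= minimum.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Example with a literal version string; substitute the version reported
# by your own Nextflow installation (e.g. from `nextflow -version`).
if version_at_least "20.0" "20.10.0"; then
  echo "Nextflow version is new enough"
else
  echo "Nextflow version is too old"
fi
```

Comparing with `sort -V` avoids hand-parsing each dotted component, so strings such as `20.10.0.5430` compare correctly against `20.0`.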

Required software for running outside of QUEST

The command-line tools below must be on the user's PATH, and the R packages must be installed for that R installation:

  1. R-v3.6.0
  2. nextflow-v20.0+
  3. BCFtools-v1.9
  4. plink-v1.9
  5. bedtools-v2.29.2
  6. R-cegwas2
  7. R-tidyverse-v1.2.1
  8. R-coop-v0.6-2
  9. R-rrBLUP-v4.6
  10. R-plotly-v4.9.2
  11. R-DT-v0.12
  12. R-data.table-v1.12.8
  13. R-Rcpp-v1.0.1
  14. R-genetics-v1.3.8.1.2
  15. R-sommer-v4.0.4
  16. R-RSpectra-v0.13-1
  17. pandoc-v2.12
  18. R-knitr-v1.28
  19. R-rmarkdown-v2.1
  20. R-cowplot-v1.0.0
  21. R-ggbeeswarm-v0.6
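The list above mixes command-line tools with R packages. A quick way to confirm that the command-line tools are reachable is to look each one up with `command -v`. The sketch below assumes the usual executable names (`Rscript`, `nextflow`, `bcftools`, `plink`, `bedtools`, `pandoc`); the function name `check_tools` is hypothetical, and it does not check versions or R packages, which have to be inspected from within R.

```bash
# Hypothetical helper: report required executables that are not on PATH.
# Executable names are assumptions based on the list above; R packages are
# not executables and are not checked here.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_tools Rscript nextflow bcftools plink bedtools pandoc \
  && echo "all required tools found"
```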

Required data for running outside of QUEST

  1. VCF(s)
    • A hard-filtered VCF containing the phenotyped samples used for mapping
    • A tabix-generated index (.tbi) of the hard-filtered VCF
    • An imputed VCF
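Before starting a long run outside QUEST, it can be worth confirming that all three VCF inputs are present. This is a minimal sketch under two assumptions: the index sits next to the hard-filtered VCF as `<file>.tbi`, and the file names used below are hypothetical placeholders. If the index is missing, htslib's `tabix -p vcf file.vcf.gz` or `bcftools index -t file.vcf.gz` can build it from a bgzip-compressed VCF.

```bash
# Hypothetical helper: check that the hard-filtered VCF, its .tbi index,
# and the imputed VCF all exist. Assumes the index is named "<vcf>.tbi".
check_vcf_inputs() {
  local hard_filtered="$1" imputed="$2" status=0
  [ -f "$hard_filtered" ] || { echo "missing hard-filtered VCF: $hard_filtered" >&2; status=1; }
  [ -f "${hard_filtered}.tbi" ] || { echo "missing index: ${hard_filtered}.tbi" >&2; status=1; }
  [ -f "$imputed" ] || { echo "missing imputed VCF: $imputed" >&2; status=1; }
  return "$status"
}

# Hypothetical file names; replace with the paths to your own VCFs.
check_vcf_inputs hard-filter.vcf.gz imputed.vcf.gz || echo "fix the inputs above before running"
```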

Testing pipeline using Nextflow

Debug mode is a quick way to test whether your environment is set up correctly. The entire debug run should take 2-3 minutes.

git clone https://github.com/AndersenLab/cegwas2-nf.git
cd cegwas2-nf
nextflow main.nf --debug

Execution of pipeline using Nextflow

git clone https://github.com/AndersenLab/cegwas2-nf.git
cd cegwas2-nf
nextflow main.nf --traitfile <path to traitfile> --annotation bcsq [optional parameters, see below]

Profiles

Users can select from a number of profiles that each run different processes for the analysis:

Parameters

The file passed to --traitfile has one row per strain and one column per trait, for example:

strain trait_name_1 trait_name_2
JU258 32.73 19.34
ECA640 34.065378 12.32
... ... ... 124.33
ECA250 34.096 23.1
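A malformed trait file is easy to catch before submitting a run. The sketch below assumes the trait file is tab-delimited, that its first header field is `strain`, and that every row has the same number of fields as the header, as in the example table above; the function name `check_traitfile` is hypothetical and the check is not performed by the pipeline itself.

```bash
# Hypothetical helper: sanity-check a trait file before passing it to --traitfile.
# Assumes tab-delimited columns, "strain" as the first header field, and one
# trait per remaining column.
check_traitfile() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NR == 1 {
      if ($1 != "strain") { print "header must start with strain" > "/dev/stderr"; bad = 1 }
      if (NF < 2) { print "no trait columns in header" > "/dev/stderr"; bad = 1 }
      ncol = NF
      next
    }
    NF != ncol {
      printf("line %d has %d fields, expected %d\n", NR, NF, ncol) > "/dev/stderr"
      bad = 1
    }
    END {
      if (NR == 0) { print "file is empty" > "/dev/stderr"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# Hypothetical file name; replace with your own trait file.
check_traitfile traits.tsv && echo "trait file looks well-formed"
```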

Optional parameters

R scripts

Output Folder Structure

Analysis_Results-{Date}
  │
  ├── cegwas2_report_traitname_main.html
  ├── cegwas2_report_traitname_main.Rmd
  │
  ├── Phenotypes
  │   ├── strain_issues.txt
  │   └── pr_traitname.tsv
  ├── Genotype_Matrix
  │   ├── Genotype_Matrix.tsv
  │   └── total_independent_tests.txt
  ├── Mappings
  │   ├── Data
  │   │   ├── traitname_processed_mapping.tsv
  │   │   └── QTL_peaks.tsv
  │   └── Plots
  │       ├── traitname_manplot.pdf
  │       ├── traitname_pxgplot.pdf
  │       └── Summarized_mappings.pdf
  ├── Fine_Mappings
  │   ├── Data
  │   │   ├── traitname_snpeff_genes.tsv
  │   │   └── traitname_qtlinterval_prLD_df.tsv
  │   └── Plots
  │       ├── traitname_qtlinterval_finemap_plot.pdf
  │       └── traitname_qtlinterval_gene_plot.pdf
  ├── Divergent_and_haplotype
  │   ├── all_QTL_bins.bed
  │   ├── all_QTL_div.bed
  │   ├── div_isotype_list.txt
  │   └── haplotype_in_QTL_region.txt
  └── BURDEN
      ├── VT
      │   ├── Data
      │   │   └── traitname.VariableThresholdPrice.assoc
      │   └── Plots
      │       └── traitname_VTprice.pdf
      └── SKAT
          ├── Data
          │   └── traitname.Skat.assoc
          └── Plots
              └── traitname_SKAT.pdf
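Since per-trait outputs are named after the trait, the list of traits that were actually mapped can be recovered from the file names. This minimal sketch relies only on the `Mappings/Data/traitname_processed_mapping.tsv` naming in the tree above; the function name `list_mapped_traits` is hypothetical, and the results folder path must be replaced with the `Analysis_Results-{Date}` folder from your own run.

```bash
# Hypothetical helper: print one trait name per processed mapping file found
# in a results folder, using the traitname_processed_mapping.tsv naming.
list_mapped_traits() {
  local results_dir="$1" f
  for f in "$results_dir"/Mappings/Data/*_processed_mapping.tsv; do
    [ -e "$f" ] || continue              # the glob matched nothing
    f="${f##*/}"                         # drop the directory part
    echo "${f%_processed_mapping.tsv}"   # drop the file-name suffix
  done
}

# Usage (replace with the results folder from your own run):
# list_mapped_traits path/to/Analysis_Results-{Date}
```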

Phenotypes folder

Genotype_Matrix folder

Mappings folder

Data
Plots

Fine_Mappings folder

Data
Plots

BURDEN folder (contains two subfolders, VT and SKAT, with the same structure)

Data
Plots