genepi / nf-gwas

A Nextflow pipeline to perform state-of-the-art genome-wide association studies.
https://genepi.github.io/nf-gwas
MIT License

Handling large-scale individual WGS VCFs #107

Open snehaleela opened 4 months ago

snehaleela commented 4 months ago

Hi all, could you please help me understand the following? I have large-scale individual WGS VCF files and want to run the nf-gwas pipeline on the full dataset. My Nextflow setup and infrastructure are ready to handle data at this scale: ~50 nodes, 64 CPUs, 256 GB RAM.

  1. Does the pipeline assume that the individual VCFs have already been merged into one input file per chromosome?
  2. Which preprocessing steps are recommended before passing the input to the pipeline?
  3. At this scale, do we need to use .bgen files only? Has VCF input been tested at this scale, so that regenie performs at its best?
  4. If a merged file has to be created, can you confirm whether this is the best method: each VCF > normalize (bcftools) > per-VCF pvar/pgen/psam > merge into a single pvar/pgen/psam (PLINK) > convert to bgen?
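
To make the workflow in question 4 concrete, here is a minimal sketch that only builds the command lines for each step and does not run them. It is an illustration of the route described above, not a confirmed recommendation from the nf-gwas maintainers. The function names, file names, the reference FASTA, the `-m -any` normalization choice, and the `bits=8` BGEN setting are all assumptions; the `--pmerge-list` step in particular should be checked against the installed PLINK 2 version before use on real data.

```python
from pathlib import Path


def build_commands(vcfs, ref_fasta, workdir, merged_prefix="merged"):
    """Build (but do not run) the commands for the workflow in question 4.

    All names and options here are hypothetical choices for illustration.
    """
    workdir = Path(workdir)
    commands = []
    prefixes = []
    for vcf in vcfs:
        name = Path(vcf).name
        # Strip the ".vcf.gz" suffix to get a per-sample prefix.
        stem = name[: -len(".vcf.gz")] if name.endswith(".vcf.gz") else Path(name).stem
        norm_vcf = workdir / f"{stem}.norm.vcf.gz"
        prefix = workdir / stem
        # Step 1: normalize (split multiallelic sites, left-align against the reference).
        commands.append(["bcftools", "norm", "-m", "-any", "-f", str(ref_fasta),
                         "-Oz", "-o", str(norm_vcf), str(vcf)])
        # Step 2: convert each normalized VCF to pgen/pvar/psam.
        commands.append(["plink2", "--vcf", str(norm_vcf), "--make-pgen",
                         "--out", str(prefix)])
        prefixes.append(str(prefix))
    merge_list = workdir / "merge_list.txt"
    merged = workdir / merged_prefix
    # Step 3: merge all per-sample filesets listed in merge_list.txt.
    commands.append(["plink2", "--pmerge-list", str(merge_list),
                     "--out", str(merged)])
    # Step 4: export the merged fileset to BGEN v1.2.
    commands.append(["plink2", "--pfile", str(merged), "--export", "bgen-1.2",
                     "bits=8", "--out", str(merged)])
    return commands, prefixes, merge_list


def merge_list_text(prefixes):
    """Contents of the merge list file: one fileset prefix per line."""
    return "\n".join(prefixes) + "\n"


if __name__ == "__main__":
    cmds, prefixes, merge_list = build_commands(
        ["sample1.vcf.gz", "sample2.vcf.gz"], "reference.fa", "work")
    print(f"# write to {merge_list}:")
    print(merge_list_text(prefixes), end="")
    for cmd in cmds:
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them, or to hand each one to a scheduler or `subprocess.run`, before committing cluster time to the full dataset.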