Hi all, can you please help check my understanding here? I have large-scale INDIVIDUAL WGS VCF files and want to run the nf-gwas pipeline on the full dataset. I have Nextflow and the infrastructure ready to handle data at this scale: ~50 nodes, each with 64 CPUs and 256 GB RAM.
Does the pipeline assume that the input VCFs have to be merged per chromosome?
Also, what preprocessing steps are recommended before passing the input to the pipeline?
At this scale, do we need to use .bgen files only? Has this scale of data been tested with VCF input, i.e. does regenie still perform well with VCFs of this size?
If a merged file needs to be created, can you confirm whether this is the best method:
1. Normalize each VCF with bcftools
2. Convert each normalized VCF to a PLINK2 fileset (.pvar/.pgen/.psam)
3. Merge all filesets into a single .pvar/.pgen/.psam with PLINK2
4. Convert the merged fileset to .bgen
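For reference, here is a minimal sketch of the steps above as shell commands. File names (`sample1.vcf.gz`, `ref.fa`, `merge_list.txt`, output prefixes) are placeholders, and flag choices like `bits=8` are assumptions to be tuned, not a definitive recipe:

```shell
# 1. Normalize each VCF: split multiallelics, left-align indels against the reference
bcftools norm -f ref.fa -m -any -Oz -o sample1.norm.vcf.gz sample1.vcf.gz

# 2. Convert each normalized VCF to a PLINK2 fileset (.pgen/.pvar/.psam)
plink2 --vcf sample1.norm.vcf.gz --make-pgen --out sample1

# 3. Merge all filesets listed (one prefix per line) in merge_list.txt
plink2 --pmerge-list merge_list.txt --make-pgen --out merged

# 4. Export the merged fileset to BGEN v1.2 (8-bit probabilities is a common choice)
plink2 --pfile merged --export bgen-1.2 bits=8 --out merged
```

Note that `--pmerge-list` is a PLINK 2.0 feature still marked as under development in some builds, so it's worth verifying the merge behaves as expected on a small subset first.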