genepi / nf-gwas

A nextflow pipeline to perform state-of-the-art genome-wide association studies.
https://genepi.github.io/nf-gwas
MIT License

WGS data #104

Open jjfarrell opened 5 months ago

jjfarrell commented 5 months ago

The pipeline looks like it is optimized for processing imputed VCF data from the UMICH or TOPMed imputation servers, which generate a DS field. Is it possible to run the pipeline on GATK WGS data without the DS field? Or does DS need to be calculated from the PL field and written out to PLINK format before running the pipeline?

aaleksandrov95 commented 4 months ago

Are there any updates on this issue? I believe I am having a similar problem in our implementation of the pipeline for GATK WGS.

seppinho commented 3 months ago

Hi, I just double-checked the regenie repo, and regenie uses either DS or GT. So if you want to use our pipeline, you have to convert the data first (e.g. with plink2). If you have a working command, I'm happy to integrate it as a step in the pipeline. I think that's useful for many!

See here: https://github.com/rgcgithub/regenie/issues/114#issuecomment-832851417
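To find out which of these fields a given VCF declares before deciding on a conversion, the header can be inspected directly. The following is a minimal sketch, not part of nf-gwas or regenie: the function name `vcf_has_format_field` is hypothetical, and it assumes that `gzip -cdf` passes uncompressed input through unchanged (as GNU gzip does) so that both `.vcf` and `.vcf.gz` files work.

```shell
# Hypothetical helper (not part of nf-gwas or regenie): succeed if the VCF
# header declares the given FORMAT field, e.g. DS or GT.
# Assumption: `gzip -cdf` copies uncompressed input through unchanged.
vcf_has_format_field() {
    vcf_file=$1
    field=$2
    gzip -cdf "$vcf_file" |
        awk -v tag="##FORMAT=<ID=${field}," '
            index($0, tag) == 1 { found = 1 }   # matching header line
            !/^#/ { exit }                      # stop at the first data record
            END { exit !found }                 # status 0 only if declared
        '
}
```

For example, `vcf_has_format_field input.vcf.gz DS || echo "no DS field declared"` (with `input.vcf.gz` as a placeholder name) reports a file that lacks dosages. Note that this only checks the header declaration, not whether every record actually carries the field.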

aaleksandrov95 commented 3 months ago

I got it to work by converting the VCF to BED using plink2. I also saw in several other issues, such as https://github.com/rgcgithub/regenie/issues/209, that an Oxford .sample file may help with the missing-values error that kept occurring for me, so I generated one as well, again using plink2.

The only tricky part was to keep the IID and FID consistent with the internal workings of the pipeline, but now it seems to run fine.
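One way to check that consistency is to compare the FID/IID pairs in the converted `.fam` file against the phenotype file. This is only a sketch under stated assumptions: the function name `missing_in_fam` is hypothetical, and the phenotype file is assumed to be whitespace-delimited with a header line and FID and IID as its first two columns.

```shell
# Hypothetical helper: print FID/IID pairs that appear in the phenotype file
# but not in the PLINK .fam file.
# Assumptions: the phenotype file is whitespace-delimited, has one header
# line, and carries FID and IID in its first two columns.
missing_in_fam() {
    fam_file=$1
    pheno_file=$2
    awk '
        FILENAME == ARGV[1] { seen[$1 FS $2] = 1; next }     # collect .fam IDs
        FNR > 1 && !(($1 FS $2) in seen) { print $1, $2 }    # skip header row
    ' "$fam_file" "$pheno_file"
}
```

Running it on the `.fam` file written by `--make-bed` and the phenotype file should print nothing when every phenotyped sample is present. With `--double-id`, plink2 sets both FID and IID to the VCF sample ID, so the phenotype file needs the same convention for the pairs to match.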

EDIT: Here are the PLINK2 commands for reference.

VCF-to-BED:

plink2 --vcf ${input_vcf_file} \
        --fam ${path}/samples-sex.nf_gwas.psam \
        --double-id \
        --split-par 'hg38' \
        --output-chr chrM \
        --set-all-var-ids @:#:ref\$r-alt\$a --new-id-max-allele-len 527 \
        --make-bed \
        --out ${output_path}

Making the Oxford .sample file:

plink2 --vcf ${input_vcf_file} \
        --fam ${path}/samples-sex.nf_gwas.psam \
        --split-par 'hg38' \
        --output-chr chrM \
        --set-all-var-ids @:#:ref\$r-alt\$a --new-id-max-allele-len 527 \
        --recode oxford \
        --out ${output_path}

As mentioned, I added the Oxford .sample file because of several missing-values and invalid-sample-name errors, as described in the issue linked above.

seppinho commented 3 months ago

Great to hear. Can you also share the commands, in case someone else runs into the same issue? Best, Sebastian