brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

gnomAD v4 #161

Open equinne5 opened 7 months ago

equinne5 commented 7 months ago

Hi Brent,

Thanks so much for slivar and all of your wonderful tools! I was just wondering if there are any plans to generate gnotate files for gnomAD v4. I understand it's a huge undertaking, so I just wanted to see whether it's in the works or whether we should look into generating them ourselves.

Emma

wwgordon commented 7 months ago

Hi Emma,

I am not involved with slivar development, but my lab is currently putting together a gnomAD v4 gnotate file. Assuming it works as planned, I would be happy to share the file and/or the script we used to generate it. To keep the file size manageable, we are only including a small subset of INFO fields, which may not match what you require:

##INFO=<ID=fafmax_faf95_max_joint,Number=A,Type=Float,Description="Maximum filtering allele frequency (using Poisson 95% CI) across genetic_ancestry groups in joint subset">
##INFO=<ID=fafmax_faf95_max_gen_anc_joint,Number=A,Type=String,Description="Genetic ancestry group with maximum filtering allele frequency (using Poisson 95% CI) in joint subset">
##INFO=<ID=faf95_joint,Number=A,Type=Float,Description="Filtering allele frequency (using Poisson 95% CI) in joint subset">
##INFO=<ID=nhomalt_joint,Number=A,Type=Integer,Description="Count of homozygous individuals in joint subset">

I should have this tested later this week.

Cheers, William

equinne5 commented 7 months ago

Hi William, apologies for the delayed response, and thank you so much; that would be brilliant. If you don't mind, I'd love to take you up on that! It would be great to have both the file and the script if that's alright. You can let me know the best way to share once it's ready. Thanks so much again for your kind offer! All the best, Emma

wwgordon commented 7 months ago

Hi Emma,

I started with the full gnomAD v4 release, one bgz per chromosome. As mentioned, we only needed three annotations; we dropped fafmax_faf95_max_gen_anc_joint because it is a string and therefore can't be gnotated. So first I pulled these three annotations using bcftools annotate -x with a leading ^, which removes every INFO annotation except the ones listed (I use nohup because my connection is shaky):

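nohup bash -c '
module load bcftools/1.17
mkdir -p temp_for_gnotate

# Keep only the three INFO fields we need and write bgzipped VCF,
# one background job per chromosome file.
for file in gnomad*bgz; do
  bcftools annotate -x "^INFO/fafmax_faf95_max_joint,INFO/faf95_joint,INFO/nhomalt_joint" \
    --output-type z \
    --output "temp_for_gnotate/toConcat_$file" "$file" &
done
wait  # block until every background job has finished
' &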
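Before concatenating, it may be worth confirming that each pruned file defines only the intended INFO fields and that all of them are numeric, since (as noted above) string fields can't be gnotated. The sketch below is only an illustration: info_fields and numeric_info_ids are hypothetical helpers, not part of bcftools or slivar, and they assume the usual ID,Number,Type,Description key order in ##INFO header lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read VCF header lines on stdin and print
# "ID<TAB>Type" for every ##INFO definition.
info_fields() {
  awk '/^##INFO=<ID=/ {
    id = $0;   sub(/^##INFO=<ID=/, "", id); sub(/,.*/, "", id)
    type = $0; sub(/.*,Type=/, "", type);   sub(/,.*/, "", type)
    print id "\t" type
  }'
}

# Hypothetical helper: keep only the IDs whose Type is Float or Integer.
numeric_info_ids() {
  info_fields | awk -F '\t' '$2 == "Float" || $2 == "Integer" { print $1 }'
}

# Small demo on two header lines (descriptions shortened for the example).
printf '%s\n' \
  '##INFO=<ID=faf95_joint,Number=A,Type=Float,Description="demo">' \
  '##INFO=<ID=fafmax_faf95_max_gen_anc_joint,Number=A,Type=String,Description="demo">' \
  | numeric_info_ids

# Real use, assuming bcftools is on PATH and "$f" is one of the pruned files:
#   bcftools view -h "$f" | numeric_info_ids
```

In the demo only faf95_joint is printed, because the ancestry-group field is declared as a String.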

Then I just concatenated these chroms into a single bcf:

ls -v temp_for_gnotate/toConcat* | \
bcftools concat \
  --file-list /dev/stdin \
  --output temp_for_gnotate/gnomad_v4_faf95joint_allChroms.bcf \
  --output-type b
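bcftools concat writes records in the order the files are listed, so the ls -v natural sort matters here: a plain lexical sort would put chr10 before chr2. A quick way to preview the order is sketched below, assuming GNU coreutils (sort -V uses the same version ordering as ls -v). The file names are hypothetical stand-ins, and natural_order is just a wrapper introduced for this example.

```shell
#!/usr/bin/env bash
# Hypothetical file names; the real per-chromosome names will differ.
names='toConcat_chr10.vcf.bgz
toConcat_chr2.vcf.bgz
toConcat_chrX.vcf.bgz
toConcat_chr1.vcf.bgz'

# Thin wrapper around GNU sort -V (version sort), the ordering ls -v applies.
natural_order() { sort -V; }

# Prints chr1, chr2, chr10, chrX, one per line.
printf '%s\n' "$names" | natural_order
```

If the previewed order looks wrong for your file naming scheme, an explicit, hand-written file list passed to --file-list is the safer option.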

Index the single bcf:

bcftools index temp_for_gnotate/gnomad_v4_faf95joint_allChroms.bcf

And create the gnotate:

${SLIVAR} make-gnotate \
  --field fafmax_faf95_max_joint:gnomadV4joint_maxFAF95 \
  --field faf95_joint:gnomadV4joint_FAF95_all \
  --field nhomalt_joint:gnomadV4joint_nHomAlt \
  --prefix gnotates/gnomadV4joint \
  temp_for_gnotate/gnomad_v4_faf95joint_allChroms.bcf
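For anyone wondering how the result gets used afterwards, here is a rough sketch of filtering a VCF on one of the renamed fields with slivar expr. It assumes make-gnotate wrote gnotates/gnomadV4joint.zip (the prefix plus .zip); cohort.vcf.gz, cohort.rare.vcf, and the 0.01 cutoff are hypothetical placeholders, so adjust them to your own data. The slivar call only runs if slivar and both input files are actually present.

```shell
#!/usr/bin/env bash
# Assumption: make-gnotate produced <prefix>.zip for the prefix used above.
gnotate=gnotates/gnomadV4joint.zip
# Hypothetical cutoff on the filtering allele frequency.
max_faf=0.01

# Build the --info expression from the renamed field given to make-gnotate.
info_expr="INFO.gnomadV4joint_maxFAF95 < ${max_faf}"
echo "$info_expr"

# Only attempt the real run when the tool and the inputs exist.
if command -v slivar >/dev/null 2>&1 && [ -f "$gnotate" ] && [ -f cohort.vcf.gz ]; then
  slivar expr \
    --vcf cohort.vcf.gz \
    --gnotate "$gnotate" \
    --info "$info_expr" \
    --pass-only \
    --out-vcf cohort.rare.vcf
fi
```

The other two fields (gnomadV4joint_FAF95_all and gnomadV4joint_nHomAlt) can be combined into the same expression with && in the usual way.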

Clean up your temp files and that's all there is to it. I'm happy to send our gnotate file to you, though I suspect you may want to tailor it to the fields you require. Just let me know!

Cheers, William

equinne5 commented 7 months ago

Thanks so much for this, William. Honestly, you're so good for sending all of this! It's really generous. If you don't mind, I'm going to be cheeky and ask for your gnotate file too (at the moment the maxFAF95 and nHomAlt fields are plenty to work with). If you have a gnotate file ready to go, it would allow me to play around a bit with things before the end of the year, but only if it's not too much trouble for you. Thanks again for being so helpful!

Emma

wwgordon commented 7 months ago

Sure thing, here it is (plus index):

https://storage.googleapis.com/anhinga/gnomad_v4_faf95joint_allChroms.bcf
https://storage.googleapis.com/anhinga/gnomad_v4_faf95joint_allChroms.bcf.csi

Let me know if you have any problems! William

equinne5 commented 7 months ago

Thank you so much!! Really appreciate it! Take Care!

wwgordon commented 7 months ago

No problem! Looks like you've pulled the files, so I've removed public access.

Cheers, William


wwgordon commented 5 months ago

@equinne5 just so you're aware, there is an issue with gnomAD v4.0 AN and AF values:

https://docs.google.com/document/d/1Xm5ZIhmkh7hv2qEfCDS6J2T0IUZYiXP8pNClTlNvCGQ/edit