arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Add CADD InDels #833

Open oleraj opened 7 years ago

oleraj commented 7 years ago

It would be nice to include indels with CADD scores in the gemini annotation pipeline.

http://krishna.gs.washington.edu/download/CADD/v1.3/InDels.tsv.gz

zcat InDels.tsv.gz | head
## CADD v1.3 (c) University of Washington and Hudson-Alpha Institute for Biotechnology 2013-2015. All rights reserved.
#Chrom  Pos Ref Alt RawScore    PHRED
1   10001   T   TC  0.035310    2.933
1   10009   A   AC  -0.440505   0.308
1   10012   C   CT  0.063038    3.217
1   10013   T   TA  -0.253866   0.842
1   10015   A   AC  -0.438485   0.312
1   10021   A   AC  -0.439836   0.309
1   10027   A   AC  -0.434172   0.319
1   10033   A   AC  -0.458178   0.279

zcat InDels.tsv.gz | grep -v ^# | wc -l
19995943

brentp commented 5 years ago

I'll gladly accept a PR for this after the next release. In the meantime, if you need these scores, you should be able to add them to your database yourself with gemini annotate.
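
As a rough illustration of the gemini annotate route, the sketch below converts the CADD indel TSV into a BED-style file, which could then be compressed, indexed, and passed to gemini annotate. It is not part of gemini: the function names `cadd_line_to_bed` and `convert` are hypothetical, and the interval convention (0-based start, end = start + length of Ref) is an assumption that should be checked against how your database stores indel coordinates.

```python
import gzip
import sys


def cadd_line_to_bed(line):
    """Convert one CADD TSV record to a BED-style line.

    Returns None for header lines (starting with '#') and blank lines.
    Assumption: Pos is 1-based and refers to the first base of Ref,
    so the BED interval is [Pos - 1, Pos - 1 + len(Ref)).
    """
    if not line.strip() or line.startswith("#"):
        return None
    chrom, pos, ref, alt, raw, phred = line.split()[:6]
    start = int(pos) - 1
    end = start + len(ref)
    return "\t".join([chrom, str(start), str(end), ref, alt, raw, phred])


def convert(in_path, out):
    """Stream a gzipped CADD TSV and write BED-style lines to `out`."""
    with gzip.open(in_path, "rt") as handle:
        for line in handle:
            bed = cadd_line_to_bed(line)
            if bed is not None:
                out.write(bed + "\n")


# Hypothetical usage: python cadd_to_bed.py InDels.tsv.gz > cadd_indels.bed
if __name__ == "__main__" and len(sys.argv) > 1:
    convert(sys.argv[1], sys.stdout)
```

The output would still need to be compressed with bgzip and indexed with tabix -p bed before gemini annotate can read it. With this column layout the PHRED score lands in column 7, so the extract option would point at that column; see gemini annotate -h for the exact options. Note that, as far as I know, annotation from a BED file is by interval overlap only and does not compare Ref/Alt alleles, so a variant may pick up scores from other indels at the same position.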