Make gnotate from ExAC downloaded from gnomAD #122

Open Drdreammaerd opened 2 years ago

Drdreammaerd commented 2 years ago

Hi Brent,

I would like to make a gnotate annotation file from the ExAC VCF downloaded from here:
All chromosomes VCF 4.56 GiB, MD5: f2b57a6f0660a00e7550f62da2654948

The version of slivar is 0.2.7 71af7d12881ae0590c6d2a97ef2b282cc93fe7c6

I have fixed a few header problems and normalized the file with bcftools.

However, I still get an error message like this:

> slivar version: 0.2.7 71af7d12881ae0590c6d2a97ef2b282cc93fe7c6
[slivar tsv] warning! didn't find ANN in header in ./ trying other fields
[slivar] unable to find gene field in ANN

The command I use is

slivar make-gnotate --prefix ExACv1.0 --field AF:ExAC_af_v1     $dir/

Thanks for your time,


brentp commented 2 years ago

Can you show what the header of your VCF has for ANN and CSQ?

Drdreammaerd commented 2 years ago

The header of the VCF is #CHROM POS ID REF ALT QUAL FILTER INFO. The annotations are contained in the INFO column.

I didn't find any mention of ANN in the annotations or in the meta-information.

brentp commented 2 years ago

Are there other lines above that one that start with '#'? If not, this is not valid VCF and slivar won't be able to use it. If so, can you show those lines?
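One way to pull out just the header lines in question is sketched below. This is a minimal illustration in plain Python, not part of slivar; the function names and the example records are hypothetical, and it assumes the annotation fields are declared as `##INFO=<ID=...>` meta-information lines.

```python
import gzip

# INFO tags asked about or reported in this thread.
ANNOTATION_IDS = ("ANN", "CSQ", "BCSQ")


def annotation_header_lines(lines, ids=ANNOTATION_IDS):
    """Return the ##INFO meta-information lines that declare any of `ids`."""
    found = []
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first variant record
        if any(line.startswith(f"##INFO=<ID={tag},") for tag in ids):
            found.append(line.rstrip("\n"))
    return found


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Made-up example lines, only to show the behaviour.
example = [
    "##fileformat=VCFv4.1\n",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">\n',
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\tAF=0.1\n",
]
print(annotation_header_lines(example))
```

On a real file, something like `annotation_header_lines(open_vcf(path))` would give the lines to paste into the issue.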

Drdreammaerd commented 2 years ago

Did you mean this?

brentp commented 2 years ago

OK. If you remove this line: ##INFO=<ID=CSQ,Number=.,Type=String,Description=""> from the header, then it should work. I assume you modified that? Usually, it would contain, in Description, how to parse the CSQ field. But that's not needed for your use-case here anyway, so you can remove it completely from the vcf.
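To illustrate why the Description matters: VEP-style headers usually end the CSQ Description with a `Format:` list of pipe-separated field names, and a parser needs that list to locate the gene field. The sketch below is an assumption about how such a Description can be read, not slivar's actual parser; the function name and the shortened header are illustrative.

```python
import re


def csq_field_names(header_line):
    """Return the field names listed after 'Format:' in a Description.

    Returns an empty list when there is no Description or no 'Format:' part.
    """
    match = re.search(r'Description="([^"]*)"', header_line)
    if match is None:
        return []
    parts = match.group(1).split("Format:", 1)
    if len(parts) < 2:
        return []  # nothing after 'Format:' to parse
    return [name.strip() for name in parts[1].split("|")]


# Shortened, illustrative VEP-style header line.
vep_style = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Gene">'
)
# The header line quoted above, with an empty Description.
empty = '##INFO=<ID=CSQ,Number=.,Type=String,Description="">'

print(csq_field_names(vep_style))
print(csq_field_names(empty))
```

With an empty Description there are no field names at all, so a tool has no way to tell which position holds the gene.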

Drdreammaerd commented 2 years ago

Should I remove that line even if the variants have CSQ information like this?

brentp commented 2 years ago

You can instead try this Linux binary (just gunzip, chmod +x, and then run as ./slivar_dev ...): slivar_dev.gz

If this works for you, I'll incorporate this into the next release.

Drdreammaerd commented 2 years ago

Hi Brent

I tried this one: slivar_dev.gz

It did start to run, but it crashed when it ran on chr10:

[slivar] warning: found ANN but it did not contain a description that indicated gene field. skipping
[slivar] warning: found CSQ but it did not contain a description that indicated gene field. skipping
[slivar tsv] warning! didn't find BCSQ in header in /storage1/fs1/jin810/Active/yung-chun/database/ExAC/ trying other fields
[slivar] warning: found BCSQ but it did not contain a description that indicated gene field. skipping
[slivar] kvs.len for chr10: 349209 after /storage1/fs1/jin810/Active/yung-chun/database/ExAC/
[slivar] writing 349209 encoded and 388 long values for chromosome 10
[slivar] removed 10 duplicated positions by using the value and chromosome: 10
[slivar tsv] warning! didn't find ANN in header in /storage1/fs1/jin810/Active/yung-chun/database/ExAC/ trying other fields
[slivar] warning: found ANN but it did not contain a description that indicated gene field. skipping
[slivar] warning: found CSQ but it did not contain a description that indicated gene field. skipping
[slivar tsv] warning! didn't find BCSQ in header in /storage1/fs1/jin810/Active/yung-chun/database/ExAC/ trying other fields
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

May I know what the problem is here?



brentp commented 2 years ago

Hi, can you try this debug binary and share the error message? slivar_dbg.gz

Drdreammaerd commented 2 years ago

Yes, thanks for the help.

Here is the error I got:

> slivar version: 0.2.8 6f116dfb8e416b28a55b3f46b6992ab930d46e8c
[slivar tsv] warning! didn't find ANN in header in /storage1/fs1/jin810/Active/yung-chun/database/ExAC/ trying other fields
/home/brentp/src/slivar/src/slivar.nim(249) slivar
/home/brentp/src/slivar/src/slivar.nim(246) main
/home/brentp/src/slivar/src/slivarpkg/make_gnotate.nim(264) main
/home/brentp/src/slivar/src/slivarpkg/evaluator.nim(324) newEvaluator
/home/brentp/src/slivar/src/slivarpkg/tsv.nim(114) set_csq_fields
/nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: index 1 not in 0 .. 0 [IndexDefect]

brentp commented 2 years ago

I see. The best way for you to proceed is to remove the CSQ, ANN, BCSQ fields from your VCF completely. Use bcftools annotate -x $field to do this.
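For readers who want to see what removing those fields amounts to, here is a rough pure-Python sketch of the effect on an uncompressed VCF: the matching `##INFO` header lines are dropped and the tags are removed from the INFO column of each record. It is only an illustration with a hypothetical function name and made-up records; bcftools annotate -x, as suggested above, remains the practical route for real files.

```python
def strip_info_tags(lines, tags=("CSQ", "ANN", "BCSQ")):
    """Yield VCF lines with the given INFO tags removed from header and records."""
    tags = set(tags)
    prefix = "##INFO=<ID="
    for line in lines:
        if line.startswith(prefix):
            tag = line[len(prefix):].split(",", 1)[0]
            if tag not in tags:
                yield line  # keep declarations of other INFO tags
        elif line.startswith("#"):
            yield line  # other meta-information and the #CHROM line
        else:
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 8:
                yield line  # not a full record; leave it untouched
                continue
            kept = [kv for kv in cols[7].split(";")
                    if kv.split("=", 1)[0] not in tags]
            cols[7] = ";".join(kept) if kept else "."  # '.' means empty INFO
            yield "\t".join(cols) + "\n"


# Made-up example lines, only to show the behaviour.
example = [
    "##fileformat=VCFv4.1\n",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">\n',
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\tAF=0.1;CSQ=G|missense_variant\n",
    "1\t200\t.\tC\tT\t.\tPASS\tCSQ=T|synonymous_variant\n",
]
cleaned = list(strip_info_tags(example))
print("".join(cleaned), end="")
```

The AF field that make-gnotate needs is left in place; only the consequence annotations go away.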

brentp commented 2 years ago

This binary (just updated) fixes this problem for me on a similar VCF with no description of the format in the CSQ field. If it works for you, I'll do more testing and make a new release. slivar_dbg2.gz

Drdreammaerd commented 2 years ago

slivar_dbg2.gz works on my side. Thanks a lot.