brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

SIGSEGV: Illegal storage access. (Attempt to read from nil?) #27

Closed snashraf closed 5 years ago

snashraf commented 5 years ago

Hi Brent,

I am trying to run the latest slivar version (0.1.1) via a shell script, and I am getting the error below:

54 samples matched in VCF and PED to be evaluated
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
slivar version: 0.1.1

When I run the same script with version 0.0.5, I do not get any error.

Do you have any idea why I am getting this error?

Thanks, Najeeb

brentp commented 5 years ago

can you show the command you are running? make sure you don't have repeated labels like:

--trio abc:some_expression ...
--trio abc:some_other_expression...
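One way to check a wrapper script for repeated labels before running it is sketched below. This is only a minimal sketch, assuming the labels are passed as `--trio label:expression` as in the lines above; `duplicate_trio_labels` is a hypothetical helper name and is not part of slivar.

```shell
# Hypothetical helper (not part of slivar): print every --trio label that
# appears more than once in the given script file.
duplicate_trio_labels() {
    # Drop quote characters first so "abc:expr" and abc:expr are treated alike.
    tr -d "\"'" < "$1" | awk '
        {
            line = $0
            # Walk through every "--trio <label>:" occurrence on the line.
            while (match(line, /--trio[ \t]+[^: \t]+:/)) {
                label = substr(line, RSTART, RLENGTH)
                sub(/^--trio[ \t]+/, "", label)   # strip the flag
                sub(/:$/, "", label)              # strip the trailing colon
                if (++seen[label] == 2) print label
                line = substr(line, RSTART + RLENGTH)
            }
        }'
}
```

Running `duplicate_trio_labels slivar.sh` prints nothing when every label is unique, and prints each repeated label once otherwise.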

brentp commented 5 years ago

Hi Najeeb, I'd like to get this resolved, can you give more information?

snashraf commented 5 years ago

Hi Brent,

Extremely sorry for the late reply! Please find the attached script (slivar.txt). I am trying to run it as:

sh slivar.sh 54Samplesvcf2db.vcf.gz batch5.ped missinginped

brentp commented 5 years ago

would you try running the command using this binary (just gunzip and chmod +x) so we can see more debug info? slivar_debug.gz

snashraf commented 5 years ago

54 samples matched in VCF and PED to be evaluated
Traceback (most recent call last)
slivar.nim(200) slivar
slivar.nim(197) main
slivar.nim(105) expr_main
gnotate.nim(71) open
zipfiles.nim(194) extractFile
zipfiles.nim(182) extractFile
streams.nim(88) readData
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
slivar version: 0.1.2

-- Syed Najeeb Ashraf

brentp commented 5 years ago

ok. that's easy. You got a zip file from an old release. If you get one linked from the latest release here: https://github.com/brentp/slivar/releases you'll be fine. Sorry for the trouble.

brentp commented 5 years ago

to clarify, by that, I mean grab the latest gnomad zip (GRCh37)

snashraf commented 5 years ago

Hi Brent,

Now I am getting the error below.

slivar version: 0.1.2
54 samples matched in VCF and PED to be evaluated
[slivar] message for /gpfs/projects/bioinfo/najeeb/playGround/slivar/gnomad.hg37.zip:
gnomad hg37 v2.1
[slivar] evaluating on 22 trios
[slivar] 10000 1:1223697 evaluated 10000 variants in 56.9 seconds (175.8/second)
[slivar] 20000 1:2283207 evaluated 10000 variants in 42.5 seconds (235.1/second)
slivar.nim(200) slivar
slivar.nim(197) main
evaluator.nim(551) expr_main
evaluator.nim(401) set_infos
vcf.nim(283) get
system.nim(3059) sysFatal
Error: unhandled exception: index 0 not in 0 .. -1 [IndexError]
slivar version: 0.1.2

brentp commented 5 years ago

Hi Najeeb, sorry for the trouble. Can you share a small part of the VCF that I can use to recreate the error?

And do you get this with the original binary that you downloaded (or only the debug binary)?

I can't see anything obvious to change just yet.

genomics-geek commented 5 years ago

@brentp - I also come across this error SIGSEGV: Illegal storage access. (Attempt to read from nil?) occasionally.

Not sure exactly why it happens, but seems to not like VCFs coming from GATK CombineVariants or after using slivar and using the --regions with a BED file to filter.

genomics-geek commented 5 years ago

if I remove all the #GATKCommandLine. lines from the VCF header the error SIGSEGV: Illegal storage access. (Attempt to read from nil?) goes away just FYI

brentp commented 5 years ago

can you give me a small VCF to recreate?

genomics-geek commented 5 years ago

Yup, here is an example VCF: test.vcf.gz

brentp commented 5 years ago

thank you. how can I see the problem? I tried:

slivar_static expr --info "INFO.DP > 20" -v ~/Downloads/test.vcf.gz  > /dev/null

without problem.

genomics-geek commented 5 years ago

slivar expr --pass-only --vcf test.vcf --ped ped gives me the error.

PED file:

#family kid dad mom sex disease
F000016986  sample2 NA  NA  female  unaffected
F000016986  sample1 NA  NA  male    unaffected
F000016986  sample3 sample1 sample2 male    affected

genomics-geek commented 5 years ago

I used version 0.1.6.

brentp commented 5 years ago

if I put that in x.ped:

$ slivar_static expr --pass-only --vcf ~/Downloads/test.vcf.gz --ped x.ped > /dev/null 
slivar version: 0.1.6 a135a8703f055b1b9a08632ce0d8887f174e7787
[pedfile] paternal_id: "NA" referenced for sample "sample2" not found
[pedfile] maternal_id: "NA" referenced for sample "sample2" not found
[pedfile] paternal_id: "NA" referenced for sample "sample1" not found
[pedfile] maternal_id: "NA" referenced for sample "sample1" not found
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] Finished. evaluated 4 total variants and wrote 4 variants that passed your slivar expressions.

so, I'm not able to recreate.

genomics-geek commented 5 years ago

that's odd, I get:

[svcdgdbfx@l-0-01 execution]$ /mnt/isilon/dgd_public/clin-air/v2.0.0/tools/slivar/0.1.6/slivar expr --vcf test.vcf --ped ped 
slivar version: 0.1.6 a135a8703f055b1b9a08632ce0d8887f174e7787
[pedfile] paternal_id: "NA" referenced for sample "sample2" not found
[pedfile] maternal_id: "NA" referenced for sample "sample2" not found
[pedfile] paternal_id: "NA" referenced for sample "sample1" not found
[pedfile] maternal_id: "NA" referenced for sample "sample1" not found
[slivar] 3 samples matched in VCF and PED to be evaluated
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
[svcdgdbfx@l-0-01 execution]$ 

brentp commented 5 years ago

what is your test.vcf?

genomics-geek commented 5 years ago

well, when it is uncompressed I get the error. I had to bgzip it when I uploaded to github haha.

brentp commented 5 years ago

oh. wow. I get the error on the uncompressed, but not compressed. will fix.

brentp commented 5 years ago

actually, this is already fixed in my dev branch will be out in next release. thanks for reporting.

genomics-geek commented 5 years ago

Thank you! Yea it was helpful to try to figure out why it was happening. What eventually fixed it for me was to compress VCF or to remove GATKCommandLine.
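Because the crash here depended on whether the input was plain text or compressed, it can help to confirm what a file actually contains, regardless of its extension, before reporting which case fails. The following is a minimal sketch, assuming only that gzip and bgzip output both start with the standard gzip magic bytes `1f 8b`; `is_gzipped` is a hypothetical helper name, not a slivar feature.

```shell
# Hypothetical helper: succeed (exit 0) if the file starts with the gzip
# magic bytes 1f 8b, which bgzip output shares; fail otherwise.
is_gzipped() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example, `is_gzipped test.vcf && echo compressed || echo plain` shows which form is about to be passed to `--vcf`.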

Tayazseven commented 4 years ago

@brentp Hi Brent, Following up on this issue. We are using slivar v0.1.10 and see this issue.

> slivar version: 0.1.10 917aa1a61bd0c0ba50521ea4146a3a4dc45b8b64
[pedfile] paternal_id: "NA" referenced for sample "F000073286_DGD-20-4514_WESTRIO_M" not found
[pedfile] maternal_id: "NA" referenced for sample "F000073286_DGD-20-4514_WESTRIO_M" not found
[pedfile] paternal_id: "NA" referenced for sample "F000073286_DGD-20-4516_WESTRIO_F" not found
[pedfile] maternal_id: "NA" referenced for sample "F000073286_DGD-20-4516_WESTRIO_F" not found
[slivar] 3 samples matched in VCF and PED to be evaluated
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

But the thing is, I cannot recreate this issue: whenever I run it again, it passes through. We pass a compressed VCF to slivar expr. Do you have any suggestions? My resource allocation:

cpu: 1 
mem: 12

brentp commented 4 years ago

can you share a small vcf and ped to recreate? you can try replacing "NA" with "0" or "-9" since those are accepted missing values in a pedigree file.
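The replacement suggested above can be done in one pass with awk. This is a minimal sketch, assuming a whitespace-delimited six-column PED in which columns 3 and 4 hold the paternal and maternal IDs; `ped_na_to_zero` is a hypothetical helper name and is not part of slivar.

```shell
# Hypothetical helper: rewrite "NA" parent IDs (columns 3 and 4) as "0".
# Lines starting with "#" pass through unchanged; other lines are re-emitted
# tab-separated.
ped_na_to_zero() {
    awk 'BEGIN { OFS = "\t" }
         /^#/ { print; next }
         {
             if ($3 == "NA") $3 = "0"
             if ($4 == "NA") $4 = "0"
             $1 = $1   # force awk to rebuild the record with tab separators
             print
         }' "$1"
}
```

Usage would look like `ped_na_to_zero family.ped > family.fixed.ped`, where both file names are placeholders.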

brentp commented 4 years ago

@Tayazseven can you make sure you have latest release (version 0.1.12).

Tayazseven commented 4 years ago

Recreating is the problem. It happens so intermittently that I cannot recreate the issue. I will update to the latest version and let you know if it fixes the problem.

brentp commented 4 years ago

OK. I'd like to get this resolved. If you see it again with latest version, I can give you a debug version of slivar that will show exactly where in the code the error is occurring so I'll be better able to fix.

Phillip-a-richmond commented 3 years ago

Just want to say that I did experience this issue as well:

Command:

$SLIVAR/slivar.exe expr --vcf Test_bcftoolsnorm.bcftoolsfilter.vcf.gz \
    --ped $PED \
    --pass-only \
    -g $SLIVAR/gnomad.hg38.genomes.v3.fix.zip \
    --info "INFO.gnomad_popmax_af < $AF_CUTOFF && INFO.gnomad_nhomalt < $HOM_ALT_CUTOFF && variant.FILTER == \"PASS\" && variant.ALT[0] != \"*\"" \
    --js $SLIVAR/slivar/js/slivar-functions.js \
    -o $INVCF_NORM_FILTER_SLIVARRARE

Error:

> slivar version: 0.2.1 ceb97b26cd39d341dd7aa96ddb42239692df5b50
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /mnt/common/Precision/Slivar//gnomad.hg38.genomes.v3.fix.zip:
   > created on:2019-11-15
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

The issue was with the VCF output from DeepVariant, that has these RefCall lines which don't play by the VCF rules. Removing the RefCall lines from the VCF fixed the issue:

zgrep -v "RefCall"  Test_bcftoolsnorm.bcftoolsfilter.vcf.gz > Test_bcftoolsnorm.bcftoolsfilter.NoRefCall.vcf.gz

Then, rerunning:

$SLIVAR/slivar.exe expr --vcf Test_bcftoolsnorm.bcftoolsfilter.NoRefCall.vcf.gz  \
    --ped $PED \
    --pass-only \
    -g $SLIVAR/gnomad.hg38.genomes.v3.fix.zip \
    --info "INFO.gnomad_popmax_af < $AF_CUTOFF && INFO.gnomad_nhomalt < $HOM_ALT_CUTOFF && variant.FILTER == \"PASS\" && variant.ALT[0] != \"*\"" \
    --js $SLIVAR/slivar/js/slivar-functions.js \
    -o $INVCF_NORM_FILTER_SLIVARRARE

Produces happy output:

> slivar version: 0.2.1 ceb97b26cd39d341dd7aa96ddb42239692df5b50
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /mnt/common/Precision/Slivar//gnomad.hg38.genomes.v3.fix.zip:
   > created on:2019-11-15
[slivar] 10000 1:4544866 evaluated 10000 variants in 7.2 seconds (1396.6/second)
[slivar] 20000 1:8872353 evaluated 10000 variants in 0.1 seconds (80858.6/second)
[slivar] 100000 1:45816411 evaluated 100000 variants in 1.0 seconds (99968.1/second)
[slivar] 200000 1:94696627 evaluated 100000 variants in 1.2 seconds (81136.8/second)
[slivar] 300000 1:163816028 evaluated 100000 variants in 1.2 seconds (80075.2/second)
[slivar] 400000 1:210863973 evaluated 100000 variants in 1.2 seconds (81417.4/second)

Perhaps that may fix the issue for the next person who stumbles upon this error.

Cheers, Phil

brentp commented 3 years ago

@Phillip-a-richmond can you link a VCF with a RefCall variant that does have this problem?

Phillip-a-richmond commented 3 years ago

Okay so while trying to isolate the issue to the RefCall, I was unsuccessful. It turns out the issue that I solved with my zgrep for RefCall, was actually just because I uncompressed from vcf.gz to plain text, and then reran the slivar command on the uncompressed VCF.

If I had to say where the issue is exactly, my guess is that it has something to do with the bgzipped VCF I'm using being read into slivar.

So what I can reproduce with my files:

#SLIVAR 
$SLIVAR/slivar.exe expr --vcf Test_bcftoolsnorm.bcftoolsfilter.10000.vcf.gz \
    --ped Test.ped \
    --pass-only \
    -g $SLIVAR/gnomad.hg38.genomes.v3.fix.zip \
    --info "INFO.gnomad_popmax_af < $AF_CUTOFF && INFO.gnomad_nhomalt < $HOM_ALT_CUTOFF && variant.FILTER == \"PASS\" && variant.ALT[0] != \"*\"" \
    --js $SLIVAR/slivar/js/slivar-functions.js \
    -o $INVCF_NORM_FILTER_SLIVARRARE

Error:

> slivar version: 0.2.1 ceb97b26cd39d341dd7aa96ddb42239692df5b50
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /mnt/common/Precision/Slivar//gnomad.hg38.genomes.v3.fix.zip:
   > created on:2019-11-15
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

But this code does not give an error:

#SLIVAR 
$SLIVAR/slivar.exe expr --vcf EPGEN055_bcftoolsnorm.bcftoolsfilter.10000.vcf \
    --ped Test.ped \
    --pass-only \
    -g $SLIVAR/gnomad.hg38.genomes.v3.fix.zip \
    --info "INFO.gnomad_popmax_af < $AF_CUTOFF && INFO.gnomad_nhomalt < $HOM_ALT_CUTOFF && variant.FILTER == \"PASS\" && variant.ALT[0] != \"*\"" \
    --js $SLIVAR/slivar/js/slivar-functions.js \
    -o $INVCF_NORM_FILTER_SLIVARRARE

I'm attaching here the uncompressed and compressed VCFs, with at least 10000 lines in each of them. Hopefully that can help your debugging process.

Note, change the vcf.txt back to just vcf...github won't let me paste .

brentp commented 3 years ago

thanks very much @Phillip-a-richmond , I will have a look.

fakedrtom commented 3 years ago

Not sure if it is for the same reasons, but I have a VCF that is generating the same error that I could point you to if that would be helpful.

brentp commented 3 years ago

I can get this with the last release. With dev version, it seems to be fixed--though I'm not sure which change is to blame/praise for that. @Phillip-a-richmond and @fakedrtom and anyone else who has had this issue, would you try the attached binary (should show v0.2.2 when run) and verify that the problem is resolved?

If so, I'll make a new release soon. Thanks.

brentp commented 3 years ago

... and now with the binary attached.

slivar.gz

Phillip-a-richmond commented 3 years ago

That binary didn't fix the problem for me.

> slivar version: 0.2.2 0401ef029e3b2dee542c56c35d1dae3a8c245dfb
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /mnt/common/Precision/Slivar//gnomad.hg38.genomes.v3.fix.zip:
   > created on:2019-11-15
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

The VCF is bgzip'd with tabix 1.10.2 and indexed.

On the uncompressed VCF:

> slivar version: 0.2.2 0401ef029e3b2dee542c56c35d1dae3a8c245dfb
[slivar] 3 samples matched in VCF and PED to be evaluated
[slivar] message for /mnt/common/Precision/Slivar//gnomad.hg38.genomes.v3.fix.zip:
   > created on:2019-11-15
[slivar] Finished. evaluated 9785 total variants and wrote 23 variants that passed your slivar expressions.

Is there a verbose/debug option you've got working on a binary you can share? Perhaps we can pinpoint where the complaint is on my end?

Cheers, -Phil

brentp commented 3 years ago

@Phillip-a-richmond , can you give the command that generates the error with the files that you posted above?

raungar commented 3 years ago

Hi-

Has this problem been solved? I am trying to implement slivar for patient data, so unfortunately I cannot share any VCFs. I have tried both with unzipped and bgzipped files, the most recent build, and the binary shared above.

I get this error after passing my vcf through bcftools csq

Thanks, Rachel

Phillip-a-richmond commented 3 years ago

I may have dropped the ball on this one. Raungar the error went away for me when I cut out the RefCall lines from my VCF. Below is a functional pipeline starting with a merged VCF from deeptrio-->glnexus-->bcftools (merged.vcf.gz)

INVCF=${SAMPLE}.merged.vcf.gz
INVCF_NORM=${SAMPLE}_bcftoolsnorm.vcf.gz
INVCF_NORM_FILTER=${SAMPLE}_bcftoolsnorm.bcftoolsfilter.vcf.gz
INVCF_NORM_FILTER_NOREFCALL=${SAMPLE}_bcftoolsnorm.bcftoolsfilter.NoRefCall.vcf.gz
INVCF_NORM_FILTER_NOREFCALL_SLIVARRARE=${SAMPLE}_bcftoolsnorm.bcftoolsfilter.NoRefCall.slivarrare.vcf.gz

# Cutoffs
HOM_ALT_CUTOFF=15
AF_CUTOFF=0.001

# Step 1 - BCFTools normalize & apply filter
# BCFTOOLS
# Normalize VCF
$BCFTOOLS norm \
    -f $GENOME_DIR/$GENOME_FASTA \
    --threads $NSLOTS \
    -m - \
    -O z \
    --output $INVCF_NORM \
    $INVCF 

$TABIX -f $INVCF_NORM

# Filter normalized VCF for high depth + low alt allele support (soft-filter)
$BCFTOOLS filter \
         --include 'FORMAT/AD[*:1]>=5 && FORMAT/DP[*] < 600' \
         -m + \
         -s + \
         -O z \
         --output $INVCF_NORM_FILTER \
         $INVCF_NORM

$TABIX -f $INVCF_NORM_FILTER

# Step 2 - Get rid of RefCall from DeepVariant output
zgrep -v "RefCall" $INVCF_NORM_FILTER | bgzip -c > $INVCF_NORM_FILTER_NOREFCALL
tabix $INVCF_NORM_FILTER_NOREFCALL

# Step 3 - Slivar filter for gnomad frequency, here set to GRCh38
#SLIVAR 
$SLIVAR/slivar.exe expr --vcf $INVCF_NORM_FILTER_NOREFCALL \
    --ped $PED \
    --pass-only \
    -g $SLIVAR/gnomad.hg38.genomes.v3.fix.zip \
    --info "INFO.gnomad_popmax_af < $AF_CUTOFF && INFO.gnomad_nhomalt < $HOM_ALT_CUTOFF && variant.FILTER == \"PASS\" && variant.ALT[0] != \"*\"" \
    --js $SLIVAR/slivar/js/slivar-functions.js \
    -o $INVCF_NORM_FILTER_NOREFCALL_SLIVARRARE

I recommend grabbing an open source trio to share your errors. Plenty available, but I test with this one:

############
# EUROPEAN #
############
## HG01771 - Family IBS049 - EUR - IBS - FATHER 
wget -c -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR195/009/ERR1955499/ERR1955499_1.fastq.gz
wget -c -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR195/009/ERR1955499/ERR1955499_2.fastq.gz
## HG01770 - Family IBS049 - EUR - IBS - MOTHER
wget -c -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR195/005/ERR1955435/ERR1955435_1.fastq.gz
wget -c -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR195/005/ERR1955435/ERR1955435_2.fastq.gz
##  HG01772 - Family IBS049 - EUR - IBS - PROBAND (F)
wget -c -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR230/005/ERR2304565/ERR2304565_1.fastq.gz
wget -c -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR230/005/ERR2304565/ERR2304565_2.fastq.gz

brentp commented 3 years ago

Hi @raungar , I'd like to get this resolved, but I have not been able to recreate. Can you try running the same command with this binary attached and let me know what the full output is?

@Phillip-a-richmond if you can show me how to recreate, I'll look into it. Those header lines should not affect slivar. I am curious what a few of the RefCall lines look like so if you can paste those perhaps that will also give a clue.

brentp commented 3 years ago

@raungar binary attached this time: slivar_debug.gz

(just download, gunzip, chmod +x slivar_debug, then use slivar_debug instead of slivar).

brentp commented 3 years ago

I think I found the problem. Would you try this binary as well? slivar_debug_fix.gz

JakeHagen commented 3 years ago

Hi Brent and everyone,

I believe I am having the same issue. First, those two binaries that you just posted did not solve the issue for me. But I do have some info that might help.

Oddly this error will occur most of the time, but when repeatedly trying the command, it will rarely run successfully (maybe 10% of the time).

For me at least, removing the RNC format field from glnexus VCFs seems to fix the error. From what I can tell the RNC format field does not follow the VCF spec. Are we all trying on glnexus vcfs?

Attached are VCFs created using the glnexus "get started page", with and without the FORMAT/RNC field removed:

dv_1000G_ALDH2.vcf.gz dv_1000G_ALDH2.rmRNC.vcf.gz

brentp commented 3 years ago

@JakeHagen ok. this is a great start. Now, can you share a command that generates the sigsegv? I tried:

slivar_debug_fix expr -o x.vcf -v ~/Downloads/dv_1000G_ALDH2.vcf.gz --sample-expr "s:sample.AD[0] > 1" --alias <(echo "")

but it completes without issue (though it does give warning about multiallelics)

brentp commented 3 years ago

The RNC field is valid VCF so something else is going on. If I can recreate the bug then I should be able to fix, but I don't see it yet. @JakeHagen does the slivar_debug_fix above resolve the issue for you?

raungar commented 3 years ago

Hi-

I figured out the problem for my VCF at least. This was purely a header issue for me -- I didn't have all the FORMAT tags in my header, which gave warnings but no errors. Adding in the proper FORMAT tags (ex: ##FORMAT=) solved this problem for me.

Thanks, Rachel
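A quick way to look for the kind of header gap described above is to compare the keys used in the FORMAT column against the `##FORMAT=<ID=...>` declarations. This is a minimal sketch, assuming a tab-delimited VCF whose ninth column is FORMAT; `undeclared_format_keys` is a hypothetical helper name, and it reads plain text, so a bgzipped file needs to be piped through `gzip -dc` first.

```shell
# Hypothetical helper: print FORMAT keys used in records but never declared
# in the header. Reads an uncompressed VCF on stdin.
undeclared_format_keys() {
    awk -F'\t' '
        /^##FORMAT=<ID=/ {
            id = $0
            sub(/^##FORMAT=<ID=/, "", id)   # drop the prefix
            sub(/[,>].*$/, "", id)          # keep only the ID itself
            declared[id] = 1
            next
        }
        /^#/ { next }                       # skip all other header lines
        NF >= 9 {
            n = split($9, keys, ":")        # column 9 is the FORMAT column
            for (i = 1; i <= n; i++)
                if (!(keys[i] in declared) && !(keys[i] in reported)) {
                    reported[keys[i]] = 1
                    print keys[i]
                }
        }'
}
```

For a compressed file this could be run as `gzip -dc input.vcf.gz | undeclared_format_keys`, where the file name is a placeholder; empty output means every FORMAT key in use has a header line.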

JakeHagen commented 3 years ago

@brentp Hi Brent, the command you used above works fine for me. Below is the command that errors for me.

./slivar_debug_fix expr -v glnexus/dv_1000G_ALDH2.vcf.gz --gnotate tmp/gnomad.hg38.genomes.v3.fix.zip | bgzip -c > t.vcf.gz

[slivar] 2504 samples matched in VCF and PED to be evaluated
[slivar] message for tmp/gnomad.hg38.genomes.v3.fix.zip:
   > created on:2019-11-15
Traceback (most recent call last)
/home/brentp/src/slivar/src/slivar.nim(249) slivar
/home/brentp/src/slivar/src/slivar.nim(246) main
/home/brentp/src/slivar/src/slivar.nim(110) expr_main
/home/brentp/src/slivar/src/slivarpkg/evaluator.nim(443) id2names
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

The error usually only comes up when using --gnotate

JakeHagen commented 3 years ago

Also, now that you confirmed RNC is valid (wasn't familiar with character fields), I tried removing other format fields. I have tried removing DP, and AD, and both times the above command works. So I really don't know what is happening.

brentp commented 3 years ago

I can recreate with the command above. I will have a fix out in next release soon. Thanks for the example @JakeHagen

JakeHagen commented 3 years ago

That's great, thank you