brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Failing to run find-sites #121

Closed johanneskoester closed 1 year ago

johanneskoester commented 1 year ago

I get the following error

$ somalier find-sites
...
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
fatal.nim(49)            sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

The test file is too large to upload here, but I am happy to send it via gdrive if you need it.

johanneskoester commented 1 year ago

It contains the AF tag, but not for all records. The error happens after many records have been processed.
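One way to check how many records lack the tag is sketched below. It is a minimal, pure-Python illustration and not part of somalier: the helper names `info_has_key` and `count_missing_info_key` and the example records are hypothetical. It assumes plain tab-separated VCF text lines with INFO in the eighth column.

```python
def info_has_key(info: str, key: str) -> bool:
    """Return True if a VCF INFO string contains `key` (as a flag or as key=value)."""
    if info == ".":
        return False  # "." means the INFO column is empty
    for entry in info.split(";"):
        if entry.split("=", 1)[0] == key:
            return True
    return False


def count_missing_info_key(lines, key="AF"):
    """Count data records and how many of them lack `key` in INFO.

    Hypothetical helper for illustration only; returns (total, missing).
    """
    total = 0
    missing = 0
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Column 8 (index 7) is INFO; a truncated record counts as missing.
        if len(fields) < 8 or not info_has_key(fields[7], key):
            missing += 1
    return total, missing


# Made-up records: one with AF, one with a different INFO key, one with empty INFO.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\t.\tAF=0.12",
    "1\t200\trs2\tC\tT\t.\t.\tTSA=SNV",
    "1\t300\trs3\tG\t.\t.\t.\t.",
]

total, missing = count_missing_info_key(example)
print(f"{missing} of {total} records have no AF")
```

For a real file, the same function could be fed the lines of a decompressed VCF, for example via `gzip.open(path, "rt")`.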

brentp commented 1 year ago

Hi @johanneskoester, would you run with the attached debug binary and let me know the error? somalier_debug.gz

find-sites is less widely used, so you're likely hitting something I haven't considered.

Do note that there is a --min-AN argument which defaults to 115000. You may need to lower that if you have a smaller cohort.

johanneskoester commented 1 year ago

/home/brentp/src/somalier/src/somalier.nim(276) somalier
/home/brentp/src/somalier/src/somalier.nim(263) main
/home/brentp/src/somalier/src/somalierpkg/findsites.nim(162) findsites_main
/nim-1.6.6/lib/system/fatal.nim(53) sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

johanneskoester commented 1 year ago

The --min-AN option has no influence, but looking at the help, maybe my input VCF does not satisfy the requirements. It has the AF field but, for example, no samples. It is the "known variation VCF" from ensembl.org (https://ftp.ensembl.org/pub/release-110/variation/vcf/homo_sapiens/, with the individual chromosome files merged together), which I modified with bcftools annotate to rename the MAF field to AF (bcftools annotate -c INFO/AF:=INFO/MAF).

brentp commented 1 year ago

OK, I see the problem, it's a classic :( . I am checking variant.ALT[0] and you have a variant without an alternate allele, so the ALT list is empty and that index is out of bounds. I will check for this.
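The kind of guard described here can be sketched in Python. This is an illustration of the general pattern only, not somalier's actual Nim code: the helpers `first_alt` and `usable_records` and the sample records are hypothetical. It assumes tab-separated VCF text lines in which a missing ALT is written as `.`.

```python
def first_alt(alt_field: str):
    """Return the first ALT allele, or None when the record has no alternate allele."""
    # In VCF text, a record without an alternate allele has "." in the ALT column.
    if alt_field in ("", "."):
        return None
    return alt_field.split(",")[0]


def usable_records(lines):
    """Yield (chrom, pos, ref, alt) for records that have at least one ALT allele.

    Hypothetical helper: records without an ALT are skipped instead of
    being indexed blindly, which is what triggers an out-of-bounds error.
    """
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # truncated record, no ALT column at all
        alt = first_alt(fields[4])
        if alt is None:
            continue  # no alternate allele, nothing to index
        yield fields[0], int(fields[1]), fields[3], alt


# Made-up records: the second one has no alternate allele.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\t.\tAF=0.12",
    "1\t200\trs2\tC\t.\t.\t.\tAF=0.30",
    "1\t300\trs3\tG\tT,C\t.\t.\tAF=0.05",
]

for record in usable_records(records):
    print(record)
```

Skipping such records is one possible choice; a tool could equally log them or treat them as reference-only sites.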

brentp commented 1 year ago

Here is a debug build with a fix for that if you'd like to try it. somalier_debug.gz

brentp commented 1 year ago

I will also run it on chr1 from your link and make sure that it works.

brentp commented 1 year ago

I ran:

/somalier_debug find-sites --AF-field MAF homo_sapiens-chr1.vcf.gz --min-AN 0

and see:

[somalier] af not found, using 0 # many times!!!
121649 candidate variants
sorted and filtered to 14385 autosomal variants. now dropping INFOs and writing
[somalier] wrote 14385 variants to:sites.vcf.gz

So I think that change should resolve your issue. I'll make a new release and try to reduce the number of times we see that message.

brentp commented 1 year ago

This is out in v0.2.18: https://github.com/brentp/somalier/releases/tag/v0.2.18

Thanks for reporting, and let me know if you have any more issues.

johanneskoester commented 1 year ago

Thanks a lot!!! Super quick!