Closed lbeltrame closed 9 years ago
Thanks for reporting this. Just so I understand, it this a case where the input has a mixed set of records in the sense that some lines have genotypes for all samples while other lines have genotypes for only a subset of the records? In other words, the lost records are those with genotypes < total number of samples in the header?
In data venerdì 23 gennaio 2015 07:48:59, Aaron Quinlan ha scritto:
subset of the records? In other words, the lost records are those with genotypes < total number of samples in the header?
The fun thing is that while this is like you say, it is not always like this. Most of the records have incomplete genotypes and they went through fine.
I suspect there's some race or strange behavior when there are multiple processes at bay (or perhaps during merging?), because nothing of the sort occurs if I run it in a single process.
Are you still seeing this behavior?
Are you still seeing this behavior?
Unfortunately I had a deadline so I just switched everything to single-core to get the results out. Hopefully I'll try to reproduce next week.
Upon some investigation, I noticed that my grabix version is not up to date (0.1.2), which may be the cuase of these issues.
Any updates on this - were you able to attribute it to grabix? I suspect that was the cause, as there was a bug fix for this in the last release of grabix
Any updates on this - were you able to attribute it to grabix? I suspect
My guess is that's the case. Sorry for the noise!
This seems to be related to having a vcf.gz and a stale vcf.gz.gbi . I can't see checks in the code for when .gz is newer than .gbi . For now, in all cases I've found, it is resolved by removing the .gbi file. Though if I understand the issue correctly, we should have a fix shortly.
Closing.
gemini load
sometimes does not load valid VCF records.For example (annotated with snpEff: sensitive data removed):
will not be loaded without any error. In particular this is a multi-sample line where only one sample is called, the rest has no calls (resulted from a join with the GATK's CombineVariants).
There are no traces of it in the GEMINI database, created with:
Querying:
does not yield any result. In fact, there is no record with that REF and ALT in database, related to that gene and sample.
When GEMINI is run in single-core mode, the record is added instead.