Open capemaster opened 10 years ago
Hi,
most likely it is due to low (possibly zero) coverage in a region. If you look at that warning in dec.log,
it complains that there are no reads in that window. Try viewing the region with samtools tview,
and maybe stop the haplotype reconstruction before that point. Give it a try and let me know, please.
Best. O
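One way to locate such regions without scrolling through tview is to scan per-position depth. The sketch below is only an illustration, not part of shorah: it assumes the three tab-separated columns (reference name, 1-based position, depth) that `samtools depth -a` prints, and the function name `low_coverage_runs` and the `min_depth` threshold are made up for this example.

```python
def low_coverage_runs(depth_lines, min_depth=1):
    """Return (chrom, start, end) runs where depth is below min_depth.

    depth_lines: iterable of "chrom<TAB>pos<TAB>depth" strings with
    1-based positions, as printed by `samtools depth -a`.
    """
    runs = []
    current = None  # [chrom, start, end] of the run being extended
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            # Extend the open run only if this position is adjacent to it.
            if current and current[0] == chrom and current[2] == pos - 1:
                current[2] = pos
            else:
                if current:
                    runs.append(tuple(current))
                current = [chrom, pos, pos]
        elif current:
            # A well-covered position closes the open run.
            runs.append(tuple(current))
            current = None
    if current:
        runs.append(tuple(current))
    return runs
```

Assuming the depth table was saved with something like `samtools depth -a aln.bam > depth.txt` (file names are placeholders), `low_coverage_runs(open("depth.txt"))` lists the stretches with no reads at all; raising `min_depth` also reports thinly covered stretches.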
Dear, I tried shortening the window to the zone where reads are present and it went just fine. I suggest implementing some kind of check for this problem. Thank you for the advice. Best
Hi, thanks for checking and reporting it here. I agree, it's a good idea to implement a control. I will keep this issue open until then.
Hi,
I'm facing a similar problem to "capemaster", except that in my case I first get a set of segmentation faults. I assume they come from a compiled C program that dec.py calls. More interestingly, after a set of segmentation faults (the number of which appears to vary with window size: the smaller the window, the more segfaults I get), the program terminates with the same error message as capemaster's. My dec.log also ends the same way as his.
Any ideas what's going on and how to fix it?
Thanks in advance!
(Using Linux Mint 17, which is based on Ubuntu; GSL installed via apt-get)
Hi,
I just tried with more suitable data (higher coverage and so on). I was still getting the same error, then I shortened the window size and restricted the region of interest (ROI), and it worked out well. I suppose the uncovered parts were the problem. Whenever the ROI contains a region with coverage of about 10 or less (maybe somewhat higher; it's a rule-of-thumb estimate), shorah throws a segfault and later the error that "capemaster" shows.
Is the minimum coverage that shorah will accept for a region written down somewhere?
Thanks and best regards, Tomaž
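The workaround described above, restricting the ROI to a stretch without poorly covered positions, can be sketched as follows. This is a hypothetical helper and not a shorah feature: `longest_covered_interval` and its arguments are invented for illustration, and the threshold of 10 in the usage note is only the rough estimate quoted in the comment above.

```python
def longest_covered_interval(depths, min_depth):
    """Return the longest (start, end) interval, 1-based and inclusive,
    in which every position has depth >= min_depth, or None if no
    position qualifies.

    depths: list of per-position depths; index 0 holds position 1.
    """
    best = None
    start = None  # start of the run currently being scanned
    for pos, depth in enumerate(depths, start=1):
        if depth >= min_depth:
            if start is None:
                start = pos
            # Keep the first longest run; replace only on a strict gain.
            if best is None or pos - start > best[1] - best[0]:
                best = (start, pos)
        else:
            start = None
    return best
```

For example, `longest_covered_interval(depths, 10)` returns the coordinates of the longest stretch in which no position drops below a depth of 10, which could then be used as the region to analyse.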
Hi. No, it's not written anywhere because it's hard to tell. Shorah was developed with coverage of thousands or more in mind; the initial goal was to detect variants down to 0.1% over short regions. With coverage of 100 it should still work fine, but it may behave unexpectedly if coverage goes up and down wildly. Further, one should wonder why the coverage behaves like that.
Keep in mind also that global haplotype reconstruction is hard; it makes little sense if the coverage is so low, and other tools are better suited (QuasiRecomb, Haplotyper, PredictHaplo). Also have a look at this paper.
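To get a feeling for whether coverage "goes up and down wildly", one could compare the mean depth of each window with the typical window. The sketch below is an assumption-laden illustration: the window length `win_len`, the step `step` and the cut-off `min_fraction` are hypothetical parameters chosen for this example and do not correspond to any documented shorah setting.

```python
from statistics import mean, median


def uneven_windows(depths, win_len, step, min_fraction=0.1):
    """Return (start, end, mean_depth) for windows whose mean depth is
    below min_fraction times the median window mean.

    depths: list of per-position depths; index 0 holds position 1.
    Coordinates in the result are 1-based and inclusive.
    """
    if not depths:
        return []
    windows = []
    last_start = max(len(depths) - win_len, 0)
    for start in range(0, last_start + 1, step):
        chunk = depths[start:start + win_len]
        windows.append((start + 1, start + len(chunk), mean(chunk)))
    typical = median(m for _, _, m in windows)
    return [w for w in windows if w[2] < min_fraction * typical]
```

Windows flagged this way are candidates for closer inspection (for instance in samtools tview) before running a reconstruction over them.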
Hi again,
the overall coverage is much higher (between 200 and 2000); however, the sequence is rather long (~190 kb) and contains repeats. The regions with lower coverage are identical sequential repeats whose reads piled up in one region (e.g. 1000-1500) but not in the next (1500-2000): one part of the repeats is heavily over-mapped and the other part under-mapped. That is why the coverage is so low there. Thank you for the paper suggestions; I will read them.
Best, T
190 kb is way too long. The repeats don't make me feel any better, either. These methods work under the assumption of uniformly spread variation, such that every region the length of a read will display some variant.
Yes, it is far-fetched; so far the pipeline is working on a repeat-masked version... Most likely I will see a certain degree of nonsense soon enough... :)
Also, based on the VCF I expect to see 2 to 16 major variants. I want to see what shorah will produce, and if it makes no sense whatsoever, I will use an alternative strategy...
Hi, this is the error I get from the terminal:
The tail of the dec.log is reported.
I got the same error on many platforms. I successfully analysed another sample created in the same way. What is going on?