cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0

Error in analysing virus data #4

Open capemaster opened 10 years ago

capemaster commented 10 years ago

Hi, this is the error I get from the terminal:

Traceback (most recent call last):
  File "./shorah.py", line 142, in <module>
    keep_files=options.k, alpha=options.a)
  File "/home/capemaster/Desktop/SHORAH/shorah-master/dec.py", line 399, in main
    proposed[beg] = (get_prop(dbg_file), j)
  File "/home/capemaster/Desktop/SHORAH/shorah-master/dec.py", line 252, in get_prop
    return prop
UnboundLocalError: local variable 'prop' referenced before assignment

The tail of dec.log is reported below.

DEBUG 2014-04-16 12:06:05,696                          run_dpm 178 run  -i w-JX480631-7706-7906.reads.fas -j 54405 -t 10881 -a 0.100000 -K 20 finished
DEBUG 2014-04-16 12:06:05,696                          run_dpm 179 Child /home/capemaster/Desktop/SHORAH/shorah-master/diri_sampler returned 0
DEBUG 2014-04-16 13:30:21,533                          run_dpm 178 run  -i w-JX480631-7840-8040.reads.fas -j 56490 -t 11298 -a 0.100000 -K 20 finished
DEBUG 2014-04-16 13:30:21,534                          run_dpm 179 Child /home/capemaster/Desktop/SHORAH/shorah-master/diri_sampler returned 0
DEBUG 2014-04-16 13:30:21,534                          run_dpm 170 /home/capemaster/Desktop/SHORAH/shorah-master/diri_sampler -i w-JX480631-7907-8107.reads.fas -j 52110 -t 10422 -a 0.100000 -K 20
DEBUG 2014-04-16 14:41:19,600                          run_dpm 178 run  -i w-JX480631-7907-8107.reads.fas -j 52110 -t 10422 -a 0.100000 -K 20 finished
DEBUG 2014-04-16 14:41:19,601                          run_dpm 179 Child /home/capemaster/Desktop/SHORAH/shorah-master/diri_sampler returned 0
INFO 2014-04-16 14:41:19,601                          main 392 reading windows for start position 135
WARNING 2014-04-16 14:41:19,602                          correct_reads 227 No reads in window 135?
INFO 2014-04-16 14:41:19,602                          main 396 this is window w-JX480631-135-335

I get the same error on many platforms. I successfully analysed another sample created in the same way. What is going on?

ozagordi commented 10 years ago

Hi, most likely it is due to low (zero?) coverage in a region. The warning in dec.log complains that there are no reads in that window. Try viewing the region with samtools tview, and maybe stop the haplotype reconstruction before that region. Give it a try and let me know, please.

Best. O
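
For readers unfamiliar with this Python error, the pattern behind an `UnboundLocalError` can be sketched as follows. This is not the actual code of `dec.py`: the function names `read_last_value` and `read_last_value_safe` and the `value=` line format are hypothetical, chosen only to show how a variable assigned solely inside a loop stays unbound when the input is empty (as it might be for a window with no reads).

```python
def read_last_value(lines):
    """Hypothetical parser: keeps the last 'value=' entry it sees."""
    for line in lines:
        if line.startswith("value="):
            result = float(line.split("=", 1)[1])
    # If no line matched, 'result' was never assigned and the next
    # statement raises UnboundLocalError.
    return result


def read_last_value_safe(lines, default=None):
    """Same parser, but with a defined result for empty input."""
    result = default
    for line in lines:
        if line.startswith("value="):
            result = float(line.split("=", 1)[1])
    return result
```

With the second form the caller can test for the default and skip the window instead of crashing.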

capemaster commented 10 years ago

Dear, I tried shortening the window to the zone where reads are present, and it went just fine. I suggest implementing some kind of check for this problem. Thank you for the advice. Best

ozagordi commented 10 years ago

Hi, thanks for checking and reporting it here. I agree, it's a good idea to implement a control. I will keep this issue open until then.
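
One possible form of such a check is sketched here, under assumptions. Per-position depth is read from three-column, tab-separated text (reference, 1-based position, depth), which is the layout `samtools depth` prints for a single BAM file. The function names, the window size and step, and the `min_depth` threshold are hypothetical and not part of ShoRAH. A window is flagged when its lowest per-position depth falls below the threshold, which is a deliberately conservative criterion.

```python
def parse_depth(lines):
    """Map 1-based position -> depth from 'ref<TAB>pos<TAB>depth' lines."""
    depth = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ref, pos, cov = line.split("\t")[:3]
        depth[int(pos)] = int(cov)
    return depth


def low_coverage_windows(depth, start, end, win_size, step, min_depth):
    """Return (win_start, win_end, min_cov) for windows dipping below min_depth.

    Positions missing from 'depth' count as zero coverage.
    """
    flagged = []
    for w_start in range(start, end - win_size + 2, step):
        w_end = w_start + win_size - 1
        min_cov = min(depth.get(p, 0) for p in range(w_start, w_end + 1))
        if min_cov < min_depth:
            flagged.append((w_start, w_end, min_cov))
    return flagged
```

Running this before the analysis would list the windows likely to trigger the "No reads in window" warning, so the region can be adjusted beforehand.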

fifthguy commented 8 years ago

Hi,

I'm facing a problem similar to capemaster's, except that in my case I first get a set of segmentation faults, which I assume come from a compiled C program that dec.py calls. More interestingly, after this set of segmentation faults (their number appears to vary with window size: the smaller the window, the more segfaults I get), the program terminates with the same error message that capemaster reported. My dec.log also ends the same way as his.

Any ideas what's going on and how to fix it?

Thanks in advance!

(Using Linux Mint 17, which is based on Ubuntu; GSL installed via apt-get.)

ozagordi commented 8 years ago

Hi, I would need some more info. Could you make a toy example?

fifthguy commented 7 years ago

Hi,

I just tried with more suitable data (higher coverage and so on) and was still getting the same error. Then I shortened the window size and restricted the region of interest (ROI), and it worked well. I suppose the uncovered parts were the problem. Whenever the ROI contains a region with coverage of about 10 or less (maybe somewhat higher; it's a rule-of-thumb estimate), shorah throws a segfault and later the error that capemaster shows.

Is it written down somewhere what minimum coverage of a region shorah will accept?

Thanks and best regards, Tomaž
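
The workaround described in this comment, restricting the ROI to well-covered stretches, can be sketched as an interval search. This is only an illustration: `covered_intervals`, its `min_depth` and `min_length` parameters, and the dictionary of per-position depths are assumptions, not ShoRAH options.

```python
def covered_intervals(depth, start, end, min_depth, min_length=1):
    """Return inclusive (first, last) intervals where depth >= min_depth.

    'depth' maps 1-based positions to coverage; missing positions count as 0.
    Intervals shorter than min_length are dropped.
    """
    intervals = []
    run_start = None
    for pos in range(start, end + 1):
        if depth.get(pos, 0) >= min_depth:
            if run_start is None:
                run_start = pos  # a covered run begins here
        elif run_start is not None:
            # the run ended at pos - 1
            if pos - run_start >= min_length:
                intervals.append((run_start, pos - 1))
            run_start = None
    if run_start is not None and end - run_start + 1 >= min_length:
        intervals.append((run_start, end))
    return intervals
```

Each returned interval is a candidate region that avoids the poorly covered positions entirely.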

ozagordi commented 7 years ago

Hi. No, it's not written anywhere because it's hard to tell. Shorah was developed with coverage of thousands or more in mind; the initial goal was to detect variants down to 0.1% over short regions. With coverage of 100 it should still work fine, but it may behave unexpectedly if coverage goes up and down wildly. Further, one should ask why the coverage behaves like that.

Also keep in mind that global haplotype reconstruction is hard; it makes little sense if the coverage is so low, and other tools are better suited (Quasirecomb, Haplotyper, PredictHaplo). Have a look at this paper as well.
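
To make the coverage argument concrete, here is a back-of-the-envelope calculation. It assumes each read covering a position carries the variant independently with probability equal to the variant frequency, and it ignores sequencing errors, so it is an optimistic bound rather than a description of ShoRAH's model. The function names are made up for this sketch.

```python
import math


def expected_variant_reads(coverage, frequency):
    """Expected number of reads carrying a variant at the given frequency."""
    return coverage * frequency


def prob_at_least(k, coverage, frequency):
    """P(at least k variant reads) under a binomial(coverage, frequency) model."""
    p_fewer = sum(
        math.comb(coverage, i) * frequency**i * (1 - frequency) ** (coverage - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Compare a 0.1% variant at three coverage levels.
for cov in (100, 1000, 10000):
    print(cov, expected_variant_reads(cov, 0.001), prob_at_least(1, cov, 0.001))
```

Under these assumptions, a 0.1% variant is expected in about 0.1 reads at coverage 100 and about 10 reads at coverage 10,000, which fits the statement that the method was designed for coverage in the thousands.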

fifthguy commented 7 years ago

Hi again,

The overall coverage is much higher (between 200 and 2000). The sequence, however, is rather long (~190 kb) and contains repeats. The regions with lower coverage are identical sequential repeats whose reads piled up in one region (e.g. 1000-1500) but not in the next (1500-2000), so there is heavy overmapping in one part of the repeats and undermapping in the other. This is why the coverage is so low there. Thank you for the paper suggestions; I will read them.

Best, T

ozagordi commented 7 years ago

190 kb is way too long. Repeats don't make me feel better, either. These methods work under the assumption of uniformly spread variation, such that every region spanned by one read length will display some variant.

fifthguy commented 7 years ago

Yes, it is far-fetched; so far the pipeline is working on a repeat-masked version... Most likely I will see a certain degree of nonsense soon enough... :)

Also, based on the VCF I expect to see 2 to 16 major variants. I want to see what shorah will produce, and if it makes no sense whatsoever, I will use an alternative strategy...