Open XU-Nuo opened 3 years ago
I'm running into the same error, also on v1.99.2, but even in shorah amplicon
mode with default arguments:
$ shorah amplicon -b SEARCH-40821__D102066b__I02__210723_A00953_0359_BHGY7MDSX2__003.trimmed.sorted.bam -f NC_045512.2.fas
Traceback (most recent call last):
File "/usr/local/bin/shorah", line 14, in <module>
main()
File "/usr/local/lib/python3.6/site-packages/shorah/cli.py", line 210, in main
args.func(args)
File "/usr/local/lib/python3.6/site-packages/shorah/cli.py", line 89, in amplicon_run
amplicon.main(args)
File "/usr/local/lib/python3.6/site-packages/shorah/amplicon.py", line 387, in main
h = list(open('coverage.txt'))[0]
IndexError: list index out of range
So I'm not sure that this error is unique to the -d
option, but rather, it seems general to shorah amplicon
. Here's the contents of my log file:
INFO 2021/08/10 14:19:35 cli.py: main() 204: /usr/local/bin/shorah amplicon -b ../SEARCH-40821__D102066b__I02__210723_A00953_0359_BHGY7MDSX2__003.trimmed.sorted.bam -f ../NC_045512.2.fas
INFO 2021/08/10 14:19:35 cli.py: main() 205: shorah version:1.99.2
INFO 2021/08/10 14:19:35 amplicon.py: main() 352: /usr/local/bin/shorah amplicon -b ../SEARCH-40821__D102066b__I02__210723_A00953_0359_BHGY7MDSX2__003.trimmed.sorted.bam -f ../NC_045512.2.fas
INFO 2021/08/10 14:19:35 amplicon.py: main() 372: analysing region from 1 to 29903
DEBUG 2021/08/10 14:19:35 amplicon.py: run_child() 90: /usr/local/bin/b2w -i 0 -w 29903 -m 28407 -x 10000 -c 0 ../SEARCH-40821__D102066b__I02__210723_A00953_0359_BHGY7MDSX2__003.trimmed.sorted.bam ../NC_045512.2.fas
DEBUG 2021/08/10 14:19:40 amplicon.py: run_child() 99: Child /usr/local/bin/b2w returned 0
DEBUG 2021/08/10 14:19:40 amplicon.py: main() 380: b2w returned 0
In addition to the log file, shorah amplicon
created the following files, all of which are 0 bytes:
coverage.txt
reads.fas
w-NC_045512.2-1-29903.reads.fas
The edits proposed by XU-Nuo seem to fix the issue for -d
mode on my end as well, but not in the default parameters
Thank you for reporting. I hope I'll have time to check this next week...
Same issue here. I wonder if the problem might be the arguments sent to b2w.
Lines 377-378 in amplicon.py:
b2w_args = ' -i 0 -w %d -m %d -x %d -c %i%s %s %s %s' % (ref_length, int(
min_overlap * ref_length), max_coverage, cov_thrd, d, in_bam, in_fasta, region)
I'm not sure I understand what b2w does exactly, but this parameter combination of window length equal to reference length and minimum overlap equal to 95% of that always creates empty output files (coverage.txt
and the *reads.fas
files) for me.
Shorah Version:
1.99.2
Command to reproduce:
Expected Behavior:
The analysis outputs the
entropy.pdf
andentropy.csv
besides everything else when running without the-d
option.Actual Behavior:
The
entropy.pdf
andentropy.csv
are generated, but the software terminates with error:Output from
shorah.log
I have traced back in the code and it turns out that the region index return by function
highest_entropy()
insrc/shorah/amplicon.py
does not match my expectation. When all my reads have the same length, it will skip the steps finding the max entropy (https://github.com/cbg-ethz/shorah/blob/0334e4d509dc93152945a80d5b1a6dbd4af07e61/src/shorah/amplicon.py#L300 to line 304) and returns -1 for the start index of region. While from my understanding towards the code, the finding entropy part should at least be iterated one time. I found that's because the value ofrsto
is lower thatrsta
in this case. So I tried to fix this issue by changing line 298 fromto
Also, in line 305, the
high_ent_stop
should ends where the region ends as the return value to match the the logic invoked the function, so I subtract 1 fromhigh_ent_start + trimmed_mean
, fromto
By applying these two changes, I am able to run the amplicon function on my data with equal length with the
-d
option, as the region is now fixed for this rare margin case. I don't know if I understand this part right and make the correct modification. Would you verify it for me to see if this is a bug?