adamewing / tldr

Identify and annotate TE-mediated insertions in long-read sequence data
MIT License

Python: start out of range #8

Closed: WeijiaSu closed this issue 3 years ago

WeijiaSu commented 3 years ago

Hi @adamewing, I was trying to run TLDR with the Drosophila genome and a specific TE library, and I got the Python error shown below. The same command line ran successfully with another dataset, so I assume this error is caused by a particular data point in this dataset?
Thanks for your help. Weijia

###############
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/anaconda3/lib/python3.7/site-packages/tldr-0.1-py3.7.egg/EGG-INFO/scripts/tldr", line 1340, in process_cluster
  File "pysam/libcalignmentfile.pyx", line 1082, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 690, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-274)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/anaconda3/bin/tldr", line 4, in <module>
    __import__('pkg_resources').run_script('tldr==0.1', 'tldr')
  File "/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 650, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(script_code, namespace, namespace)
  File "/anaconda3/lib/python3.7/site-packages/tldr-0.1-py3.7.egg/EGG-INFO/scripts/tldr", line 1805, in <module>
  File "/anaconda3/lib/python3.7/site-packages/tldr-0.1-py3.7.egg/EGG-INFO/scripts/tldr", line 1645, in main
  File "/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: start out of range (-274)

adamewing commented 3 years ago

Thanks for the report. I've added some bounds checking to the tabix fetch() calls in db6538c. Please pull and install the latest version and let me know if this helps.
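
For readers hitting the same error: a negative start such as -274 typically arises when a padding window is subtracted from a position close to the beginning of a contig. The following is only a minimal sketch of that kind of bounds check, not the actual change made in db6538c. The helper name `clamp_region`, the padding value, and the contig length are hypothetical and chosen for illustration.

```python
def clamp_region(start, end, chrom_len):
    """Clamp a 0-based, half-open interval so it lies within [0, chrom_len].

    Hypothetical helper for illustration; not taken from the tldr source.
    """
    start = max(0, min(start, chrom_len))
    end = max(start, min(end, chrom_len))
    return start, end


# Assumed example values: a breakpoint 226 bp into a contig with 500 bp of
# padding gives a raw start of 226 - 500 = -274, which fetch() rejects.
breakpoint_pos = 226
padding = 500
chrom_len = 23_513_712  # hypothetical contig length

start, end = clamp_region(breakpoint_pos - padding, breakpoint_pos + padding, chrom_len)
print(start, end)  # 0 726
```

The clamped `start` and `end` could then be passed to a pysam `fetch(contig, start, end)` call in place of the raw coordinates.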

WeijiaSu commented 3 years ago

Hi @adamewing, I have tried the latest version, and the previous error no longer appears. However, I noticed two odd things:
1) I get far more output than in the previously generated result, using the same TE reference and the same reads. The number of output lines increased from 357 to 2248.
2) The program stopped writing new results on the last chromosome. It ran for more than 10 hours on that chromosome without writing anything to the output, whereas the other chromosomes can be finished within 1 hour.
Thanks, Weijia

adamewing commented 3 years ago

Sounds like progress, maybe. The increase in output might be due to the out-of-range bugfix. Are you running multi-threaded (-p)? Which chromosome is it?