adamewing / tldr

Identify and annotate TE-mediated insertions in long-read sequence data
MIT License

struct.error: 'i' format requires -2147483648 <= number <= 2147483647 #29

Open adcosta17 opened 1 year ago

adcosta17 commented 1 year ago

Hi,

I'm running tldr on a high-coverage ONT dataset with min reads set to 1, looking for insertions supported by a single spanning read. I've split my runs into 20 kb regions; the one here is on chr22. The command I'm running has the format below.

~/tldr//tldr/tldr -b ~/test.bam -r ~/GRCh38.fna -c chr22.txt -e ~/tldr//ref/teref.ont.human.fa -p 10 -m 1 -o test

When I run this I get the following output and error:

2022-10-13 23:11:18,205 output basename: test
2022-10-13 23:21:00,627 writing clusters to test/chr22.pickle
2022-10-13 23:21:16,223 loaded 36 clusters from test/chr22.pickle
Traceback (most recent call last):
  File "/u/adcosta/tldr//tldr/tldr", line 2132, in <module>
    main(args)
  File "/u/adcosta/tldr//tldr/tldr", line 1911, in main
    processed_clusters.append(res.get())
  File "/u/adcosta/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/u/adcosta/miniconda3/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/u/adcosta/miniconda3/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/u/adcosta/miniconda3/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I'm not sure whether this error is due to the overall depth of the region, which is very high, or whether I'm failing to specify something when running. The BAM is too large for me to attach, and I don't know which cluster is causing the issue. Note that I am not getting the warning message about a high-depth cluster here, nor do I get this error on any of the other 20 kb regions I am looking at, on chr22 or any other chromosome.

Would appreciate any help or feedback you can provide.

Thanks, Alister

adamewing commented 1 year ago

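Yeah, that looks like the limit for a signed int32 (2^31-1). Going by the traceback, multiprocessing packs the length of the pickled task into a 4-byte header before sending it to a worker, so you're probably hitting the limit on how large a single pickled object can be when it is passed between processes in Python 3.7. All I can think of is maybe splitting the .bam up into smaller chunks?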
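
For anyone trying to work out which cluster is too big, here is a minimal sketch, not taken from tldr itself. It reproduces the same struct.pack("!i", n) check that appears in the traceback and ranks a list of objects by their pickled size. The helper names (fits_in_header, pickled_sizes) and the toy byte strings standing in for clusters are hypothetical; how the clusters are actually laid out inside test/chr22.pickle is not shown in this thread, so loading them is left to the reader. The sizes are only approximate, since multiprocessing uses its own pickler subclass.

```python
import pickle
import struct

INT32_MAX = 2**31 - 1  # largest value the "!i" (signed 32-bit) format can hold


def fits_in_header(n_bytes):
    """Return True if a message length can be packed with struct.pack("!i", n),
    the call that fails in the traceback above."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


def pickled_sizes(items):
    """Return (index, pickled_size_in_bytes) pairs, largest first."""
    sizes = [
        (i, len(pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL)))
        for i, item in enumerate(items)
    ]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The boundary is exactly the one reported in the error message.
    print(fits_in_header(INT32_MAX))      # within the signed int32 range
    print(fits_in_header(INT32_MAX + 1))  # one byte too many

    # Hypothetical stand-ins for clusters; real ones would come from the
    # pickle file that tldr writes.
    toy_clusters = [b"x" * 10, b"x" * 1000, b"x" * 100]
    for index, size in pickled_sizes(toy_clusters):
        print(index, size, fits_in_header(size))
```

Any object whose pickled size is above 2^31-1 bytes would trigger the error when handed to a worker on Python 3.7. As far as I know, Python 3.8 and later can send larger messages through multiprocessing, so trying a newer interpreter may also be worth a shot, though that has not been checked against tldr here.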