PengNi / deepsignal-plant

Detecting methylation using signal-level features from Nanopore sequencing reads of plants
GNU General Public License v3.0

"struct.error" during running call_mods #17

Open Musketeer-D opened 2 years ago

Musketeer-D commented 2 years ago

Hi @PengNi ,

When I ran an analysis with deepsignal_plant, "call_mods" finished, but the log contains errors like this:

` cat deepsignal_plant-call-D10.sh.log

===============================================

parameters:

input_path: /media/kkk/task23-21/data
model_path: /data1/ttttt1/model/model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt
model_type: both_bilstm
seq_len: 13
signal_len: 16
layernum1: 3
layernum2: 1
class_num: 2
dropout_rate: 0
n_vocab: 16
n_embed: 4
is_base: yes
is_signallen: yes
batch_size: 512
hid_rnn: 256
result_file: /media/kkk/WD-D10/task23/ttttt-fast5s.C.call_mods-211025.tsv
recursively: yes
corrected_group: RawGenomeCorrected_000
basecall_subgroup: BaseCalled_template
reference_path: /data1/ttttt1/genome/ttttt.ttttt.fa
is_dna: yes
normalize_method: mad
methy_label: 1
motifs: C
mod_loc: 0
f5_batch_size: 10
region: None
positions: None
nproc: 30
nproc_gpu: 6

===============================================

[main] call_mods starts..
1611xxxx fast5 files in total..
parse the motifs string..
read genome reference file..
read position file: None
parse region of interest: None, [None, None)
read_fast5 process-8477 starts
read_fast5 process-8477 ending, proceed 695630 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-9184 starts
read_fast5 process-9184 ending, proceed 697760 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-8399 starts
read_fast5 process-8399 ending, proceed 696380 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-9513 starts
read_fast5 process-9513 ending, proceed 694210 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-9647 starts
read_fast5 process-9647 ending, proceed 703810 fast5s
read_fast5 process-9387 starts
read_fast5 process-9387 ending, proceed 705630 fast5s
read_fast5 process-9257 starts
read_fast5 process-9257 ending, proceed 701250 fast5s
read_fast5 process-8725 starts
read_fast5 process-8725 ending, proceed 697250 fast5s
read_fast5 process-9384 starts
read_fast5 process-9384 ending, proceed 700780 fast5s
read_fast5 process-8464 starts
read_fast5 process-8464 ending, proceed 692360 fast5s
read_fast5 process-8851 starts
read_fast5 process-8851 ending, proceed 698130 fast5s
read_fast5 process-8875 starts
read_fast5 process-8875 ending, proceed 706270 fast5s
read_fast5 process-8857 starts
read_fast5 process-8857 ending, proceed 702850 fast5s
read_fast5 process-8724 starts
read_fast5 process-8724 ending, proceed 704230 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-8658 starts
read_fast5 process-8658 ending, proceed 693770 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-9256 starts
read_fast5 process-9256 ending, proceed 702630 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-8594 starts
read_fast5 process-8594 ending, proceed 696206 fast5s
read_fast5 process-9115 starts
read_fast5 process-9115 ending, proceed 697340 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-8398 starts
read_fast5 process-8398 ending, proceed 705710 fast5s
read_fast5 process-8400 starts
read_fast5 process-8400 ending, proceed 706910 fast5s
read_fast5 process-8530 starts
read_fast5 process-8530 ending, proceed 708990 fast5s
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Traceback (most recent call last):
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/root/anaconda3/envs/deepsignalpenv/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
read_fast5 process-9118 starts
read_fast5 process-9118 ending, proceed 706140 fast5s
read_fast5 process-8401 starts
read_fast5 process-8401 ending, proceed 701090 fast5s
call_mods process-9784 starts
call_mods process-9784 ending, proceed 37851110 batches
call_mods process-9848 starts
call_mods process-9848 ending, proceed 37980116 batches
call_mods process-9778 starts
call_mods process-9778 ending, proceed 37910389 batches
call_mods process-9713 starts
call_mods process-9713 ending, proceed 37911827 batches
call_mods process-10003 starts
call_mods process-10003 ending, proceed 37924245 batches
call_mods process-9914 starts
call_mods process-9914 ending, proceed 37894543 batches
write_process-10173 starts
write_process-10173 finished
105xxxx of 1611xxxx fast5 files failed..
[main] call_mods costs xxx seconds..
`

Since "call_mods" gave me this error report, can you tell me whether the finished result "/media/kkk/WD-D10/task23/ttttt-fast5s.C.call_mods-211025.tsv" is still reliable? Can I use "/media/kkk/WD-D10/task23/ttttt-fast5s.C.call_mods-211025.tsv" to perform the downstream analysis?

Thank you for your kind help !

PengNi commented 2 years ago

Hi @Musketeer-D , thanks for using our tool.

  1. I haven't encountered this "struct.error" before. It seems that some object sent through a multiprocessing.Queue()/subprocess connection is larger than the 32-bit signed integer length header (struct format 'i', maximum 2147483647) can describe. Most likely it is the Queue() that holds the filenames of the fast5s: deepsignal-plant loads all the filenames into a Queue() at once at the beginning, and this step may not be able to handle tens of millions of fast5s. I'll try to change the code of deepsignal-plant in the next few days. Also, it seems this struct.error/'i' size limitation has been fixed in Python 3.8. If you have to re-run "call_mods", you can also upgrade Python in your environment first, without changing the code of deepsignal-plant. (ref: ref1, ref2)

  2. I am not sure whether the result "ttttt-fast5s.C.call_mods-211025.tsv" is reliable. This issue doesn't affect the predictions of the "call_mods" module itself, but some fast5s may not have been processed by "call_mods". I suggest that you count the number of unique read IDs (column 5 in call_mods.tsv) and check whether it equals 1611xxxx - 105xxxx; then you can decide whether to re-run "call_mods".
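The read-ID check suggested in point 2 could be done roughly as follows. This is only a minimal sketch: it assumes the result file is tab-separated with the read ID in column 5, as stated above, and the function names `count_unique_read_ids` and `count_unique_read_ids_in_file` are hypothetical helpers, not part of deepsignal-plant.

```python
def count_unique_read_ids(lines, readid_col=4):
    """Count distinct read IDs in an iterable of tab-separated lines.

    readid_col is the 0-based index of the read ID column
    (4 corresponds to column 5, an assumption taken from the comment above).
    """
    read_ids = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip short or malformed lines instead of failing on them.
        if len(fields) > readid_col:
            read_ids.add(fields[readid_col])
    return len(read_ids)


def count_unique_read_ids_in_file(tsv_path, readid_col=4):
    """Stream a result file line by line and count its distinct read IDs."""
    with open(tsv_path) as handle:
        return count_unique_read_ids(handle, readid_col)
```

Calling `count_unique_read_ids_in_file` on the call_mods result file gives the number to compare against the total number of fast5 files minus the failed ones reported in the log. The file is streamed, but the set of read IDs is kept in memory, so with tens of millions of reads this may need a few gigabytes of RAM.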

Best, Peng
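For reference, the size limit behind the struct.error can be reproduced in isolation. The sketch below uses the same `struct.pack("!i", n)` call that appears in the traceback, where `n` is the length in bytes of the message being sent; the helper name `fits_in_int32_header` is hypothetical and only serves the illustration.

```python
import struct

# Largest value a signed 32-bit integer (struct format 'i') can hold.
INT32_MAX = 2**31 - 1


def fits_in_int32_header(num_bytes):
    """Return True if a message length can be packed with the '!i' format."""
    try:
        struct.pack("!i", num_bytes)
        return True
    except struct.error:
        # Raised when num_bytes is outside -2147483648..2147483647.
        return False
```

A single message of 2147483647 bytes still fits, while one byte more does not, which is consistent with the error message in the log.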

Musketeer-D commented 2 years ago

Thanks again for your awesome software and great research paper!