Index Error on consensus adjustment

shunhuahan commented 3 years ago

Hi,

I'm testing the latest TLDR build from the GitHub repo on nanopore and PacBio RS II datasets for the Drosophila genome. TLDR runs fine on the nanopore data, but the PacBio run got stuck with the following error messages. The error is reproducible across multiple runs. Could you help me figure out what's causing the issue and how to resolve it?

2021-04-23 11:34:46,160 writing clusters to iso1_pb_tldr_2/chr4.pickle
2021-04-23 11:39:56,580 writing clusters to iso1_pb_tldr_2/chrX.pickle
2021-04-23 11:40:29,394 writing clusters to iso1_pb_tldr_2/chr2L.pickle
2021-04-23 11:41:04,486 writing clusters to iso1_pb_tldr_2/chr3R.pickle
2021-04-23 11:41:35,940 writing clusters to iso1_pb_tldr_2/chr3L.pickle
2021-04-23 11:42:04,119 writing clusters to iso1_pb_tldr_2/chr2R.pickle
2021-04-23 11:42:13,195 loaded 28904 clusters from iso1_pb_tldr_2/chr2L.pickle
2021-04-23 11:53:20,448 Index Error on consensus adjustment: 949bedb9-397b-47c0-91ea-9ee5390f7805 chr2L:20484075-20484092

Here is how I ran TLDR on the PacBio dataset. Would it help if I turned off the --color_consensus option?
```
tldr -b $bam -e $te_library -r $ref --color_consensus -p $threads
```

Thanks, Shunhua

adamewing commented 3 years ago

Does this error cause tldr to crash or does it continue? If it's the latter, does it throw a lot of these?

--color_consensus shouldn't matter as that's well past where this error would occur.

shunhuahan commented 3 years ago

Thanks for the reply! tldr didn't crash or exit, but the job got stuck, and I don't see any more messages after "Index Error on consensus adjustment".

shunhuahan commented 3 years ago

To provide more detail, here are the intermediate files with timestamps. I don't see any intermediate files being generated after the Index Error message first appeared, so I'm pretty sure the tldr job is hung. I'm happy to provide those intermediate files or the raw data if you think they would help with debugging. Thanks a lot!

17G -rw-r--r--. 1 sh60271 cmblab  17G Apr 23 11:32 iso1_pb_tldr_2.bam
6.6M -rw-r--r--. 1 sh60271 cmblab 6.6M Apr 23 11:34 iso1_pb_tldr_2.bam.bai
0 -rw-r--r--. 1 sh60271 cmblab    0 Apr 23 11:34 iso1_pb_tldr_2.table.txt
8.0K drwxr-xr-x. 2 sh60271 cmblab 4.0K Apr 23 11:42 iso1_pb_tldr_2
1.2M -rw-r--r--. 1 sh60271 cmblab 1.1M Apr 23 11:42 None.3cacb881-c0a2-42da-b9e3-fc8387ae198b.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab    0 Apr 23 11:42 3cacb881-c0a2-42da-b9e3-fc8387ae198b.cluster.msa.fa
568K -rw-r--r--. 1 sh60271 cmblab 568K Apr 23 11:42 3cacb881-c0a2-42da-b9e3-fc8387ae198b.cluster.fa
896K -rw-r--r--. 1 sh60271 cmblab 895K Apr 23 11:46 None.91e4f2ae-9899-4f9c-8c35-5dc54336497a.cluster.fq
456K -rw-r--r--. 1 sh60271 cmblab 454K Apr 23 11:46 91e4f2ae-9899-4f9c-8c35-5dc54336497a.cluster.fa
0 -rw-r--r--. 1 sh60271 cmblab    0 Apr 23 11:46 91e4f2ae-9899-4f9c-8c35-5dc54336497a.cluster.msa.fa
4.0K -rw-r--r--. 1 sh60271 cmblab 1.9K Apr 23 11:47 tmp.3f1c34e4-c4b0-4995-a7ff-5bc18043bd58.tgt.fa
412K -rw-r--r--. 1 sh60271 cmblab 409K Apr 23 11:47 None.81c60cb9-acec-489b-81e4-276e259bd914.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab    0 Apr 23 11:47 81c60cb9-acec-489b-81e4-276e259bd914.cluster.msa.fa
208K -rw-r--r--. 1 sh60271 cmblab 207K Apr 23 11:47 81c60cb9-acec-489b-81e4-276e259bd914.cluster.fa
508K -rw-r--r--. 1 sh60271 cmblab 507K Apr 23 11:47 None.bae6d654-84c9-499f-899e-1f0f1790fe0a.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab    0 Apr 23 11:47 bae6d654-84c9-499f-899e-1f0f1790fe0a.cluster.msa.fa
260K -rw-r--r--. 1 sh60271 cmblab 259K Apr 23 11:47 bae6d654-84c9-499f-899e-1f0f1790fe0a.cluster.fa
424K -rw-r--r--. 1 sh60271 cmblab 421K Apr 23 11:47 None.5559d4cf-f591-4e99-9e30-df27f086adc8.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab    0 Apr 23 11:47 5559d4cf-f591-4e99-9e30-df27f086adc8.cluster.msa.fa
216K -rw-r--r--. 1 sh60271 cmblab 213K Apr 23 11:47 5559d4cf-f591-4e99-9e30-df27f086adc8.cluster.fa
4.0K -rw-r--r--. 1 sh60271 cmblab 1.3K Apr 23 11:48 tmp.d01eb1f6-9606-4207-82b2-f55087831dce.tgt.fa
4.0K -rw-r--r--. 1 sh60271 cmblab  927 Apr 23 11:48 tmp.63aa235f-8496-4c03-86b6-03297c6764d0.tgt.fa
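In the listing above, every `*.cluster.msa.fa` file is 0 bytes while its matching `*.cluster.fa` has content. A minimal sketch for spotting such pairs in a working directory follows. The helper name `clusters_with_empty_msa` is hypothetical and not part of tldr, and reading an empty MSA file as "alignment not finished" is an assumption, not something the thread confirms.

```python
from pathlib import Path

MSA_SUFFIX = ".cluster.msa.fa"


def clusters_with_empty_msa(workdir):
    """Return cluster IDs whose .cluster.fa has data but whose .cluster.msa.fa is empty.

    File naming follows the directory listing in this thread; this is an
    illustrative helper, not tldr code.
    """
    workdir = Path(workdir)
    pending = []
    for msa in sorted(workdir.glob("*" + MSA_SUFFIX)):
        cluster_id = msa.name[: -len(MSA_SUFFIX)]
        fasta = workdir / f"{cluster_id}.cluster.fa"
        # Flag only pairs where the input FASTA exists and is non-empty.
        if msa.stat().st_size == 0 and fasta.exists() and fasta.stat().st_size > 0:
            pending.append(cluster_id)
    return pending
```

Calling `clusters_with_empty_msa(".")` from the directory that holds the intermediate files would list the cluster IDs to look at first.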

adamewing commented 3 years ago

Thanks. That index error is just a warning message and tldr should keep going. If you turn on --debug, do you get output after the error? Can you verify that CPU usage drops off after the error?

shunhuahan commented 3 years ago

I re-submitted the job with --debug turned on. The index error happened again, this time followed by a large number of standard output messages. The log file stopped updating 24 hours ago. Here are the last few lines.

2021-04-29 20:33:42,364 cluster 02b6920a-f060-48b5-a4f1-a6e22930da89, alignment group:
TE      McClintock:LTR  497     1085    1307    1523    1723    82.91   +       +
TE      McClintock:LTR  492     469     629     940     1091    90.67   +       +
TE      McClintock:LTR  259     736     813     1208    1284    98.61   +       +
2021-04-29 20:33:48,594 joining multiple segments for cluster 832f8cf7-f089-440c-a3b2-151a7698a30b
2021-04-29 20:33:49,114 breakpoint adjustment for 832f8cf7-f089-440c-a3b2-151a7698a30b
2021-04-29 20:33:56,354 breakpoint adjustment for e8abe882-8ee8-467d-908a-77dbd16f974c
2021-04-29 20:34:02,924 breakpoint adjustment for 6dd7e520-d391-42ee-814e-f644597262ad
2021-04-29 20:34:35,612 breakpoint adjustment for 775549cf-c6e9-47b7-999a-6df1e32cdd70
2021-04-29 20:35:00,352 breakpoint adjustment for 82bfcc9e-3f05-4b75-a402-56755d3adbf6
2021-04-29 20:35:27,132 breakpoint adjustment for ecbfa54d-09a0-44ad-b758-981b4f38df67
2021-04-29 20:36:18,853 joining multiple segments for cluster 02b6920a-f060-48b5-a4f1-a6e22930da89
2021-04-29 20:36:18,918 breakpoint adjustment for 02b6920a-f060-48b5-a4f1-a6e22930da89
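To tell a slow step from a hang in a log like this, one option is to measure the time between consecutive timestamped lines. The sketch below assumes every log record starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as in the lines quoted here. The function names are hypothetical and not part of tldr.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2021-04-29 20:36:18,918")


def timestamp_gaps(lines):
    """Yield (gap_seconds, previous_line, next_line) for consecutive timestamped lines."""
    prev_time, prev_line = None, None
    for line in lines:
        line = line.rstrip("\n")
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Lines without a leading timestamp (e.g. the "TE ..." alignment rows) are skipped.
            continue
        if prev_time is not None:
            yield (stamp - prev_time).total_seconds(), prev_line, line
        prev_time, prev_line = stamp, line


def longest_gap(lines):
    """Return the largest (gap_seconds, previous_line, next_line), or None if there is no gap."""
    return max(timestamp_gaps(lines), key=lambda gap: gap[0], default=None)
```

Passing an open log file to `longest_gap` shows where the run spent the most time between messages. Comparing the final timestamp with the current time shows how long the log has been silent.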

I can also verify that the CPU usage is now 0%.

(s2rp) sh60271@c1-8 s2rplus$ seff 2289846
Job ID: 2289846
Cluster: tc2
User/Group: sh60271/cmblab
State: RUNNING
Nodes: 1
Cores per node: 28
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 30-04:50:32 core-walltime
Job Wall-clock time: 1-01:53:14
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 50.00 GB (50.00 GB/node)
WARNING: Efficiency statistics may be misleading for RUNNING jobs.

Here are the files in the output directory, with timestamps. The final output table is empty.

17G -rw-r--r-- 1 sh60271 cmblab  17G Apr 29 20:08 iso1_pb_tldr_3.bam
6.6M -rw-r--r-- 1 sh60271 cmblab 6.6M Apr 29 20:10 iso1_pb_tldr_3.bam.bai
0 -rw-r--r-- 1 sh60271 cmblab    0 Apr 29 20:10 iso1_pb_tldr_3.table.txt
8.0K drwxr-xr-x 2 sh60271 cmblab 4.0K Apr 29 20:18 iso1_pb_tldr_3
1.2M -rw-r--r-- 1 sh60271 cmblab 1.1M Apr 29 20:18 None.d831f53a-cf3c-41c5-9129-8006b9805ae0.cluster.fq
568K -rw-r--r-- 1 sh60271 cmblab 568K Apr 29 20:18 d831f53a-cf3c-41c5-9129-8006b9805ae0.cluster.fa
0 -rw-r--r-- 1 sh60271 cmblab    0 Apr 29 20:18 d831f53a-cf3c-41c5-9129-8006b9805ae0.cluster.msa.fa
800K -rw-r--r-- 1 sh60271 cmblab 797K Apr 29 20:22 None.3f480193-5e23-4b8a-a5b8-cf887b9651e9.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab    0 Apr 29 20:22 3f480193-5e23-4b8a-a5b8-cf887b9651e9.cluster.msa.fa
404K -rw-r--r-- 1 sh60271 cmblab 403K Apr 29 20:22 3f480193-5e23-4b8a-a5b8-cf887b9651e9.cluster.fa
560K -rw-r--r-- 1 sh60271 cmblab 557K Apr 29 20:22 None.a82f6fd2-98cc-4596-a015-c48753f356c2.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab    0 Apr 29 20:22 a82f6fd2-98cc-4596-a015-c48753f356c2.cluster.msa.fa
284K -rw-r--r-- 1 sh60271 cmblab 284K Apr 29 20:22 a82f6fd2-98cc-4596-a015-c48753f356c2.cluster.fa
4.0K -rw-r--r-- 1 sh60271 cmblab 1.9K Apr 29 20:23 tmp.eb4cf706-4205-4517-b805-1d1a0d8bdc94.tgt.fa
448K -rw-r--r-- 1 sh60271 cmblab 446K Apr 29 20:23 None.cf099a9d-8dad-43ae-8aa9-600d5e6ed7a8.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab    0 Apr 29 20:23 cf099a9d-8dad-43ae-8aa9-600d5e6ed7a8.cluster.msa.fa
228K -rw-r--r-- 1 sh60271 cmblab 226K Apr 29 20:23 cf099a9d-8dad-43ae-8aa9-600d5e6ed7a8.cluster.fa
600K -rw-r--r-- 1 sh60271 cmblab 597K Apr 29 20:24 None.e59e022e-fbf8-455d-863c-bc5a72d64a63.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab    0 Apr 29 20:24 e59e022e-fbf8-455d-863c-bc5a72d64a63.cluster.msa.fa
304K -rw-r--r-- 1 sh60271 cmblab 302K Apr 29 20:24 e59e022e-fbf8-455d-863c-bc5a72d64a63.cluster.fa
4.0K -rw-r--r-- 1 sh60271 cmblab 4.0K Apr 29 20:24 tmp.52e2a485-a28f-427d-affc-752112b69412.tgt.fa
4.0K -rw-r--r-- 1 sh60271 cmblab 1.8K Apr 29 20:24 tmp.4a101e39-7b1a-4f91-a442-0bfc0eadaca8.tgt.fa

I have not cancelled the hung job yet. Let me know if you need more information. Thanks for the help!

itslittman commented 6 months ago

This happened to me after a few of those errors, but it then resumed after a little under 10 minutes. Python was at 0%, but disttbfast (I believe that's MAFFT) was running at 100% of one core during that time. Maybe MAFFT has trouble exiting sometimes? https://bugs.launchpad.net/ubuntu/+source/mafft/+bug/1897559
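If an external aligner occasionally fails to exit, a general-purpose guard is to run it with a time limit. The sketch below uses Python's standard `subprocess.run` with `timeout`. It is not how tldr invokes MAFFT, `run_with_timeout` is a hypothetical helper, and the sleeping child process only stands in for a slow external program.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its stdout, or None if it exceeds timeout_seconds."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising TimeoutExpired.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a stuck external program: a child Python process that sleeps.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_seconds=1))
```

Only the direct child is killed on timeout. If the command is a wrapper that launches helper programs, those helpers may keep running and need separate handling.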

adamewing / tldr

Index Error on consensus adjustment #13