Open shunhuahan opened 3 years ago
Does this error cause tldr to crash or does it continue? If it's the latter, does it throw a lot of these?
--color_consensus shouldn't matter as that's well past where this error would occur.
Thanks for the reply! tldr didn't crash and exit, but the job got stuck: I haven't seen any more messages since the "Index Error on consensus adjustment" message first appeared, so I'm pretty sure the tldr job is hung. I'm happy to provide those intermediate files or raw data if you think they can help the debugging process. Thanks a lot!
17G -rw-r--r--. 1 sh60271 cmblab 17G Apr 23 11:32 iso1_pb_tldr_2.bam
6.6M -rw-r--r--. 1 sh60271 cmblab 6.6M Apr 23 11:34 iso1_pb_tldr_2.bam.bai
0 -rw-r--r--. 1 sh60271 cmblab 0 Apr 23 11:34 iso1_pb_tldr_2.table.txt
8.0K drwxr-xr-x. 2 sh60271 cmblab 4.0K Apr 23 11:42 iso1_pb_tldr_2
1.2M -rw-r--r--. 1 sh60271 cmblab 1.1M Apr 23 11:42 None.3cacb881-c0a2-42da-b9e3-fc8387ae198b.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab 0 Apr 23 11:42 3cacb881-c0a2-42da-b9e3-fc8387ae198b.cluster.msa.fa
568K -rw-r--r--. 1 sh60271 cmblab 568K Apr 23 11:42 3cacb881-c0a2-42da-b9e3-fc8387ae198b.cluster.fa
896K -rw-r--r--. 1 sh60271 cmblab 895K Apr 23 11:46 None.91e4f2ae-9899-4f9c-8c35-5dc54336497a.cluster.fq
456K -rw-r--r--. 1 sh60271 cmblab 454K Apr 23 11:46 91e4f2ae-9899-4f9c-8c35-5dc54336497a.cluster.fa
0 -rw-r--r--. 1 sh60271 cmblab 0 Apr 23 11:46 91e4f2ae-9899-4f9c-8c35-5dc54336497a.cluster.msa.fa
4.0K -rw-r--r--. 1 sh60271 cmblab 1.9K Apr 23 11:47 tmp.3f1c34e4-c4b0-4995-a7ff-5bc18043bd58.tgt.fa
412K -rw-r--r--. 1 sh60271 cmblab 409K Apr 23 11:47 None.81c60cb9-acec-489b-81e4-276e259bd914.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab 0 Apr 23 11:47 81c60cb9-acec-489b-81e4-276e259bd914.cluster.msa.fa
208K -rw-r--r--. 1 sh60271 cmblab 207K Apr 23 11:47 81c60cb9-acec-489b-81e4-276e259bd914.cluster.fa
508K -rw-r--r--. 1 sh60271 cmblab 507K Apr 23 11:47 None.bae6d654-84c9-499f-899e-1f0f1790fe0a.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab 0 Apr 23 11:47 bae6d654-84c9-499f-899e-1f0f1790fe0a.cluster.msa.fa
260K -rw-r--r--. 1 sh60271 cmblab 259K Apr 23 11:47 bae6d654-84c9-499f-899e-1f0f1790fe0a.cluster.fa
424K -rw-r--r--. 1 sh60271 cmblab 421K Apr 23 11:47 None.5559d4cf-f591-4e99-9e30-df27f086adc8.cluster.fq
0 -rw-r--r--. 1 sh60271 cmblab 0 Apr 23 11:47 5559d4cf-f591-4e99-9e30-df27f086adc8.cluster.msa.fa
216K -rw-r--r--. 1 sh60271 cmblab 213K Apr 23 11:47 5559d4cf-f591-4e99-9e30-df27f086adc8.cluster.fa
4.0K -rw-r--r--. 1 sh60271 cmblab 1.3K Apr 23 11:48 tmp.d01eb1f6-9606-4207-82b2-f55087831dce.tgt.fa
4.0K -rw-r--r--. 1 sh60271 cmblab 927 Apr 23 11:48 tmp.63aa235f-8496-4c03-86b6-03297c6764d0.tgt.fa
Thanks. That index error is just a warning message and tldr should keep going. If you turn on --debug do you get output after the error? Can you verify that CPU usage drops off after the error?
I re-submitted the job with --debug turned on. The index error happened again, this time followed by a large number of standard output messages. The log file stopped updating 24h ago. Here are the last few lines.
2021-04-29 20:33:42,364 cluster 02b6920a-f060-48b5-a4f1-a6e22930da89, alignment group:
TE McClintock:LTR 497 1085 1307 1523 1723 82.91 + +
TE McClintock:LTR 492 469 629 940 1091 90.67 + +
TE McClintock:LTR 259 736 813 1208 1284 98.61 + +
2021-04-29 20:33:48,594 joining multiple segments for cluster 832f8cf7-f089-440c-a3b2-151a7698a30b
2021-04-29 20:33:49,114 breakpoint adjustment for 832f8cf7-f089-440c-a3b2-151a7698a30b
2021-04-29 20:33:56,354 breakpoint adjustment for e8abe882-8ee8-467d-908a-77dbd16f974c
2021-04-29 20:34:02,924 breakpoint adjustment for 6dd7e520-d391-42ee-814e-f644597262ad
2021-04-29 20:34:35,612 breakpoint adjustment for 775549cf-c6e9-47b7-999a-6df1e32cdd70
2021-04-29 20:35:00,352 breakpoint adjustment for 82bfcc9e-3f05-4b75-a402-56755d3adbf6
2021-04-29 20:35:27,132 breakpoint adjustment for ecbfa54d-09a0-44ad-b758-981b4f38df67
2021-04-29 20:36:18,853 joining multiple segments for cluster 02b6920a-f060-48b5-a4f1-a6e22930da89
2021-04-29 20:36:18,918 breakpoint adjustment for 02b6920a-f060-48b5-a4f1-a6e22930da89
I can also verify that the CPU usage is now 0%.
(s2rp) sh60271@c1-8 s2rplus$ seff 2289846
Job ID: 2289846
Cluster: tc2
User/Group: sh60271/cmblab
State: RUNNING
Nodes: 1
Cores per node: 28
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 30-04:50:32 core-walltime
Job Wall-clock time: 1-01:53:14
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 50.00 GB (50.00 GB/node)
WARNING: Efficiency statistics may be misleading for RUNNING jobs.
Here are the files included in the output directory, with timestamps. The final output table is empty.
17G -rw-r--r-- 1 sh60271 cmblab 17G Apr 29 20:08 iso1_pb_tldr_3.bam
6.6M -rw-r--r-- 1 sh60271 cmblab 6.6M Apr 29 20:10 iso1_pb_tldr_3.bam.bai
0 -rw-r--r-- 1 sh60271 cmblab 0 Apr 29 20:10 iso1_pb_tldr_3.table.txt
8.0K drwxr-xr-x 2 sh60271 cmblab 4.0K Apr 29 20:18 iso1_pb_tldr_3
1.2M -rw-r--r-- 1 sh60271 cmblab 1.1M Apr 29 20:18 None.d831f53a-cf3c-41c5-9129-8006b9805ae0.cluster.fq
568K -rw-r--r-- 1 sh60271 cmblab 568K Apr 29 20:18 d831f53a-cf3c-41c5-9129-8006b9805ae0.cluster.fa
0 -rw-r--r-- 1 sh60271 cmblab 0 Apr 29 20:18 d831f53a-cf3c-41c5-9129-8006b9805ae0.cluster.msa.fa
800K -rw-r--r-- 1 sh60271 cmblab 797K Apr 29 20:22 None.3f480193-5e23-4b8a-a5b8-cf887b9651e9.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab 0 Apr 29 20:22 3f480193-5e23-4b8a-a5b8-cf887b9651e9.cluster.msa.fa
404K -rw-r--r-- 1 sh60271 cmblab 403K Apr 29 20:22 3f480193-5e23-4b8a-a5b8-cf887b9651e9.cluster.fa
560K -rw-r--r-- 1 sh60271 cmblab 557K Apr 29 20:22 None.a82f6fd2-98cc-4596-a015-c48753f356c2.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab 0 Apr 29 20:22 a82f6fd2-98cc-4596-a015-c48753f356c2.cluster.msa.fa
284K -rw-r--r-- 1 sh60271 cmblab 284K Apr 29 20:22 a82f6fd2-98cc-4596-a015-c48753f356c2.cluster.fa
4.0K -rw-r--r-- 1 sh60271 cmblab 1.9K Apr 29 20:23 tmp.eb4cf706-4205-4517-b805-1d1a0d8bdc94.tgt.fa
448K -rw-r--r-- 1 sh60271 cmblab 446K Apr 29 20:23 None.cf099a9d-8dad-43ae-8aa9-600d5e6ed7a8.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab 0 Apr 29 20:23 cf099a9d-8dad-43ae-8aa9-600d5e6ed7a8.cluster.msa.fa
228K -rw-r--r-- 1 sh60271 cmblab 226K Apr 29 20:23 cf099a9d-8dad-43ae-8aa9-600d5e6ed7a8.cluster.fa
600K -rw-r--r-- 1 sh60271 cmblab 597K Apr 29 20:24 None.e59e022e-fbf8-455d-863c-bc5a72d64a63.cluster.fq
0 -rw-r--r-- 1 sh60271 cmblab 0 Apr 29 20:24 e59e022e-fbf8-455d-863c-bc5a72d64a63.cluster.msa.fa
304K -rw-r--r-- 1 sh60271 cmblab 302K Apr 29 20:24 e59e022e-fbf8-455d-863c-bc5a72d64a63.cluster.fa
4.0K -rw-r--r-- 1 sh60271 cmblab 4.0K Apr 29 20:24 tmp.52e2a485-a28f-427d-affc-752112b69412.tgt.fa
4.0K -rw-r--r-- 1 sh60271 cmblab 1.8K Apr 29 20:24 tmp.4a101e39-7b1a-4f91-a442-0bfc0eadaca8.tgt.fa
I have not cancelled the hung job yet. Let me know if you need more info. Thanks for the help!
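In both listings above, every *.cluster.msa.fa file is 0 bytes while the matching .cluster.fa files have content. As a rough diagnostic, a helper along these lines could list the empty alignment outputs in an output directory. This is only a sketch: the function name empty_msa_files is made up for illustration, and an empty file may simply mean the alignment had not finished writing when the listing was taken.

```python
from pathlib import Path


def empty_msa_files(outdir):
    """Return the sorted names of zero-byte *.cluster.msa.fa files in outdir."""
    return sorted(
        p.name
        for p in Path(outdir).glob("*.cluster.msa.fa")
        if p.is_file() and p.stat().st_size == 0
    )
```

For example, empty_msa_files("iso1_pb_tldr_3") would report the empty alignment files for the run shown above, assuming it is called from the directory that contains that output folder.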
This happened to me after a few of those errors but then resumed after a little under 10min. Python was at 0% but disttbfast (I believe that's MAFFT) was running at 100% of one core during that time. Maybe MAFFT has trouble exiting sometimes? https://bugs.launchpad.net/ubuntu/+source/mafft/+bug/1897559
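If the external aligner really does fail to exit now and then, one defensive pattern is to run it with a timeout, so that a stuck child process is killed instead of blocking the parent indefinitely. The sketch below illustrates that idea in Python. It is not tldr's actual code: the name run_with_timeout and the choice to kill the whole process group are assumptions for illustration, and the process-group handling is POSIX only.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its stdout as text, or None if it runs past timeout_s."""
    # start_new_session=True puts the child in its own process group, so any
    # helper processes it spawns can be killed together with it (POSIX only).
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the whole process group
        proc.communicate()  # reap the child and drain the pipe
        return None
    return out.decode()
```

A caller could treat a None result as "alignment failed for this cluster" and move on to the next one. What timeout is sensible depends on cluster size and would need tuning.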
Hi,
--color_consensus option?
Thanks, Shunhua