adamewing / tldr

Identify and annotate TE-mediated insertions in long-read sequence data
MIT License

tldr deal with longer chromosomes #18

Open shangshanzhizhe opened 2 years ago

shangshanzhizhe commented 2 years ago

Hi,

Thanks for the useful tool. I ran into a small issue in my work: tldr works well for the first 20 minutes, but after that some of the processes switch to the SLEEP state and no longer use any CPU, and then the program goes silent. Here's a screenshot of top. Do you have any suggestions?

Best, Shangzhe

[image: screenshot of top output]

adamewing commented 2 years ago

Hmm, tricky to know how to troubleshoot this kind of thing. Maybe try turning on --debug if you haven't. Would you be able to tell me the command-line arguments you're using and some information about the genome (you say 'longer chromosomes' - is this an organism with a particularly large genome?).

shangshanzhizhe commented 2 years ago

Hi, thanks for your reply. I tried the --debug option as follows:

time tldr --debug -b 00.bam.files/AL-1.sorted.bam -r 00.reference/Fsr.LG.fasta -e none -p 15 2>&1 | tee > tldr.log

Here's logs in tldr.log:

2021-09-09 09:32:14,373 tldr started with command: /data/00/user/user186/miniconda3/envs/tldr/bin/tldr --debug -b 00.bam.files/AL-1.sorted.bam ->
2021-09-09 09:32:14,373 output basename: AL-1.sorted
2021-09-09 09:32:14,427 "None" passed to -e/--elts, running without TE reference
2021-09-09 09:32:14,629 skip read warning: 5aff4388-b6d8-43fd-8316-66f0c449bed6
**Many repeats**
2021-09-09 09:32:15,175 r_pos overlaps multiple clusters on read: 498f68b3-585b-4714-a392-a2e7c77343e9
**Occasional repeats**
2021-09-09 09:45:27,990 writing clusters to AL-1.sorted/LG14.pickle
**Many repeats**
**Occasional repeats**
2021-09-09 09:45:57,071 writing clusters to AL-1.sorted/LG12.pickle
2021-09-09 09:46:51,068 writing clusters to AL-1.sorted/LG08.pickle
2021-09-09 09:47:03,046 writing clusters to AL-1.sorted/LG10.pickle
2021-09-09 09:47:43,402 writing clusters to AL-1.sorted/LG15.pickle
....
2021-09-09 10:05:44,177 writing clusters to AL-1.sorted/LG31.pickle

Then it went silent. By that point, 65 pickles had been written to the AL-1.sorted folder, but nothing happened for the remaining 22 contigs in my genome, including the longest one, LG01 (~130 Mb).
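For anyone hitting the same stall, here is a minimal sketch for listing which contigs still have no pickle, largest first. It assumes, based only on the log lines above, that tldr writes one `<contig>.pickle` file per contig into the output folder. The function names `fasta_contigs` and `pending_contigs` are hypothetical helpers and not part of tldr.

```python
from pathlib import Path


def fasta_contigs(fasta_path):
    """Return {contig_name: sequence_length} parsed from a FASTA file."""
    lengths = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Contig name is the header text up to the first whitespace.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def pending_contigs(fasta_path, out_dir):
    """List (contig, length) pairs with no <contig>.pickle in out_dir.

    Assumption: one pickle per contig, named after the contig, as the
    'writing clusters to AL-1.sorted/LG14.pickle' log lines suggest.
    """
    lengths = fasta_contigs(fasta_path)
    done = {p.stem for p in Path(out_dir).glob("*.pickle")}
    pending = [(n, size) for n, size in lengths.items() if n not in done]
    # Longest contigs first, since those are the likely stragglers.
    return sorted(pending, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        for contig, size in pending_contigs(sys.argv[1], sys.argv[2]):
            print(f"{contig}\t{size}")
```

Saved under any file name and run with the reference FASTA and the output folder as its two arguments (here `00.reference/Fsr.LG.fasta` and `AL-1.sorted`), it prints the unfinished contigs with their lengths. That shows whether the stalled workers are all on the longest contigs.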

I have no idea what is causing this. My system is Linux x64. The RAID disks are mapped to part of the nodes in the cluster; could I/O speed matter?