Benson-Genomics-Lab / TRF

Tandem Repeats Finder: a program to analyze DNA sequences

https://tandem.bu.edu/trf/trf.html

GNU Affero General Public License v3.0

155 stars, 26 forks

Time consuming process #7

Open karimi81 opened 3 years ago

karimi81 commented 3 years ago

Hi there, I am trying to use TRF to detect tandem repeats in a large genome assembly of 2.68 Gb. The program ran for about 12 days on a node with 32 CPUs and 125 GB of memory, but it never finished. This is the command I used:

    trf new_id.fasta 2 5 7 80 10 50 2000

Is there any way to make the computation more efficient, e.g. parallel processing or some other way to reduce the run time? I would appreciate any help with this. Thank you.

Aannaw commented 3 years ago

Hi, have you solved this problem? I split my genome (3.1G in size) and then ran TRF, but it has still not finished after about 14 days of running on 1 CPU with 3 GB of memory. Does the program have a threading option to speed it up?
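Neither post shows how the input was split or how the runs were launched, so the following is only a sketch of one way to do it: write each FASTA record to its own file and start one TRF process per file from a small worker pool. The parameters are copied from the command in the first post; the helper names, the numbered output file names, and the worker count are assumptions for illustration. The `-h` (suppress HTML output) and `-d` (write a data file) flags are standard TRF options, but check the usage text of your build. This only spreads separate sequences across CPUs; a single sequence that is slow on its own will still be slow.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Parameters copied from the command in the original post:
# match, mismatch, delta, PM, PI, minscore, maxperiod.
TRF_PARAMS = ["2", "5", "7", "80", "10", "50", "2000"]


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to its own file and return the new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Numbered names (hypothetical scheme) avoid odd header characters.
                path = out_dir / f"seq_{len(paths) + 1:05d}.fasta"
                paths.append(path)
                handle = open(path, "w")
            if handle is not None:
                handle.write(line)
    if handle is not None:
        handle.close()
    return paths


def trf_command(fasta_path, trf_binary="trf"):
    """Build the argument list for one TRF run on a single-record file."""
    return [trf_binary, Path(fasta_path).name, *TRF_PARAMS, "-h", "-d"]


def run_all(paths, workers=4, trf_binary="trf"):
    """Run TRF on every file, at most `workers` processes at a time."""
    def run_one(path):
        # TRF writes its output into the working directory, so run it
        # next to the input file. The exit status is returned, not checked.
        done = subprocess.run(trf_command(path, trf_binary), cwd=Path(path).parent)
        return done.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, paths))


# Example (requires `trf` on PATH):
# parts = split_fasta("new_id.fasta", "trf_parts")
# exit_codes = run_all(parts, workers=32)
```

Threads are enough here because each worker only waits on an external process; the heavy computation happens in the TRF processes themselves.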

xiekunwhy commented 2 years ago

I ran into the same problem. I think I need to give up on TRF and try other tools such as Look4TRs and Dot2dot.

Wenfei-Xian commented 1 year ago

Hi all, TRF gets stuck in long centromere regions. If you want to identify tandem repeats, especially in a T2T assembly, please set a higher value for -l :)

hdashnow commented 10 months ago

Increasing the value of -l to more than 100 for chm13-T2T helped in my case. I tested a few different values to check memory usage, as it can get pretty high.
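The last two comments can be tried out with a small sketch like the one below, which appends `-l` to the command from the first post and reports the peak memory of the finished TRF process. As far as I understand the TRF usage text, `-l` is the maximum expected tandem repeat length in millions of bases, so confirm that against your version. The helper names are hypothetical, the value 100 only mirrors the comment above, and the `resource` module is available on Unix-like systems only.

```python
import resource
import subprocess

# Parameters copied from the command in the original post.
TRF_PARAMS = ["2", "5", "7", "80", "10", "50", "2000"]


def trf_command_with_l(fasta, max_tr_length_millions, trf_binary="trf"):
    """Build a TRF argument list that raises the -l limit."""
    return [
        trf_binary, fasta, *TRF_PARAMS,
        "-l", str(max_tr_length_millions),
        "-h", "-d",  # no HTML output, write a data file
    ]


def run_and_measure(fasta, max_tr_length_millions, trf_binary="trf"):
    """Run TRF once and return (exit status, peak child memory).

    ru_maxrss covers the largest child process waited for so far, so
    test one -l value per Python invocation to get a clean reading.
    The unit is kilobytes on Linux and bytes on macOS.
    """
    done = subprocess.run(trf_command_with_l(fasta, max_tr_length_millions, trf_binary))
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return done.returncode, peak


# Example (requires `trf` on PATH):
# status, peak = run_and_measure("new_id.fasta", 100)
```

Running each candidate value in a fresh interpreter keeps the memory figures comparable, since the reported peak never decreases within one Python process.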