Benson-Genomics-Lab / TRF

Tandem Repeats Finder: a program to analyze DNA sequences
https://tandem.bu.edu/trf/trf.html
GNU Affero General Public License v3.0

TRF (Tandem Repeats Finder, Version 4.09) runs for an excessively long time without producing any results. #25

Open yukaiquan opened 2 months ago

yukaiquan commented 2 months ago

Problem Overview

  1. The sequence length is 470,685,350 base pairs (bp).
  2. The running time exceeds 38 days (during which I switched servers several times).
  3. Software Run Log:

    Tandem Repeats Finder, Version 4.09 Copyright (C) Dr. Gary Benson 1999-2012. All rights reserved.

    Loading sequence... Allocating Memory... Initializing data structures... Computing TR Model Statistics... Scanning...

    Attempt to resolve.

  4. Even after changing servers, the situation remains the same.
  5. I am now running Tandem Repeats Finder (TRF) on smaller segments of the large sequence by splitting it at the 'N' characters and processing each segment individually. Most segments complete within 10 minutes, but one specific segment of 23,885,949 bp takes an unusually long time to finish. This leads me to suspect a bug in TRF when it processes this particular segment.

    trf chr5_74.fasta 2 7 7 80 10 50 500 -d -h chr5_74.fasta.gz
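
The splitting step described in item 5 could look roughly like the sketch below. It is only an illustration, not the script actually used: the function name `split_at_n_runs` and the `min_run` parameter are hypothetical, and the sequence is assumed to already be loaded as a single string. Each segment is returned with its 0-based offset so that TRF coordinates can later be mapped back to the full sequence.

```python
import re


def split_at_n_runs(sequence, min_run=1):
    """Yield (offset, segment) pairs for the stretches between runs of N.

    offset is the 0-based start of the segment in the original sequence.
    min_run is a hypothetical knob: only runs of at least this many N/n
    characters are treated as split points.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_run)
    start = 0
    for match in gap.finditer(sequence):
        if match.start() > start:
            yield start, sequence[start:match.start()]
        start = match.end()
    # Emit whatever follows the last gap.
    if start < len(sequence):
        yield start, sequence[start:]


if __name__ == "__main__":
    for offset, segment in split_at_n_runs("ACGTNNNNGGCCNA"):
        print(offset, segment)
```

Keeping the offsets makes it possible to see which region of the chromosome the slow 23,885,949 bp segment corresponds to.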

yukaiquan commented 2 months ago

After switching to version 4.10.0, TRF still does not produce any results.

Xiaogongao commented 2 months ago

I encountered the same problem. Did you solve it?

Tandem Repeats Finder, Version 4.09 Copyright (C) Dr. Gary Benson 1999-2012. All rights reserved.

Loading sequence... Allocating Memory... Initializing data structures... Computing TR Model Statistics... Scanning...

yukaiquan commented 2 months ago

The issue remains unresolved.

jawad9-11 commented 2 months ago

Hi, I am facing the same issue. If you have solved it, could you tell me how? Thank you!

yzhernand commented 1 month ago

Hello. This may be due to a very long repeat in the input sequence (such as in centromeres, see https://tandem.bu.edu/trf/whats_new and the Changelog section of the README).

I suggest trying the '-l' option with a value of 6 to start (the default is 2). This tells TRF to expect TR arrays as long as 6 Mbp. Please note that on 32-bit platforms, the maximum possible value of -l is 3.
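
As a rough sketch of that suggestion, the command from the original post can be rebuilt with `-l 6` appended. The helper names `build_trf_command` and `run_trf`, and the use of a timeout, are assumptions made for illustration; the numeric parameters and the `-d -h` flags are copied from the original post, and `trf` is assumed to be on the `PATH`.

```python
import subprocess


def build_trf_command(fasta_path, max_tr_length_mbp=6, executable="trf"):
    """Return the argv list for a TRF run with the -l option set.

    The seven numeric parameters are the ones used in the original post.
    """
    params = ["2", "7", "7", "80", "10", "50", "500"]
    return [executable, fasta_path, *params,
            "-d", "-h", "-l", str(max_tr_length_mbp)]


def run_trf(fasta_path, max_tr_length_mbp=6, timeout_seconds=None):
    """Run TRF and give up after timeout_seconds (hypothetical safeguard).

    subprocess.run raises subprocess.TimeoutExpired when the limit is hit,
    which makes a stuck segment visible instead of letting it run for weeks.
    """
    command = build_trf_command(fasta_path, max_tr_length_mbp)
    return subprocess.run(command, timeout=timeout_seconds)


if __name__ == "__main__":
    print(" ".join(build_trf_command("chr5_74.fasta")))
```

If the run still does not finish, the same helper can be called with a larger `max_tr_length_mbp`, keeping the 32-bit limit of 3 in mind.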

Note: I am no longer in Gary's lab and am no longer actively involved in TRF's development.

jawad9-11 commented 1 month ago

Thank you for your response. Regards

tallnuttrbgv commented 1 week ago

Same problem with a plant genome. For example, one contig is 23 Mbp; I tried different -l settings and it made no difference. TRF never finishes.

tallnuttrbgv commented 1 week ago

The only solution that worked for us on a problematic contig was to split it into 1 Mbp chunks and run trf on those, then combine the result files.
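
A minimal sketch of this chunk-and-combine workaround follows, under some stated assumptions. The function names and the `overlap` parameter are hypothetical. The coordinate shift assumes the `.dat` file written with `-d` has repeat lines whose first two whitespace-separated fields are the 1-based start and end positions; all other lines are passed through unchanged. Repeats that cross a chunk boundary will be truncated or split, and using an overlap would produce duplicate hits that need separate handling.

```python
def chunk_sequence(sequence, chunk_size=1_000_000, overlap=0):
    """Yield (offset, chunk) pairs covering the whole sequence.

    offset is the 0-based start of the chunk in the original sequence.
    """
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    offset = 0
    while offset < len(sequence):
        yield offset, sequence[offset:offset + chunk_size]
        if offset + chunk_size >= len(sequence):
            break  # this chunk already reaches the end of the sequence
        offset += step


def shift_dat_line(line, offset):
    """Shift start/end of one repeat line by offset; return other lines as-is.

    Assumes repeat lines begin with two integer fields (start, end).
    The trailing newline, if any, is dropped.
    """
    line = line.rstrip("\n")
    fields = line.split()
    if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
        return line
    fields[0] = str(int(fields[0]) + offset)
    fields[1] = str(int(fields[1]) + offset)
    return " ".join(fields)


if __name__ == "__main__":
    for offset, chunk in chunk_sequence("ACGTACGTAC", chunk_size=4):
        print(offset, chunk)
```

Each chunk would be written to its own FASTA file and passed to trf; the resulting repeat lines are then shifted by that chunk's offset before the files are concatenated.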