Benson-Genomics-Lab / TRF

Tandem Repeats Finder: a program to analyze DNA sequences
https://tandem.bu.edu/trf/trf.html
GNU Affero General Public License v3.0

Context dependence of calls for short repeats #10

Open plavskin opened 2 years ago

plavskin commented 2 years ago

I am having issues with TRF (version 4.09) missing multiple short homopolymers.

I've produced an example below using a (modified) sequence from the yeast genome that contains two homopolymer repeats (period size 1), each with a copy number of 6. TRF reports both repeats in the first sequence, but a single-nucleotide insertion outside the repeats causes it to miss either the first or the second repeat, depending on where the insertion is.

The command I am running is: trf test.fa 2 7 7 80 10 8 1000 -h -d -ngs
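For readers less familiar with the positional arguments, here is a minimal sketch that labels each value in the command. It assumes the positional order given in the TRF usage text (Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod); the helper `label_trf_command` is hypothetical and is not part of TRF.

```python
import shlex

# Assumed positional order, taken from the TRF usage text:
# trf File Match Mismatch Delta PM PI Minscore MaxPeriod [options]
PARAM_NAMES = ["Match", "Mismatch", "Delta", "PM", "PI", "Minscore", "MaxPeriod"]


def label_trf_command(command):
    """Split a trf command line into file, named numeric parameters, and flags.

    Hypothetical helper for illustration only; it does not run TRF.
    """
    tokens = shlex.split(command)
    program, fasta, rest = tokens[0], tokens[1], tokens[2:]
    values = [t for t in rest if not t.startswith("-")]
    flags = [t for t in rest if t.startswith("-")]
    params = dict(zip(PARAM_NAMES, map(int, values)))
    return {"program": program, "file": fasta, "params": params, "flags": flags}


labeled = label_trf_command("trf test.fa 2 7 7 80 10 8 1000 -h -d -ngs")
for name, value in labeled["params"].items():
    print(f"{name:>9} = {value}")
print("flags:", " ".join(labeled["flags"]))
```

Under that assumed order, the run uses a match weight of 2, mismatch and indel penalties of 7, and a minimum alignment score of 8.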

The contents of test.fa (with the inserted nucleotides capitalized) are:

>seq_find_both
gacgtagcaatccaaaaaagggaagtctaggcgccccccaccg
>seq_miss_first
Tgacgtagcaatccaaaaaagggaagtctaggcgccccccaccg
>seq_miss_second
gacgtagcaatccaaaaaagggaagtctaggcgTccccccaccg
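
As an independent baseline for what TRF might be expected to report, the following sketch scans the three sequences for homopolymer runs with a plain regular expression. This is not TRF's detection algorithm, only a way to confirm that the inserted nucleotides leave both runs intact; the function name `homopolymer_runs` and the `min_copies` threshold are illustrative assumptions.

```python
import re

# Sequences copied from test.fa above.
SEQUENCES = {
    "seq_find_both": "gacgtagcaatccaaaaaagggaagtctaggcgccccccaccg",
    "seq_miss_first": "Tgacgtagcaatccaaaaaagggaagtctaggcgccccccaccg",
    "seq_miss_second": "gacgtagcaatccaaaaaagggaagtctaggcgTccccccaccg",
}


def homopolymer_runs(seq, min_copies=6):
    """Return (start, end, base, copies) for each homopolymer run.

    Coordinates are 1-based and inclusive. Case is ignored, so the
    capitalized inserted nucleotides are treated like any other base.
    """
    pattern = re.compile(r"([acgt])\1{%d,}" % (min_copies - 1))
    return [
        (m.start() + 1, m.end(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.lower())
    ]


for name, seq in SEQUENCES.items():
    print(name, homopolymer_runs(seq))
```

If the scan lists the same two runs for all three records, any difference in TRF's output comes from the flanking context rather than from the repeats themselves.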

Is this behavior expected? Is there any way to modify it?

Thank you!

- Eugene