Xinglab / TideHunter

TideHunter: efficient and sensitive tandem repeat detection from noisy long reads using seed-and-chain
https://github.com/yangao07/TideHunter
MIT License

Usage Inquiry #20

Open HLHsieh opened 4 months ago

HLHsieh commented 4 months ago

Hi there,

I am interested in this amazing tool. I am wondering whether there are any limitations regarding the length of each tandem repeat. Specifically, I would like to detect a 67-bp tandem repeat in our data.

Additionally, is there a way to determine the copy number of tandem repeats in a set of reads rather than in a single read?

Any suggestions would be appreciated.

Thank you, Hsin

yangao07 commented 4 months ago

Hi, thanks for your interest in the tool. By default, TideHunter detects tandem repeats with a unit length of >= 30 bp and a copy number of >= 2.

TideHunter does not use information across reads; each read is processed on its own. For your case, getting copy numbers for a set of reads would require extra processing of TideHunter's output.

I don't have any suggestions for now, but if you can provide some example data, I can take a look.
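One way the extra processing could look is sketched below: collect the per-read copy numbers that TideHunter reports and summarize them across reads. This is a minimal sketch, not part of TideHunter. It assumes tab-separated output in which the read name, consensus (unit) length, and copy number sit in the columns given by `COL_READ_NAME`, `COL_CONS_LEN`, and `COL_COPY_NUM`; those positions and the function name `summarize_copy_numbers` are assumptions and should be checked against the actual output before use.

```python
import statistics
from collections import defaultdict

# Assumed 0-based column positions in the tab-separated output.
# These are illustrative; adjust them to match the real file.
COL_READ_NAME = 0
COL_CONS_LEN = 5
COL_COPY_NUM = 6


def summarize_copy_numbers(lines, unit_len, tolerance=2):
    """Summarize copy numbers across reads for repeats of a given unit length.

    Rows whose consensus length differs from `unit_len` by more than
    `tolerance` bp are ignored. If a read has several matching rows,
    the largest copy number is kept for that read.
    Returns None when no row matches.
    """
    per_read = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= COL_COPY_NUM:
            continue  # skip malformed or truncated rows
        cons_len = int(fields[COL_CONS_LEN])
        if abs(cons_len - unit_len) > tolerance:
            continue
        name = fields[COL_READ_NAME]
        per_read[name] = max(per_read[name], float(fields[COL_COPY_NUM]))

    values = sorted(per_read.values())
    if not values:
        return None
    return {
        "reads": len(values),
        "median": statistics.median(values),
        "min": values[0],
        "max": values[-1],
    }
```

The median is used rather than the mean because a noisy read that covers the repeat only partially can report a much lower copy number than the rest. To use the sketch, pass an open file handle, for example `summarize_copy_numbers(open("tidehunter_out.tsv"), unit_len=48)`, where the file name is hypothetical.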

HLHsieh commented 4 months ago

Hi, thanks for your help.

I uploaded an encrypted example dataset. These reads should contain a repeat with a unit length of 48 bp and a copy number of ~100. Please let me know how to use TideHunter on my data.

https://www.dropbox.com/scl/fi/zr7prb3t67taie6ltjd43/DRD4_3.test.bam?rlkey=mstz9wz8zgeki92oullf9tngl&dl=0

According to your reply, TideHunter detects tandem repeats with a unit length of >= 30 bp and a copy number of >= 2. Does that mean I cannot use TideHunter to detect short tandem repeats, such as GGCCCC?

Many thanks, Hsin

yangao07 commented 4 months ago

For shorter tandem repeats, you can change the -p and -m parameters to set the shortest repeat unit length you want. For very short units, TideHunter's results may be wrong, but you can try.
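As a rough illustration of that suggestion, the sketch below builds a TideHunter command line that lowers both -p and -m to 6, the length of the GGCCCC unit mentioned above. Only the two flag names come from this thread; their exact meaning and valid ranges should be checked with `TideHunter -h`. The helper name `build_tidehunter_cmd` and the input file `reads.fa` are hypothetical, and the sketch assumes the input file is passed as the last argument.

```python
import shlex


def build_tidehunter_cmd(reads_path, min_period, min_len, executable="TideHunter"):
    """Return the argument list for a TideHunter run with custom -p and -m.

    Only -p and -m are set; every other option keeps its default.
    """
    if min_period < 1 or min_len < 1:
        raise ValueError("min_period and min_len must be positive integers")
    return [executable, "-p", str(min_period), "-m", str(min_len), reads_path]


# Example: target a 6-bp unit such as GGCCCC (values are illustrative).
cmd = build_tidehunter_cmd("reads.fa", min_period=6, min_len=6)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` once TideHunter is installed and on the `PATH`. As noted above, results for units this short may be unreliable, so it is worth checking a few reads by eye.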