Xinglab / TideHunter

TideHunter: efficient and sensitive tandem repeat detection from noisy long reads using seed-and-chain
https://github.com/yangao07/TideHunter
MIT License
19 stars 2 forks source link

AveMatch is 0.0 #6

Closed zztin closed 2 years ago

zztin commented 4 years ago

Issue description

The AveMatch score is 0.0 in some consensus segments, which should not be correct. Used version: v.1.4.1

Steps to reproduce the issue

  1. Example fasta file is provided here (ubuntu pastebin)
  2. Running ./bin/TideHunter produces a consensus here which has 0.0 for the AveMatch field readname: >score_0_rep0_1281_387_1281_374_2.4_0.0_0_417,791,1169
  3. Running ./bin/TideHunter -uproduces output here

    What's the expected result?

What's the actual result?

AveMatch = 0.0 full result here which has 0.0 for the AveMatch field readname: >score_0_rep0_1281_387_1281_374_2.4_0.0_0_417,791,1169

Additional findings:

I found that the consensus sequence (in step2 above) is the same as the first subunit. >score_0_rep0_sub0

Comments

If you could provide a more detailed algorithm of how you calculate the AveMatch would be nice. What is the current lower limit of the reported consensus for AveMatch? Also, if it is possible to set AveMatch score as a parameter in the command line? Thank you very much!

yangao07 commented 4 years ago

Thanks for pointing out this. I just figured out that it is a bug in the AveMatch calculation. It is fixed in the new version.

For your case, there are only 2 unit sequences to generate the consensus sequence. As no real "consensus" can be obtained from 2 sequences. Or we can say, any unit sequence can be considered as "consensus". Thus we choose to simply consider the first unit as the consensus sequence.

We calculated the AveMatch based on the alignment result between unit sequence and the consensus using the ksw2 library in the global alignment mode. The # of matched bases is provided by the ksw2 alignment result.

What do you mean by "set AveMatch score as a parameter"?

Yan