Open xiekunwhy opened 2 years ago
Thanks for trying out MeShClust3.
First, I believe 20 bp is too short to be a MITE. I would recommend removing short sequences (perhaps < 50 bp). Second, I would recommend sorting the input sequences by length. Then I'd divide them into groups (< 1000 bp, 1000–5000 bp, 5000–10000 bp, etc.). The interval size does not need to be 5k bp; it can be 100k bp or longer. After that, I'd cluster each group separately.
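The filtering and grouping steps above could be sketched roughly as follows. This is only an illustration, not part of MeShClust3: the function names, the 50 bp cutoff, the bin edges, and the output file naming are all assumptions you should adjust to your data.

```python
from bisect import bisect_right
from collections import defaultdict

MIN_LEN = 50                     # assumed cutoff: drop sequences shorter than this
BIN_EDGES = [1000, 5000, 10000]  # assumed group boundaries in bp; widen as needed


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_length(records, min_len=MIN_LEN, edges=BIN_EDGES):
    """Drop short records, sort the rest by length, and bin them by interval.

    Group 0 holds lengths below edges[0], group 1 holds edges[0] up to
    (but not including) edges[1], and so on; the last group is open-ended.
    """
    kept = sorted((r for r in records if len(r[1]) >= min_len),
                  key=lambda r: len(r[1]))
    groups = defaultdict(list)
    for header, seq in kept:
        groups[bisect_right(edges, len(seq))].append((header, seq))
    return dict(groups)


def write_groups(groups, prefix="group"):
    """Write each group to its own FASTA file (hypothetical naming scheme)."""
    for index, records in groups.items():
        with open(f"{prefix}_{index}.fa", "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
```

To use it, open the input FASTA file, pass the file handle to `read_fasta`, feed the result to `group_by_length`, and call `write_groups` on the returned dictionary. Each resulting file can then be clustered in its own run.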
Please keep me posted.
Best regards.
Hi,
I used MeShClust3 to cluster repeat sequences (including MITEs and TIRs) and got many warnings like "Statistician warning at harmonicMeanRSimilarity. A sequence is too short. Similarity is assigned zero.". The sequence lengths in the input file range from 100 bp to ~1 Mb (sometimes from 20 bp to 3 Mb).
Will these warnings affect the results, and how can I avoid them?
Best, Kun