A couple of questions about the work of AptaCLUSTER - Githubissues

drivenbyentropy / aptasuite

A full-featured bioinformatics software collection for the comprehensive analysis of aptamers in HT-SELEX experiments.

https://drivenbyentropy.github.io/

GNU General Public License v3.0

24 stars, 11 forks

A couple of questions about the work of AptaCLUSTER #125

Closed CTPAHHIK38RUS closed 1 year ago

CTPAHHIK38RUS commented 1 year ago

Hi. As an example, I have a pool of 500,000 sequences, each with a 40-nucleotide randomized region. By setting the LSH Dimension to 20, am I telling AptaCLUSTER to find sequences that share a common 20-nucleotide stretch? And does the Edit Distance set the number of substitutions allowed within that stretch, or across the whole sequence? As I understand it, Kmer is the length of the fragments each sequence is cut into, and those fragments are then compared with each other letter by letter to find matches between sequences. If I'm not mistaken, BLAST works on the same principle, right?
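To make the two terms in this question concrete, here is a minimal Python sketch of k-mer decomposition and of edit distance in the general (Levenshtein) sense. It is a generic illustration, not AptaCLUSTER's code: the function names `kmers` and `edit_distance` are made up for this example, and whether AptaCLUSTER applies its Edit Distance setting to the randomized region or to the whole sequence is not something this sketch settles.

```python
def kmers(seq, k):
    """Return all overlapping substrings of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    substitutions, insertions, and deletions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical toy sequences, for illustration only.
print(kmers("ACGTA", 3))
print(edit_distance("ACGT", "AGGT"))
```

Note that edit distance in this general sense is computed over the full strings being compared and also counts insertions and deletions, not only substitutions.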

CTPAHHIK38RUS commented 1 year ago

And when reducing the LSH Dimension, for example from 30 to 20, do I need to increase the Edit Distance from 10 to 20?
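For context on what a "dimension" can mean in locality-sensitive hashing, here is a minimal sketch of one common scheme, position sampling: pick a fixed number of positions at random and group together the sequences that agree at all of them. That AptaCLUSTER's LSH Dimension works this way is an assumption made for this sketch and is not confirmed in this thread; `lsh_buckets` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def lsh_buckets(sequences, dimension, seed=0):
    """Group equal-length sequences by their characters at `dimension`
    randomly sampled positions (position-sampling LSH)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    length = len(sequences[0])
    positions = sorted(rng.sample(range(length), dimension))
    buckets = defaultdict(list)
    for seq in sequences:
        # The hash key is the sequence restricted to the sampled positions.
        key = "".join(seq[p] for p in positions)
        buckets[key].append(seq)
    return positions, dict(buckets)


# Hypothetical toy pool, for illustration only.
pool = ["ACGTACGT", "ACGTACGA", "TTGTACGT", "ACGTACGT"]
positions, buckets = lsh_buckets(pool, dimension=4)
print(positions)
for key, members in buckets.items():
    print(key, members)
```

Under this scheme, a smaller dimension checks fewer positions, so less similar sequences can end up in the same bucket. How that interacts with an edit distance cutoff depends on how the tool applies the cutoff within a bucket, which this sketch does not model.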

CTPAHHIK38RUS commented 1 year ago

I also noticed that AptaCLUSTER doesn't like overwriting its results several times in a row. You have to reopen the configuration file and recalculate one run at a time. That is, if you perform several (two or more) calculations on top of an existing one, it will most likely not complete.

CTPAHHIK38RUS commented 1 year ago

Hello. Is there any way to export the cluster consensus sequence?

CTPAHHIK38RUS commented 1 year ago

OK, now I understand that by setting the LSH value, we specify the maximum possible number of substitutions between a seed sequence and another sequence.