Mparat / scipy-cluster

Automatically exported from code.google.com/p/scipy-cluster

String distances #21

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I am interested in using this package to cluster sequences. I noticed in the TODO file that you want to support this as well. One place to start would be the Levenshtein edit distance implementation in the py-editdist package. In addition, a normalized edit distance can easily be built on top of it, as described in this paper:
IEEE Trans. Pattern Anal. Mach. Intell. 29(6):1091
I'll see about writing it myself, but my C is quite rusty.
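For reference before porting anything to C or Cython, here is a minimal pure-Python sketch of the two distances mentioned above. The function names `levenshtein` and `normalized_levenshtein` are hypothetical and not part of scipy-cluster or py-editdist. The normalization assumes unit edit costs and the form 2·d / (|a| + |b| + d), which is assumed here to be the unit-cost case of the cited paper's metric; check it against the paper before relying on it.

```python
def levenshtein(a, b):
    """Unit-cost Levenshtein distance between sequences a and b.

    Works on any indexable sequences (str, list, tuple) whose
    elements can be compared with ==.
    """
    # With unit costs the distance is symmetric, so keep the shorter
    # sequence in the inner loop to make each row small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """Normalized edit distance in [0, 1], assuming unit edit costs.

    Computes 2*d / (len(a) + len(b) + d) where d is the Levenshtein
    distance; two empty sequences are at distance 0.0.
    """
    d = levenshtein(a, b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2.0 * d / (total + d)


print(levenshtein("kitten", "sitting"))             # 3
print(normalized_levenshtein("kitten", "sitting"))  # 0.375
```

The row-by-row dynamic program uses O(min(|a|, |b|)) memory, and the same loop structure translates almost line for line into C or typed Cython.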

Original issue reported on code.google.com by uri.lase...@gmail.com on 27 Feb 2009 at 1:21

GoogleCodeExporter commented 8 years ago
Hi Uri,

I do want to include a number of edit distances, but alas I'm trying to get a paper out. However, I'd be willing to offer you some pointers. How good is your Python? Have you considered writing in Cython?

Cython is a new, well-maintained project that has gained extraordinary momentum within the past year. It is a major, well-documented improvement on the Pyrex project, which is now obsolete. Cython is a Python dialect that is translated into C and compiled into an extension module, so your Python functions can call Cython code very easily.
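Whichever way the distance is implemented (pure Python, or a Cython-compiled function called from Python), the clustering side only needs the pairwise distances. A minimal sketch, assuming the clustering routine takes a condensed distance matrix the way `scipy.cluster.hierarchy.linkage` does in current SciPy; `condensed_distances` and the `hamming` example metric are hypothetical helpers, not part of scipy-cluster.

```python
from itertools import combinations


def condensed_distances(items, metric):
    """Return the condensed (upper-triangle, row-major) distance vector.

    For n items this has n*(n-1)/2 entries in the order
    (0,1), (0,2), ..., (0,n-1), (1,2), ..., which is the pair order
    that itertools.combinations produces.
    """
    return [metric(x, y) for x, y in combinations(items, 2)]


def hamming(a, b):
    """Example metric: number of mismatched positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("hamming() needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


seqs = ["ACGT", "ACGA", "TCGA"]
d = condensed_distances(seqs, hamming)
print(d)  # [1, 2, 1] for pairs (0,1), (0,2), (1,2)
```

Any Python callable works as `metric`, including a string distance compiled with Cython. The resulting vector can then be converted with `numpy.asarray` and passed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage(d, method='average')`. Note that this computes all n(n-1)/2 distances, so the metric's speed dominates for large inputs.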

Do you know how big the alphabet tends to be for most problems using the Levenshtein distance?

Btw, you can also e-mail me at first DOT last AT gmail DOT com. I check this site much less often.

I will read the paper you referenced over the weekend.

Cheers,

Damian

Original comment by damian.e...@gmail.com on 27 Feb 2009 at 4:33