amaurycrickx / recognito

Java Speaker Recognition Framework
Apache License 2.0

Fix typos, Chebyshev Distance calculation #4

brianchung808 closed 10 years ago

brianchung808 commented 10 years ago

Just some typo fixes I noticed as I read through. Also fixed the Chebyshev Distance calculation.

By the way, great library! I've been reading through it and I've been learning a bunch.

Brian

amaurycrickx commented 10 years ago

Thank you so much for the code review!

You're absolutely right about the Chebyshev distance calculation for vectors!

Actually, I'm considering each feature of the set to be an independent "shape" (reduced to a point in one-dimensional space). This is why I'm summing the distances between each pair of points.
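To make the difference concrete, here is a minimal Java sketch of the two calculations being discussed. It is not recognito's actual code: the class and method names are hypothetical, and it assumes features are plain `double[]` arrays of equal length. The textbook Chebyshev distance keeps only the largest per-feature difference, whereas summing the per-feature distances, as described above, amounts to the Manhattan (L1) distance.

```java
// Hypothetical illustration; names are not taken from the recognito library.
class DistanceSketch {

    // Chebyshev distance: the largest absolute difference over all features.
    static double chebyshev(double[] a, double[] b) {
        checkSameLength(a, b);
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Sum of per-feature distances: each feature is treated as a point in
    // one-dimensional space and the individual distances are added up.
    // This is the Manhattan (L1) distance.
    static double summedFeatureDistance(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] a = {1.0, 5.0, 3.0};
        double[] b = {2.0, 1.0, 3.5};
        System.out.println("Chebyshev: " + chebyshev(a, b));
        System.out.println("Summed:    " + summedFeatureDistance(a, b));
    }
}
```

For these sample vectors the per-feature differences are 1.0, 4.0 and 0.5, so the Chebyshev distance is 4.0 while the summed distance is 5.5.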

Admittedly, I should have used another name for the class or at least explained what I was doing in the doc :-)

Since I'm using the Euclidean distance anyway, because it provides more differentiation over larger sets of voice prints (bigger distances), I guess it's time I removed this algorithm from the lib altogether...

Again, thanks for the review. Glad you liked it and could learn something from it!

This also gave me the opportunity to discover DejaVu. Great stuff! I saw an online presentation some time ago in which the speaker explained how he made something similar over a weekend... The second part of the talk was about Shazam's lawyers contacting the guy with patent infringement claims... freaky! He blogged about it: http://www.royvanrijn.com/blog/2010/07/patent-infringement/ http://www.royvanrijn.com/blog/2010/11/patent-infrigement-part-2/

brianchung808 commented 10 years ago

Oh, I see. I didn't really look at how you were representing the features before I looked at the Chebyshev calculator!

Wow, those patent infringement claims are freaky. I can't believe they would push that hard on such a thing.

Anyway, thanks for the merge. I'll definitely be looking through more of the code and will post on the Google Group if I have any questions. I'm a student currently learning about speaker recognition and this library has helped a great deal!

amaurycrickx commented 10 years ago

I'd say it has less to do with the way I represent features than with a pragmatic point of view: I need more differentiation than the highest difference alone provides. I merged your code because I believe it's a correct implementation of the algorithm... and I don't use it anyway.
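As a rough illustration of that point, the sketch below compares two candidate vectors against a reference. It is not taken from recognito: the class name, method names and sample values are made up for the example. Both candidates have the same highest per-feature difference, so the Chebyshev distance cannot tell them apart, while the Euclidean distance can.

```java
// Hypothetical illustration; names and sample data are not from the recognito library.
class DifferentiationSketch {

    // Chebyshev distance: only the largest per-feature difference counts.
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Euclidean distance: every per-feature difference contributes.
    static double euclidean(double[] a, double[] b) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] reference = {0.0, 0.0, 0.0};
        double[] closeOnMostFeatures = {3.0, 0.0, 0.0}; // differs on one feature only
        double[] farOnAllFeatures = {3.0, 3.0, 3.0};    // differs on every feature

        // Same highest difference (3.0) in both cases.
        System.out.println(chebyshev(reference, closeOnMostFeatures));
        System.out.println(chebyshev(reference, farOnAllFeatures));

        // Euclidean distance separates the two candidates.
        System.out.println(euclidean(reference, closeOnMostFeatures));
        System.out.println(euclidean(reference, farOnAllFeatures));
    }
}
```

Here both Chebyshev distances are 3.0, whereas the Euclidean distances are 3.0 and the square root of 27 (about 5.2), so only the latter measure ranks the second candidate as further away.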

For a very good reference book, I'd recommend you get a copy of Fundamentals of Speaker Recognition by Homayoon Beigi.

Please note that this lib is a good start, but the algorithms I use are definitely not state-of-the-art.

I'm planning to implement MFCC (Mel-Frequency Cepstral Coefficients) for feature extraction and to use more advanced statistical models for comparisons. As I'm learning all this by myself and without much mathematical background, any help is most welcome!

brianchung808 commented 10 years ago

I'm in a similar position: little math background, and learning all of this on my own for a class!

From what I've read so far about MFCC, it is typically more accurate in matching than LPC. I'm currently doing research on the topic and will definitely help if I have anything useful to contribute.