Closed piotr-nowicki closed 3 years ago
Thanks @piotr-nowicki, looking into this now.
I can reproduce the results you're seeing, and I've added a unit test to match the code from the README that currently fails. Note that the README (and the code itself, including unit tests) came from the upstream Java project, just rewritten for use on .NET, so I will also have to validate if it's correct upstream as well. Thanks for bringing this to our attention, will continue to look into it.
Well that was easy to validate. Created a unit test in the upstream Java project to match their example, and it also gives the same result as .NET (so our code is in alignment) that doesn't match the comments. I'm guessing the comment in the example is incorrect.
Unit test (Java):
@Test
public void exampleFromReadme() {
// produces 0.416666
NGram twogram = new NGram(2);
assertEquals(0.416666, twogram.distance("ABCD", "ABTUIO"), 0.001);
// produces 0.97222
String s1 = "Adobe CreativeSuite 5 Master Collection from cheap 4zp";
String s2 = "Adobe CreativeSuite 5 Master Collection from cheap d1x";
NGram ngram = new NGram(4);
assertEquals(0.97222, ngram.distance(s1, s2), 0.001);
}
Unit test results (Java):
java.lang.AssertionError:
Expected :0.416666
Actual :0.5833333134651184
I'll file a bug with the upstream Java project and will update this issue once I hear back.
Thanks mate.
Later on i checked java code and it seems that also they have a bug :).
Really weird that nobody tries to execute sample data :).
I started to think that .NET Core calculates it wrong ;)
I think the comment is incorrect and that the calculated results are right. It looks like this logic was changed as part of https://github.com/tdebatty/java-string-similarity/issues/22
I tried few of algorithms but at least at least one of them are not correct. Examples provided in description returns completly not expected results.
I am using .NET Core 2.2 console application.