Closed by fatagun 4 years ago
Thanks for the questions. There is no single "best" algorithm; it depends on your requirements and on what you're trying to do. There are many algorithms here for different purposes, as outlined in the README.
I believe all of them should work fine on multi-line strings. But if newlines are not important to your analysis, you could just call .Replace("\r", "").Replace("\n", "") on the string to effectively turn it into a single-line string before comparing it. If you encounter any bugs, though, please let me know.
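As a minimal sketch of the newline-stripping suggestion above, using only the standard string.Replace method: the class and method names here (NewlineNormalizer, StripNewlines, NewlinesToSpaces) are hypothetical and not part of the library. The second method is an assumption on my part, not something suggested above; it swaps line breaks for spaces, which may be preferable if you don't want the last word of one line fused to the first word of the next.

```csharp
using System;

public static class NewlineNormalizer
{
    // Removes every carriage return and line feed, as suggested above.
    // Note: "first line\r\nsecond line" becomes "first linesecond line".
    public static string StripNewlines(string text) =>
        text.Replace("\r", "").Replace("\n", "");

    // Hypothetical variant: replaces each line break (CRLF, CR, or LF)
    // with a single space so words on adjacent lines stay separated.
    public static string NewlinesToSpaces(string text) =>
        text.Replace("\r\n", " ").Replace("\r", " ").Replace("\n", " ");
}
```

Either result can then be passed to whichever similarity algorithm you pick; which variant is appropriate depends on whether word boundaries at line ends matter for your comparison.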
With the N-gram-based algorithms, the parameter is the "shingle" size. So if you pass 2 with the string "ABCD", it will chunk it into AB, BC, and CD. If you pass 3, it will chunk it into ABC and BCD. It does that on both strings and compares the occurrences. Beyond that, you'll need to read the research paper in the README 😄
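To make the chunking concrete, here is a rough sketch of how a string splits into shingles of a given size. ShingleDemo and Shingles are hypothetical names for illustration only; this is not the library's internal code, and the library may handle details such as padding or short strings differently.

```csharp
using System;
using System.Collections.Generic;

public static class ShingleDemo
{
    // Splits text into overlapping substrings ("shingles") of length n.
    // Illustrative helper only; not the library's implementation.
    public static List<string> Shingles(string text, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));

        var result = new List<string>();
        // Slide a window of width n across the string one character at a time.
        for (int i = 0; i + n <= text.Length; i++)
        {
            result.Add(text.Substring(i, n));
        }
        return result;
    }
}
```

With this sketch, Shingles("ABCD", 2) yields AB, BC, CD and Shingles("ABCD", 3) yields ABC, BCD, matching the description above. A string shorter than n produces no shingles at all.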
thanks:)
Hello,
What is the best algorithm for comparing multi-line strings, e.g. the contents of a text area?
Another question,
When creating an NGram object, it can also take an argument, e.g. NGram(2) or NGram(4).
What is that number for? What happens when the parameter is 2 or 4?
Thanks.