Kalkwst / MicroLib

MIT License

:sparkles: Add Jaccard Algorithm #5

Closed Kalkwst closed 1 year ago

Kalkwst commented 1 year ago

The Jaccard similarity and Jaccard distance measure how alike two strings are. The Jaccard similarity is a value between 0 and 1: a value of 0 means the strings share no characters, and a value of 1 means they contain exactly the same set of characters (for example, "abc" and "cba"). The Jaccard distance is the complement, also between 0 and 1: a value of 0 means the two character sets are the same, and a value of 1 means they have nothing in common.

The Jaccard similarity and Jaccard distance are calculated by comparing the sets of distinct characters in the two strings. The Jaccard similarity is the number of distinct characters that appear in both strings divided by the number of distinct characters that appear in either string. The Jaccard distance is calculated by subtracting the Jaccard similarity from 1.
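Written as a formula, with A and B standing for the sets of distinct characters in the two strings (the symbols are introduced here for illustration only):

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
d_J(A, B) = 1 - J(A, B)
```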

For example, consider the two strings "cat" and "bat". These strings have two characters in common (the letters "a" and "t") and four distinct characters overall (the letters "c", "a", "t", and "b"). The Jaccard similarity between these two strings is therefore 2/4, or 0.5. The Jaccard distance is 1 - 0.5, or 0.5.
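The "cat" and "bat" figures can be checked with plain set operations. The snippet below is a minimal sketch in Python rather than the library's own code, and the helper name `jaccard_similarity` is hypothetical:

```python
def jaccard_similarity(left: str, right: str) -> float:
    """Character-set Jaccard similarity (hypothetical helper, not the library API).

    Assumes at least one string is non-empty; otherwise the union is empty
    and the division below would fail.
    """
    left_set, right_set = set(left), set(right)
    common = left_set & right_set        # characters found in both strings
    combined = left_set | right_set      # characters found in either string
    return len(common) / len(combined)


similarity = jaccard_similarity("cat", "bat")  # {"a", "t"} over {"c", "a", "t", "b"}
distance = 1 - similarity
print(similarity, distance)  # 0.5 0.5
```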

The code provides a method for calculating the Jaccard similarity between two strings, left and right. The Calculate method first checks whether either input string is null and throws an ArgumentNullException if so. If both strings are non-null, it calls the CalculateJaccardSimilarity method to compute the Jaccard similarity between them.

The CalculateJaccardSimilarity method first handles empty input. If both strings are empty, it returns a similarity of 1 (i.e. the strings are identical). If exactly one of them is empty, it returns a similarity of 0 (i.e. the strings are completely dissimilar). Otherwise, it creates two sets of characters, leftSet and rightSet, from the two strings, and a third set, unionSet, holding the union of leftSet and rightSet. Finally, it divides the size of the intersection of leftSet and rightSet by the size of unionSet and returns the result as a double-precision floating-point value.
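The control flow described above can be sketched as follows. This is an illustrative Python rendering under stated assumptions, not the code from the repository: the function names mirror Calculate and CalculateJaccardSimilarity, `None` stands in for null, and `ValueError` stands in for ArgumentNullException.

```python
def calculate(left, right):
    """Validate the inputs, then delegate (mirrors the described Calculate)."""
    # Assumption: ValueError plays the role of ArgumentNullException here.
    if left is None:
        raise ValueError("left must not be None")
    if right is None:
        raise ValueError("right must not be None")
    return calculate_jaccard_similarity(left, right)


def calculate_jaccard_similarity(left: str, right: str) -> float:
    """Character-set Jaccard similarity with the empty-string cases handled."""
    if not left and not right:
        return 1.0  # both empty: treated as identical
    if not left or not right:
        return 0.0  # exactly one empty: completely dissimilar

    left_set = set(left)
    right_set = set(right)
    union_set = left_set | right_set
    intersection = left_set & right_set
    return len(intersection) / len(union_set)


print(calculate("cat", "bat"))  # 0.5
print(calculate("", ""))        # 1.0
print(calculate("", "bat"))     # 0.0
```

Handling the empty cases before building the sets also avoids a division by zero, since the union of two empty sets has size 0.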