I would recommend refactoring jaccard.py as follows:
Have the jacard() function take two token sets as arguments and compute their jaccard similarity (overlap percentage). Checking for empty token bags should be done here.
Have the compare() function deal with the tokenization and anything else (e.g. transforming the distance to a similarity).
The check for null case should be done at the token bag level rather than the string level:
https://github.com/OlivierBinette/StringCompare/blob/be58f4c1c9c24bc2cef5d9bb81053fa7ea003792/stringcompare/distance/jaccard.py#L17
I would recommend refactoring jaccard.py as follows:
jacard()
function take two token sets as arguments and compute their jaccard similarity (overlap percentage). Checking for empty token bags should be done here.compare()
function deal with the tokenization and anything else (e.g. transforming the distance to a similarity).