zuhanit closed this 1 year ago
One concern is that the Ratcliff/Obershelp algorithm (which difflib uses) and the Levenshtein algorithm (aka edit distance) have different notions of similarity, so this change could affect suggestion quality, for better or worse. difflib also takes junk characters into account, ignoring insignificant whitespace when calculating the ratio.
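To make the difference concrete, here is a minimal sketch that scores the same pair of strings both ways. The `levenshtein_distance` and `levenshtein_similarity` helpers are written only for this illustration; the normalization (one minus distance over the longer length) is an assumption and is not necessarily the formula python-levenshtein uses for its ratio.

```python
from difflib import SequenceMatcher


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Illustrative normalization (an assumption): 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def ratcliff_obershelp_ratio(a: str, b: str) -> float:
    """Similarity as computed by difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


# The two measures disagree on how close the same pair is.
print(ratcliff_obershelp_ratio("abcd", "bcde"))
print(levenshtein_similarity("abcd", "bcde"))
```

Because the two scores are on different scales, a cutoff tuned for one algorithm would probably need re-tuning for the other.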
An entry mismatch is a compile error, so performance shouldn't be a top priority here. Adding a dict of 35232 sound file paths can help end users, but we can do better by providing a structured way like Pissed("unit name")[2], which removes the need to supply the full file path.
By the way, EncodeSound should not raise a compile error on the following example:
DoActions(PlayWAV("new sound"))
MPQAddFile("new sound", open("file path", "rb").read())
We need to either postpone raising the error and the suggestion until the end of the collecting phase, or not raise an error at all (the current behavior).
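The first option could look roughly like the sketch below. `SoundRegistry` and its methods are hypothetical names invented for this illustration and are not part of eudplib; the sound path is only a placeholder. The idea is to record each referenced name when it is encoded, and to validate only once every file has been added.

```python
import difflib


class SoundRegistry:
    """Hypothetical stand-in for the compiler's sound bookkeeping (not eudplib API)."""

    def __init__(self, builtin_sounds):
        self.known = set(builtin_sounds)
        self.referenced = []

    def reference(self, name):
        # Called where a sound name is encoded: record it, do not validate yet.
        self.referenced.append(name)

    def add_file(self, name):
        # Called where a file is added to the map, possibly after it was referenced.
        self.known.add(name)

    def check(self):
        # Called once at the end of the collecting phase.
        candidates = sorted(self.known)
        problems = []
        for name in self.referenced:
            if name not in self.known:
                suggestions = difflib.get_close_matches(name, candidates, n=3)
                problems.append((name, suggestions))
        return problems


registry = SoundRegistry(["sound\\Misc\\Buzz.wav"])  # placeholder built-in entry
registry.reference("new sound")  # referenced before the file exists
registry.add_file("new sound")   # added later, so it must not be an error
registry.reference("new sond")   # a genuine typo
for name, suggestions in registry.check():
    print(f"unknown sound {name!r}; did you mean {suggestions}?")
```

With this ordering, only the typo is reported, and the late-added file is accepted.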
I will respect your decision whether or not it's accepted. Thank you for the kind answer!
The difflib library is fast enough that its cost can be ignored for small inputs, but it gets slower as the number of comparison targets grows. Below is the benchmark result for finding the closest matches in an eudplib dictionary.
That is fine while the time stays at a fraction of a second, but it takes a long time when matching against a large dictionary like the one below. Using Levenshtein, on the other hand, takes little time.
Below is the benchmark result using all sounds that exist in StarCraft: Remastered. Each sound has a long name, and SC:R has as many as 35232 sounds.
Depending on the circumstances, the difference can be even greater.
For comparison, I made a get_close_matches function similar to difflib.get_close_matches. The Levenshtein ratio is from the python-levenshtein library. You can run the benchmark yourself using the code below.
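As a rough idea of what such a function can look like (this is a sketch for illustration, not the code used for the benchmark), the version below mirrors the signature of difflib.get_close_matches but takes a pluggable `scorer`. The `scorer` parameter and the `difflib_ratio` helper are assumptions made for this example. It runs on the standard library alone and does not need DefSoundDict.

```python
import heapq
from difflib import SequenceMatcher


def difflib_ratio(candidate: str, word: str) -> float:
    """Default scorer: Ratcliff/Obershelp ratio from difflib."""
    return SequenceMatcher(None, candidate, word).ratio()


def get_close_matches(word, possibilities, n=3, cutoff=0.6, scorer=difflib_ratio):
    """Return up to n candidates scoring at least cutoff, best first.

    scorer(candidate, word) must return a similarity in [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be > 0")
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be in [0.0, 1.0]")
    scored = []
    for candidate in possibilities:
        score = scorer(candidate, word)
        if score >= cutoff:
            scored.append((score, candidate))
    # nlargest avoids fully sorting a very large candidate list.
    return [candidate for score, candidate in heapq.nlargest(n, scored)]


print(get_close_matches("appel", ["ape", "apple", "peach", "puppy"]))

# With python-levenshtein installed, its ratio function can be plugged in:
#     import Levenshtein
#     get_close_matches(word, names, scorer=Levenshtein.ratio)
```

Keeping the scorer as a parameter makes it easy to time both algorithms over the same candidate list.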
You'll need DefSoundDict to run the benchmark. Because the content is so long, I've uploaded it as a gist: https://gist.github.com/zuhanit/b4bbab033076f70d75626983a93a6256
If you find any errors, please let me know.
Thank you!