Add an example to compare LevenshteinDistance and SequenceMatcher
This PR helps recommend a more suitable VM OS image.
For example, when a user inputs "Ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD" and requests a VM OS image recommendation, "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20191002" can be selected from among many images as a suitable match.
The details
In some cases, LevenshteinDistance does not rank VM OS images properly.
For example, when '22.04' is compared with '22.04.4', '20.04', and '18.04', '22.04.4' should be the most similar one.
However, LevenshteinDistance scored '20.04' higher than a longer 22.04.x version string:
Comparing '22.04' with '22.04.1': The similarity ratio is 0.71
Comparing '22.04' with '20.04': The similarity ratio is 0.80
Comparing '20.04' with '18.04': The similarity ratio is 0.60
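
The ratios above are consistent with normalizing the edit distance by the length of the longer string. The following is a minimal sketch under that assumption. The function names `levenshtein_distance` and `levenshtein_ratio` are illustrative and are not taken from this PR's code.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


pairs = [("22.04", "22.04.1"), ("22.04", "20.04"), ("20.04", "18.04")]
for left, right in pairs:
    print(f"Comparing '{left}' with '{right}': {levenshtein_ratio(left, right):.2f}")
```

A single substituted character in a five-character string costs only 0.20, while two appended characters in a seven-character string cost about 0.29, which is why '20.04' outranks '22.04.1' here.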
SequenceMatcher can resolve this issue as follows.
Comparing '22.04' with '22.04.1': The similarity ratio is 0.83
Comparing '22.04' with '20.04': The similarity ratio is 0.60
Comparing '20.04' with '18.04': The similarity ratio is 0.60
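
The SequenceMatcher figures above are consistent with a ratio of twice the longest common substring length divided by the combined length of both strings. The sketch below assumes that definition. It is not necessarily the implementation used in this PR, and the names `longest_common_substring_length` and `substring_ratio` are illustrative. Note that Python's `difflib.SequenceMatcher.ratio()` sums all matching blocks, not only the longest one, so it can return different values for some pairs.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0] * (len(b) + 1)
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                # Extend the run that ended at the previous character of both strings.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def substring_ratio(a: str, b: str) -> float:
    """Assumed definition: 2 * longest common substring / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * longest_common_substring_length(a, b) / total


pairs = [("22.04", "22.04.1"), ("22.04", "20.04"), ("20.04", "18.04")]
for left, right in pairs:
    print(f"Comparing '{left}' with '{right}': {substring_ratio(left, right):.2f}")
```

Because the whole of '22.04' appears inside '22.04.1', that pair scores highest, while '20.04' shares only '.04' with '22.04'.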
Lastly, to discard spurious similarity ratios from short substring matches, ReLU with a threshold of 0.5 was applied: any ratio at or below 0.5 is set to 0.
The following cases are set to 0:
'22.04.4', '18.04' (similarity: 0.50)
'lts', 'ssd' (similarity: 0.33)
'jellyfish', 'hvm' (similarity: 0.17)
'x86', 'amd64' (similarity: 0.25)
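
The cutoff can be sketched as a thresholded pass-through. This assumes the comparison is inclusive, since a ratio of exactly 0.50 is listed among the zeroed cases. The function name `apply_threshold` is illustrative and not taken from this PR's code.

```python
def apply_threshold(ratio: float, threshold: float = 0.5) -> float:
    """Pass a ratio through only when it is strictly above the threshold."""
    return ratio if ratio > threshold else 0.0


# Ratios taken from the cases listed above, plus one value that survives the cutoff.
cases = {
    ("22.04.4", "18.04"): 0.50,
    ("lts", "ssd"): 0.33,
    ("jellyfish", "hvm"): 0.17,
    ("x86", "amd64"): 0.25,
    ("22.04", "22.04.1"): 0.83,
}
for (left, right), ratio in cases.items():
    print(f"'{left}', '{right}': {ratio:.2f} -> {apply_threshold(ratio):.2f}")
```

Only the last pair keeps its score. The other four drop to 0, so incidental overlaps between unrelated tokens no longer contribute to the result.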
(Extra examples)
comparing 'x86_64' with 'amd64': LevenshteinDistance (0.33), SequenceMatcher (0.36)
comparing 'hvm-ssd' with 'ssd': LevenshteinDistance (0.43), SequenceMatcher (0.60)
comparing 'hvm-ssd' with 'hdd': LevenshteinDistance (0.29), SequenceMatcher (0.20)