Dushistov / sdcv

https://dushistov.github.io/sdcv/
GNU General Public License v2.0
289 stars 42 forks source link

lookup: return all matching entries found during lookup #78

Closed cyphar closed 2 years ago

cyphar commented 2 years ago

Previously, we would just return the first entry we found that matched the requested word. This causes issues with dictionaries that have lots of entries which can be found using the same search string. In these cases, the user got a completely arbitrary word returned to them rather than the full set.

While this may seem strange, this is incredibly commonplace in Japanese and likely several other languages. In Japanese:

In both cases, these are different words in every sense of the word, and have separate headwords for each in the dictionary. Thus in order to be completely useful for such dictionaries, sdcv needs to be able to return every matching word in the dictionary.

The solution is conceptually simple -- return a set containing the indices rather than just a single index. Since every list we search is sorted (to allow binary searching), once we find one match we can just walk backwards and forwards from the match point to find the entire block of matching terms and add them to the set in linear time. A std::set is used so that we don't return duplicate results needlessly.

This solution was in practice a bit more complicated because .otf cache files require a bit more fiddling, and also the ->lookup methods are used by some callers to find the next entry if no entry was found. But on the whole it's not too drastic of a change from the previous setup.

Fixes #30 Signed-off-by: Aleksa Sarai cyphar@cyphar.com

cyphar commented 2 years ago

@Dushistov This should be ready for review now, I've added integration tests as well.

Dushistov commented 2 years ago

The change is rather big, but I don't see any obvious flaws. So let's merge it into master, and wait bug reports. Thanks you for contribution.

cyphar commented 2 years ago

Thanks! Feel free to ping me if there are any issues. 😸