LHNCBC / metamaplite

A near real-time named-entity recognizer
https://metamap.nlm.nih.gov/MetaMapLite.shtml

Restore Unicode-safe lookup behavior in dictionaryBinarySearch #26

Closed stevenbedrick closed 1 year ago

stevenbedrick commented 1 year ago

In the comparison step of dictionaryBinarySearch(), I propose that we go back to string comparison rather than byte-level comparison. This is necessary because the binary search table is laid out on disk according to Java's (Unicode-aware) string comparison order, not the comparison order of the encoded bytes, so the lookup has to use the same order. This fixes issue #1; see that issue for a more complete discussion.
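
To illustrate why the two orders can disagree, here is a minimal standalone sketch. It is not MetaMapLite code: the class `ComparisonOrderDemo` and the helper `compareUtf8Bytes` are hypothetical names made up for this example, and it assumes the byte-level comparison is an unsigned comparison of UTF-8 bytes. `String.compareTo` compares UTF-16 code units, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) sorts before a BMP character such as U+FF21, while their UTF-8 encodings sort the other way around.

```java
import java.nio.charset.StandardCharsets;

class ComparisonOrderDemo {
    // Hypothetical helper: compare two strings by their UTF-8 bytes,
    // treating each byte as an unsigned value (0..255).
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String fullwidthA = "\uFF21";   // U+FF21, a single UTF-16 code unit
        String emoji = "\uD83D\uDE00";  // U+1F600, a surrogate pair in UTF-16

        // String order: 0xFF21 > 0xD83D, so fullwidthA sorts AFTER emoji.
        System.out.println(Integer.signum(fullwidthA.compareTo(emoji)));

        // UTF-8 byte order: 0xEF < 0xF0, so fullwidthA sorts BEFORE emoji.
        System.out.println(Integer.signum(compareUtf8Bytes(fullwidthA, emoji)));
    }
}
```

If a table is sorted with one of these orders and searched with the other, a binary search can take the wrong branch at such a pair and report a term as missing even though it is in the table.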

stevenbedrick commented 1 year ago

Great! I see that the PR was accepted, but does it still need to be merged into the master branch?