UK-IPOP / drug-extraction

A ToolBox for fuzzily extracting drug mentions from text.
https://drug-extraction.vercel.app
MIT License

Implement caching in CLI #62

Closed. nanthony007 closed this issue 2 years ago

nanthony007 commented 2 years ago

Many searches are repeated. The larger the target corpus (a few vs. hundreds), the more repeated searches are performed, and thus the longer the program takes.

Caching the results for search pairs would boost performance, since a lookup is faster than running the similarity algorithm.

Proposed solution:

Make a lookup dictionary. Before computing a pair, say "cocaine" -> "the" (a very common word, so this similarity is probably computed very frequently), peek into the dictionary to see whether the pair's similarity has already been computed. If so, get the stored value and reuse it for the remainder of the program. If not, compute the value and store it in the dictionary before continuing.

I think this will be simpler to implement than concurrency and will give similar performance gains. It also has the added benefit of being WASM compatible, whereas I am not sure which concurrency features WASM supports at this time.

dictionary = HashMap
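
A rough sketch of the idea, assuming Rust (since the dictionary is a HashMap). The names SimilarityCache and edit_distance are hypothetical and not taken from the CLI; plain Levenshtein distance stands in for whichever similarity algorithm is actually used, and the computed counter exists only to show how many pairs missed the cache.

```rust
use std::collections::HashMap;

/// Plain Levenshtein edit distance. This is only a stand-in (an assumption)
/// for whichever similarity algorithm the CLI actually runs.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting i chars of `a` yields the empty prefix of `b`.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical lookup dictionary: (search term, word) -> computed distance.
struct SimilarityCache {
    seen: HashMap<(String, String), usize>,
    /// Number of pairs that had to be computed (cache misses).
    computed: usize,
}

impl SimilarityCache {
    fn new() -> Self {
        Self { seen: HashMap::new(), computed: 0 }
    }

    /// Peek into the dictionary first; compute and store only on a miss.
    fn distance(&mut self, target: &str, word: &str) -> usize {
        let key = (target.to_string(), word.to_string());
        if let Some(&d) = self.seen.get(&key) {
            return d; // already computed: reuse the stored value
        }
        let d = edit_distance(target, word);
        self.computed += 1;
        self.seen.insert(key, d);
        d
    }
}
```

Calling distance("cocaine", "the") twice on the same cache runs the algorithm once; the second call is a plain HashMap lookup. Building an owned (String, String) key on every call costs two allocations, so a real implementation might key the map differently, but this keeps the sketch short.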

This link may prove helpful.

nanthony007 commented 2 years ago

Feature added! 67bc3a5fbab69c5523e4ebc0403a457952274dbd