Many searches are repeated. The larger the target corpus (a few documents versus hundreds), the more repeated searches are performed, and the longer the program takes.
Caching search pairs would boost performance, since a dictionary lookup is far cheaper than re-running the similarity algorithms.
Proposed solution:
Make a lookup dictionary. Before computing a pair, say "cocaine" -> "the" (a very common word, so this similarity is probably computed very frequently), check the dictionary to see if the pair's similarity has already been computed. If so, reuse the stored value for the remainder of the program. If not, compute the value and store it in the dictionary before continuing.
I think this will be simpler to implement and have similar performance gains to concurrency. It has the added benefit of being WASM compatible while I am not sure what concurrency features are supported via WASM at this time.
This link may prove helpful.