Open rfishe02 opened 6 years ago
I’d like to comment that it is acceptable to spend more time in the preprocessing stage if it improves system performance at runtime without affecting accuracy. I am in support of testing this method or storage against the current system and comparing runtime performance.
I developed an approach that uses a HashMap and a Priority Queue to improve the lookup speed of n-grams in the application.
I hashed the first half of the n-gram, and built a priority queue for the latter half of the n-gram.
For bigrams, it looks like this:
"to" --> GramFreq{ String val; int freq; }
We can look up the initial n-gram fairly quickly, then quickly select the highest n-gram from the priority queue. I think that this approach is an improvement over the initial idea I had, where we would use a linked list to store the latter half of the n-grams.
It seems fast, but it takes a while to preprocess the data and load it into the application.