Caching of labels is only necessary for wikification, and in this situation there are far more misses than hits, because we check every n-gram in the document and most of these are nonsense phrases. A Bloom filter would quickly reject almost all of the misses (it never rejects a real label, but lets through a small, tunable fraction of false positives), and looking up the remaining candidates would probably be fast enough via the database.
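As a rough illustration of the idea, here is a minimal sketch in Python. It is not the project's code: `BloomFilter`, `ngrams`, `find_labels`, and the `lookup_label` callback (standing in for the database query) are all hypothetical names, and the filter is assumed to be populated once from the full set of known labels.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(
            8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def ngrams(tokens, max_len):
    """Yield every n-gram of 1..max_len tokens as a space-joined phrase."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield " ".join(tokens[start:end])


def find_labels(tokens, label_filter, lookup_label, max_len=3):
    """Return {phrase: entry} for n-grams that are real labels.

    lookup_label is a hypothetical stand-in for the database query; it is
    only called for phrases the filter cannot rule out.
    """
    found = {}
    for phrase in ngrams(tokens, max_len):
        if phrase not in label_filter:
            continue  # definitely not a label, so skip the database entirely
        entry = lookup_label(phrase)  # None here means a filter false positive
        if entry is not None:
            found[phrase] = entry
    return found
```

In use, the filter would be filled once with every known label (`label_filter.add(label)`) and then consulted for each n-gram; only phrases that pass it reach `lookup_label`. The trade-off is memory against the false-positive rate: a lower rate needs more bits per label but sends fewer pointless queries to the database.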