Closed PrimozGodec closed 1 year ago
Merging #990 (d76acaa) into master (9c0faca) will increase coverage by 0.02%. The diff coverage is 100.00%.
:exclamation: Current head d76acaa differs from pull request most recent head 946cab6. Consider uploading reports for the commit 946cab6 to get more accurate results.
Issue
Fixes https://github.com/biolab/orange3-text/issues/920

Corpus stores a dictionary (Gensim Dictionary) that is created on first use and then cached. The problem is that the dictionary stays the same after subsampling a Corpus (creating a corpus from a subset of documents), even though the number of unique tokens changes. Most problematically, the dictionary was used in several places in the add-on to get the number of unique tokens in the Corpus, so that number was incorrect after the corpus was subsampled (the issue reported in #920).
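A minimal sketch of the failure mode, using a toy class rather than the real Corpus. `ToyCorpus`, its `subsample` method, and the plain `dict` standing in for the Gensim Dictionary are all hypothetical and only illustrate how a cached vocabulary can go stale:

```python
class ToyCorpus:
    """Hypothetical stand-in for Corpus; not the orange3-text implementation."""

    def __init__(self, tokens, dictionary=None):
        self.tokens = tokens            # list of token lists, one per document
        self._dictionary = dictionary   # cached token -> id mapping (or None)

    @property
    def dictionary(self):
        # Built on first access, then reused on every later access.
        if self._dictionary is None:
            vocab = sorted({tok for doc in self.tokens for tok in doc})
            self._dictionary = {tok: i for i, tok in enumerate(vocab)}
        return self._dictionary

    def subsample(self, indices):
        # The cached dictionary is carried over unchanged: this is the bug.
        return ToyCorpus([self.tokens[i] for i in indices], self._dictionary)


corpus = ToyCorpus([["apple", "pear"], ["pear", "plum"], ["fig"]])
full_count = len(corpus.dictionary)   # triggers the cache; 4 unique tokens

subset = corpus.subsample([0])        # keeps only the first document
stale_count = len(subset.dictionary)  # still reports the full vocabulary
actual_count = len({tok for doc in subset.tokens for tok in doc})
```

Here `stale_count` stays at the size of the full vocabulary while `actual_count` reflects the single remaining document, which is the kind of mismatch described above.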
Description of changes
Since the dictionary was introduced primarily for topic modelling, and topic modelling no longer uses it, I decided to remove it from Corpus. All code that uses the dictionary can be rewritten without it.
This PR therefore removes the dictionary and updates all the code that used it.
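As an illustration of how such code can work without a cached dictionary, the number of unique tokens can be computed directly from the tokens of the current documents. This is only a sketch: `count_unique_tokens` is a hypothetical helper, and the PR's actual replacement code may look different.

```python
def count_unique_tokens(tokens):
    """Count distinct tokens across an iterable of per-document token lists."""
    unique = set()
    for doc in tokens:
        unique.update(doc)
    return len(unique)


docs = [["apple", "pear"], ["pear", "plum"], ["fig"]]
n_all = count_unique_tokens(docs)         # all three documents
n_subset = count_unique_tokens(docs[:1])  # first document only
```

Because the count is derived from whatever documents are passed in, it cannot go out of date after subsampling; the trade-off is that it is recomputed on each call instead of being cached.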
Includes