Closed tanmaykm closed 3 years ago
I haven't put much thought into this, but it seems that if we keep the same lexicon, merging the DTMs is a matter of adding the matrices... is that right? There should be a way of generating the DTM with a specified lexicon. If we want to add new words to the lexicon from the incremental corpus, things get more complex.
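To make the same-lexicon case concrete, here is a minimal sketch in plain Python rather than this package's API. The names `build_dtm` and `merge_same_lexicon` are hypothetical, and it assumes a dense layout with one row per document and whitespace tokenization. Under that layout, merging two DTMs that share a lexicon amounts to appending the new rows; elementwise addition would only apply to corpus-level term totals.

```python
from collections import Counter


def build_dtm(docs, lexicon):
    """Return one row of term counts per document, columns ordered by `lexicon`.

    Tokens that are not in the lexicon are ignored (hypothetical helper).
    """
    index = {term: j for j, term in enumerate(lexicon)}
    rows = []
    for doc in docs:
        row = [0] * len(lexicon)
        for token, count in Counter(doc.lower().split()).items():
            if token in index:
                row[index[token]] = count
        rows.append(row)
    return rows


def merge_same_lexicon(dtm_a, dtm_b):
    """Stack the rows of two DTMs that share the same column order."""
    return dtm_a + dtm_b


lexicon = ["cat", "dog", "fish"]
full = build_dtm(["cat dog cat", "dog fish"], lexicon)
incremental = build_dtm(["fish fish bird"], lexicon)  # "bird" is dropped
merged = merge_same_lexicon(full, incremental)
```

Here "bird" is silently discarded because the lexicon is fixed, which is exactly the limitation that makes the new-words case more complex.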
I am trying to implement incremental updates to a tf_idf matrix built from a corpus. There may also be a need to remove certain documents / terms from the matrix. Manipulating the document term matrix directly seems to be a more efficient way to do that than starting over from the corpus.
So creating a small incremental document term matrix from the new documents and merging it with the previous full document term matrix would probably be a good approach. Similarly, methods to remove documents and terms from the matrix would also be useful.
It would be useful to have such methods available in this package.
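As a rough illustration of the operations requested above, the following plain-Python sketch merges two DTMs whose lexicons may differ, removes documents and terms, and then recomputes tf-idf. All function names are hypothetical and are not part of this package. It assumes dense lists with one row per document, lexicons without duplicate terms, and one common tf-idf variant (term frequency normalized by document length, times log(N / document frequency)). Because the idf factor depends on the whole corpus, the weights are recomputed from the merged counts rather than patched in place.

```python
import math


def merge_dtms(lex_a, dtm_a, lex_b, dtm_b):
    """Merge two DTMs; terms new in lex_b become extra columns."""
    seen = set(lex_a)
    lexicon = list(lex_a) + [t for t in lex_b if t not in seen]
    col = {t: j for j, t in enumerate(lexicon)}
    merged = []
    for lex, dtm in ((lex_a, dtm_a), (lex_b, dtm_b)):
        for row in dtm:
            new_row = [0] * len(lexicon)
            for term, count in zip(lex, row):
                new_row[col[term]] = count
            merged.append(new_row)
    return lexicon, merged


def remove_documents(dtm, doc_indices):
    """Drop the rows at the given positions."""
    drop = set(doc_indices)
    return [row for i, row in enumerate(dtm) if i not in drop]


def remove_terms(lexicon, dtm, terms):
    """Drop the columns for the given terms."""
    drop = set(terms)
    keep = [j for j, t in enumerate(lexicon) if t not in drop]
    return [lexicon[j] for j in keep], [[row[j] for j in keep] for row in dtm]


def tf_idf(dtm):
    """Recompute tf-idf weights from raw counts (one variant of several)."""
    n_docs = len(dtm)
    n_terms = len(dtm[0]) if dtm else 0
    df = [sum(1 for row in dtm if row[j] > 0) for j in range(n_terms)]
    weights = []
    for row in dtm:
        total = sum(row)
        weights.append([
            (row[j] / total) * math.log(n_docs / df[j]) if total and df[j] else 0.0
            for j in range(n_terms)
        ])
    return weights


merged_lexicon, merged_dtm = merge_dtms(
    ["cat", "dog"], [[2, 1], [0, 1]],   # previous full DTM
    ["dog", "bird"], [[1, 3]],          # small incremental DTM
)
pruned_dtm = remove_documents(merged_dtm, [1])
final_lexicon, final_dtm = remove_terms(merged_lexicon, pruned_dtm, ["dog"])
weights = tf_idf(final_dtm)
```

A sparse representation would be the natural choice for a real corpus; dense lists are used here only to keep the sketch short.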