Closed tanmaykm closed 3 years ago
I see now that `copyto!` for sparse matrices is quite slow for large matrices. It should be possible to implement a different copy method optimized for the specific needs of merging DTMs in TextAnalysis. I shall send another PR with that in a bit.
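For illustration only, here is a minimal sketch of one way two document-term count matrices could be merged without copying into a preallocated sparse matrix. It is not the method from this PR or the follow-up one. The helper name `merge_counts` and the layout (rows are documents, columns are terms) are assumptions. The idea is to pull out the nonzero triplets with `findnz`, remap the column indices onto a combined vocabulary, and build the result with a single `sparse` call.

```julia
using SparseArrays

# Hypothetical helper (not part of TextAnalysis): stack the documents of `update`
# below those of `base`, remapping columns onto a combined, sorted vocabulary.
function merge_counts(base::SparseMatrixCSC{Int,Int}, base_terms::Vector{String},
                      update::SparseMatrixCSC{Int,Int}, update_terms::Vector{String})
    terms = sort!(union(base_terms, update_terms))
    col = Dict(t => i for (i, t) in enumerate(terms))  # term => merged column

    bi, bj, bv = findnz(base)      # (row, column, value) triplets of each input
    ui, uj, uv = findnz(update)

    rows = vcat(bi, ui .+ size(base, 1))               # update rows go after base rows
    cols = vcat(Int[col[base_terms[j]] for j in bj],
                Int[col[update_terms[j]] for j in uj])
    vals = vcat(bv, uv)

    merged = sparse(rows, cols, vals, size(base, 1) + size(update, 1), length(terms))
    return merged, terms
end

base = sparse([1 0; 0 2])          # 2 documents over terms ["a", "b"]
update = sparse([3 4])             # 1 document over terms ["b", "c"]
merged, terms = merge_counts(base, ["a", "b"], update, ["b", "c"])
```

Whether this beats `copyto!` in practice would need benchmarking on real DTMs. The sketch only shows the triplet-based construction.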
This adds a few methods to help manipulate and update DocumentTermMatrix incrementally:
- `serialize` and `deserialize`: optimized by not serializing the column index unnecessarily - that is re-constructed from the terms vector upon deserialization
- `prune!`: removes documents and terms from an existing index - those that would correspond to deleted documents
- `merge!`: merges two instances - one being the incremental update to be applied on an existing full index

Also relaxed the `StatsBase` version requirement to include both `0.32` and `0.33`. This was preventing this package from being used together with certain other packages.

fixes #243
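As a rough illustration of why the column index need not be serialized: a term-to-column lookup can be rebuilt from an ordered terms vector in one pass. The helper name `rebuild_column_index` is hypothetical and is not the function this PR adds.

```julia
# Hypothetical helper (not TextAnalysis's actual code): rebuild a term => column
# lookup from the ordered terms vector, as might be done after deserialization.
rebuild_column_index(terms::Vector{T}) where {T} =
    Dict{T,Int}(term => i for (i, term) in enumerate(terms))

terms = ["apple", "banana", "cherry"]
column_index = rebuild_column_index(terms)
```

Since the lookup is fully determined by the order of `terms`, storing only the terms vector loses no information.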