bab2min / tomotopy

Python package of Tomoto, the Topic Modeling Tool
https://bab2min.github.io/tomotopy
MIT License

Ability to stream corpus data to LDAModel (or any other model) #162

Open jalustig opened 2 years ago

jalustig commented 2 years ago

Tomotopy currently loads all documents into memory before training, and only then trains on them.

However, I have a very large corpus (about 750,000 documents), and even if I only want to train on a portion of it, I am heavily RAM-limited. Loading just 20,000 documents causes my script to take up 20 GB of RAM.
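For reference, this is roughly the in-memory pattern I'm using now (a minimal sketch; the file name and whitespace tokenization are placeholders for my actual preprocessing):

```python
import tomotopy as tp

# All documents are added (and held in memory) before training starts.
mdl = tp.LDAModel(k=20)
with open("corpus.txt") as f:
    for line in f:
        mdl.add_doc(line.split())  # every document accumulates inside the model

mdl.train(1000)  # training only begins once the full corpus is loaded
```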

Gensim can stream an iterable document corpus, which makes it much more scalable in terms of RAM. Would it be possible to adjust Tomotopy to support a similar capability, so that one could train on a larger dataset? A sketch of the Gensim-style pattern I have in mind is below.
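Something along these lines is what Gensim allows (again just a sketch, with a placeholder file name and whitespace tokenization): the corpus is any iterable that yields one document at a time, so nothing has to sit in memory at once.

```python
from gensim import corpora, models

class StreamedCorpus:
    """Yields one bag-of-words document at a time instead of loading the whole corpus."""
    def __init__(self, path, dictionary):
        self.path = path
        self.dictionary = dictionary

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield self.dictionary.doc2bow(line.split())

# Build the vocabulary in one streaming pass, then train LDA from the iterable.
dictionary = corpora.Dictionary(line.split() for line in open("corpus.txt"))
lda = models.LdaModel(StreamedCorpus("corpus.txt", dictionary),
                      num_topics=20, id2word=dictionary)
```

Being able to hand Tomotopy a similar iterable, rather than adding every document up front, would make it feasible to train on the full 750,000-document corpus.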