erip opened 1 year ago
Hi @erip
Yes, a model saved with `full=True` contains all the parameters needed to resume training, so it can take a long time to reload. If you save the model with `full=False`, you cannot continue training from it, but it loads faster.
Thanks very much, @bab2min! It seems like if the model is binarized it shouldn't take long to reload. I haven't looked at the details so sorry for the silly question, but does the model use numpy binarization under the hood? If so, it could be quick to deserialize even if full (though maybe I don't appreciate the complexity here).
@erip Actually, the package doesn't use numpy binarization for loading & saving; it uses custom serialization functions. It is true that these functions carry a lot of backward-compatibility handling, so the process is somewhat inefficient. I'll check whether it can be improved, or re-implement loading & saving in the near future.
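For contrast, here is roughly why numpy binarization deserializes quickly: the `.npy` format stores the raw buffer, and `np.load` can even memory-map it so that "loading" is near-instant and pages are only read from disk when touched (a minimal sketch; the array contents are arbitrary):

```python
import os
import tempfile

import numpy as np

# Write a moderately sized array in numpy's .npy binary format.
arr = np.arange(1_000_000, dtype=np.float32).reshape(1000, 1000)
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
np.save(path, arr)

# A plain load reads the whole buffer into memory in one pass...
eager = np.load(path)

# ...while mmap_mode maps the file lazily, so the call returns almost
# immediately and data is paged in only when actually accessed.
lazy = np.load(path, mmap_mode="r")

print(bool(np.array_equal(eager, lazy)))
```

A custom serializer that walks objects field by field (especially with backward-compatibility branches, as described above) can't take the same shortcut, which is consistent with the slow load times reported here.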
I have a 6.5GB model trained on 10M docs to model 100 topics, trained the usual way. I'm trying to load the model and I'm finding that load times are incredibly high. For reference, I've been monitoring `top`, and my program has only loaded ~5.1GB of the 6.5GB model after 10 minutes. I suspect this is because I used the default `save` with `full=True`... Should I expect a model saved with `full=False` to load faster?
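For anyone wanting to compare the two modes concretely, a simple timing harness looks like this (a sketch: the pickled dict is a hypothetical stand-in payload; with tomotopy you would time `tp.LDAModel.load(path)` instead):

```python
import os
import pickle
import tempfile
import time

# Hypothetical stand-in payload; substitute a real model file in practice.
payload = {"doc_count": 10_000_000, "topics": 100}
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    pickle.dump(payload, f)

# Time only the deserialization step.
t0 = time.perf_counter()
with open(path, "rb") as f:
    loaded = pickle.load(f)
elapsed = time.perf_counter() - t0
print(f"loaded {os.path.getsize(path)} bytes in {elapsed:.4f}s")
```

Running this once against a `full=True` file and once against a `full=False` file of the same model would quantify the difference directly.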