Closed. tmontana closed this issue 3 years ago.
It uses Cython's auto-pickle functionality. I don't know the exact details of how it works, and I'm not sure whether that behavior is expected. Nevertheless, there is also the option of using the package's own serialization functionality with `use_cpp=True` (`export_model` / `import_model`), which should definitely not increase memory usage by that much.
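A minimal sketch of what using the native serializer instead of pickle might look like. The `export_model` / `import_model` names and the `use_cpp=True` flag come from this thread; the exact call pattern (argument names, whether `import_model` is a class method) is an assumption and may differ across isotree versions, so check the docs for your installed release.

```python
# Hedged sketch: serialize via isotree's native exporter rather than pickle.
# The exact signatures are assumptions based on this thread; verify against
# your installed isotree version's documentation.
try:
    from isotree import IsolationForest
except ImportError:  # keeps this sketch importable without isotree installed
    IsolationForest = None

def save_model(model, path):
    # Writes the underlying C++ objects directly to disk, avoiding the
    # large intermediate copies that pickling can create.
    model.export_model(path, use_cpp=True)

def load_model(path):
    # Assumed to be callable as a class method; may be an instance method
    # (e.g. IsolationForest().import_model(...)) in some versions.
    return IsolationForest.import_model(path, use_cpp=True)
```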
Indeed! Problem solved. It's also a lot faster. Thank you,
Hi. Thanks for sharing this great library.
When I `pickle.dump` a trained model with 200 trees, memory usage explodes. For an extended-forest model (trained on 1 million rows and 350 columns), pickling increases memory usage by over 30 GB and produces a roughly 5 GB file on disk. Note that I am using pickle protocol 5 (with Python 3.8.5); using an earlier protocol crashed my machine (90 GB of RAM) due to memory usage, so I was not able to pickle at all.
By contrast, pickling an SCiForest model does not increase memory usage significantly, and the result takes only about 10% of the disk space.
Is that the expected behavior? Is there anything I can do to reduce the memory used? Many thanks,