Closed: FabianKaiser closed this issue 1 year ago
Thanks for the bug report.
I found two large factors contributing to the slow serialization.
2s is still quite long, but I hope this brings it into a workable range for you. There may still be more opportunities for improvement here, but I figured I would stop for now as these were the low-hanging fruit.
Thanks a lot already! Will it be possible for me to test it on Monday?
You're welcome! I'll get all this merged today and push out a new version of the package. I'll let you know when it's released.
@FabianKaiser 0.4.1 is out on PyPI
Hello @jackgerrits ,
Thanks a lot!
I still have an issue though.
Models of size 4 GB or higher still do not finish for me - all others finish within one minute at most.
Do you have any idea where that might come from?
Hmm, how much memory is available on the machine where you are running this?
We have 64 GB of RAM. But I also tried on a 128 GB machine, and it still didn't work. The model in question takes about 11 GB on disk and something like 40 GB in RAM.
Do you know what the saving function in the old wrapper does differently? Would it be possible to do it in a streaming way and write directly to disk, without serializing into memory first?
The only real difference is that this wrapper creates an in-memory buffer for the serialized model, whereas the old bindings write directly to the given filename. When writing directly to disk the extra in-memory buffer is not needed, so the memory requirements of the overall program are drastically reduced.
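To make the memory implication concrete, here is a minimal sketch of the buffer-based path (the `Workspace` constructor arguments and file names are illustrative assumptions):

```python
import vowpal_wabbit_next as vw

# Load an existing model from disk (file name is an assumption).
with open("model.bin", "rb") as f:
    workspace = vw.Workspace([], model_data=f.read())

# serialize() materializes the entire serialized model as a bytes
# buffer in memory before it can be written anywhere. For a model
# occupying ~40 GB in RAM, this roughly doubles peak memory usage.
data = workspace.serialize()
with open("model_copy.bin", "wb") as f:
    f.write(data)
```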
I had hoped to avoid a "write to file" API since the buffer is more flexible (you can always write it to a file yourself), but you have very large model files, so it really feels necessary for your use case! I'll add a "write to file" kind of API, and perhaps you can try a wheel of it before I go ahead and release it?
@FabianKaiser Can you try the serialize_to_file method added here https://github.com/VowpalWabbit/py-vowpal-wabbit-next/pull/87?
(You can download a wheel you can install locally from the CI job for the platform you need)
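For reference, usage of the new method would look roughly like this (a sketch, assuming `serialize_to_file` takes a destination path as in the linked PR):

```python
# Write the model straight to disk, skipping the intermediate
# in-memory buffer that serialize() allocates.
workspace.serialize_to_file("model.bin")
```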
Hello @jackgerrits,
Thanks a lot!
Unfortunately I could not test it, as I did not manage to download the wheel. I did see that the wheel was created and uploaded by the CI job, but I did not find out how to access it.
Would it be possible to release it as an alpha or beta version? Or could you tell me how to access those wheels?
This is the direct link to download the artifacts from the CI run: https://github.com/VowpalWabbit/py-vowpal-wabbit-next/suites/13550209781/artifacts/745571742
Hello @jackgerrits ,
The serialize_to_file function works. Could you release it as a package?
Thanks again for the fast fix!
There's one more feature I want to put in, then I'll do another release
0.5.0 is released now
When trying to serialize and save a large model (either oaa or plt; the model files are more than 1 GB), `model.serialize()` does not complete in over an hour (I did not try how long it would actually take, as that seemed pointless). This is strange, because saving a model using the vowpalwabbit training functionality (by passing the `--final_regressor` option) is quite fast, and loading a model of that size is a matter of seconds. Also, the `model.save(model_filename)` functionality of the old vowpalwabbit package is much faster (also in the range of seconds) with models of the same size. Unfortunately, I am not sure how to pass an example. The code is quite easy:
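(The snippet itself does not seem to have survived extraction; below is a minimal sketch of what such a reproduction might look like, with the training arguments, dataset format, and file names all being assumptions.)

```python
import vowpal_wabbit_next as vw

# Train a large multiclass model; the arguments are illustrative.
workspace = vw.Workspace(["--oaa", "1000", "--bit_precision", "28"])
parser = vw.TextFormatParser(workspace)

with open("train.txt") as f:
    for line in f:
        workspace.learn_one(parser.parse_line(line))

# This is the call that does not complete for multi-GB models.
data = workspace.serialize()
```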
Here the last command does not finish in any reasonable time, though memory usage keeps rising.
I cannot pass you one of our models or our training files, so I am not sure how to get a large model to you. Is it possible for you to create a dummy model to test with yourself?