browsermt / bergamot-translator

Cross-platform C++ library focused on optimized machine translation on consumer-grade devices.
http://browser.mt
Mozilla Public License 2.0

Enables model ensembles #450

Closed graemenail closed 1 year ago

graemenail commented 1 year ago

Adds the ability to use ensembles of models. This supports ensembles of binary- or npz-format models, as well as mixtures of both.

When all models in the ensemble are in binary format, the load-from-memory path is used. Otherwise, they are loaded via the file system. Enable log level debug to see output related to this.
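The load-path decision described above can be sketched roughly as follows. This is an illustrative stub, not the actual bergamot-translator code: the names `isBinaryModel` and `chooseLoadPath` are hypothetical, assuming binary models are identified by a `.bin` extension.

```cpp
#include <string>
#include <vector>

// Sketch of the load-path decision: the memory path is only usable
// when every model in the ensemble is in binary format; a single
// npz member forces file-system loading for the whole ensemble.
enum class LoadPath { Memory, FileSystem };

bool isBinaryModel(const std::string &path) {
  // Assumption: binary models carry a .bin extension; anything else
  // (e.g. .npz) takes the file-loading path.
  const std::string ext = ".bin";
  return path.size() >= ext.size() &&
         path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}

LoadPath chooseLoadPath(const std::vector<std::string> &modelPaths) {
  for (const auto &p : modelPaths) {
    if (!isBinaryModel(p))
      return LoadPath::FileSystem;  // npz-only or mixed ensemble
  }
  return LoadPath::Memory;  // all-binary ensemble
}
```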

graemenail commented 1 year ago

emscripten was failing because I hadn't updated the WASM bindings. I've fixed that, but only to restore support for single models. I can look at supporting ensembles there too if needed.

graemenail commented 1 year ago

The bindings only take a single `AlignedMemory`, so currently it just forwards that.

Once I'm passing multiple models via the bindings, it's basically enabled for WASM too. I'll have a look.
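The forwarding mentioned above can be sketched as wrapping the single blob in a one-element vector before handing it to the multi-model interface. This is a hypothetical shim, not the real binding code: `AlignedMemory` is stubbed, and `loadModels`/`loadSingleModel` are illustrative names.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stub standing in for bergamot-translator's AlignedMemory, which
// holds the raw model bytes passed across the WASM boundary.
struct AlignedMemory {
  std::vector<char> bytes;
};

// Hypothetical multi-model entry point; returns how many models it
// received, as a stand-in for real loading.
std::size_t loadModels(const std::vector<std::shared_ptr<AlignedMemory>> &models) {
  return models.size();
}

// Binding shim: a single model is forwarded as an ensemble of one,
// restoring single-model support until multiple models are plumbed
// through the bindings.
std::size_t loadSingleModel(std::shared_ptr<AlignedMemory> model) {
  return loadModels({std::move(model)});
}
```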

graemenail commented 1 year ago

I've checked that the code works after the latest changes, using the regression test apps to check functionality:

blocking \
  --log-level trace \
  --bergamot-mode test-forward-backward \
  --model-config-paths \
    ensemble_enes.yml \
    ensemble_esen.yml

Examples of the new logging messages introduced here:

When loading any npz file:

[2023-07-30 16:58:12] Encountered an npz file model.esen.npz; will use file loading for 4 models

When loading from memory:

[2023-07-30 16:58:12] Loaded model 1 of 2 from memory
[2023-07-30 16:58:12] Loaded model 2 of 2 from memory

When loading from file:

[2023-07-30 16:58:13] Loaded 4 model(s) from file
XapaJIaMnu commented 1 year ago

Just to confirm, you see the increased runtime and actual different translations/scores?

graemenail commented 1 year ago

> Just to confirm, you see the increased runtime and actual different translations/scores?

Yes, there are different outputs and durations.

With multiple models, the increase in decoding time is obvious.

Model-Single:
real    0m4.500s
user    0m2.982s
sys     0m1.366s

Model-Ensemble:
real    0m15.184s
user    0m9.733s
sys     0m5.246s

(The output is the resulting English translation of a single sentence provided in the source language.)

Model-Single is one of the 4 teachers from the Model-Ensemble. Timing of the ensemble is roughly 4x that of the single model. Enabling logging shows that the models are loaded as scorers.
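As a quick sanity check of the scaling claim, the reported `real` timings can be compared directly. The helper below is just arithmetic on the numbers quoted in this thread; the ratio comes out somewhat under 4x, which is plausible since per-run overhead (model loading, I/O) is shared across the scorers.

```cpp
// Ratio of ensemble decode time to single-model decode time, using
// the `real` timings reported above.
double ensembleScalingRatio() {
  const double singleReal = 4.500;     // Model-Single, seconds
  const double ensembleReal = 15.184;  // Model-Ensemble (4 scorers), seconds
  return ensembleReal / singleReal;    // ~3.4x for a 4-model ensemble
}
```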