Implement `getItem` as a specialised version of `loadItems`

jelmervdl commented 2 years ago

marian::createScorer() calls loadItems() twice: once for looking for a special:model.yml entry, and once for the actual data. The first call is very expensive since it will decode & dequantize all entries just for a simple lookup of a text file.

This change copies some of the logic from loadItems() into getItem() to only process that one entry that it searches for.

The alternative, passing the actual items down through createScorer and all the other users of the output of loadItems() would be better, but requires a far more intrusive change set that doesn't yield any additional performance benefit in the standard bergamot-translator use case.

Shaves off about 1 sec of the loading time on my machine. (Tested with a default (debug?) build of bergamot-translator that uses bundle loading, CPU has turbo-boosting disabled.)

Edit 2: Cold start-up + translation of a single sentence with HTML is 340ms now on my machine with a release build.

jelmervdl commented 2 years ago

Maybe worth doing a little reworking on when memory is turned into io::Item because right now this bit of code, in marian::inits::fromItem(const io::Item &) creates lambdas that need to copy-construct the item: https://github.com/browsermt/marian-dev/blob/0173b751289a16c5051328f156367c5a426964cf/src/graph/node_initializers.cpp#L186-L208

If io::Item is managed a bit higher in the stack, or entirely outside (as part of something like MemoryBundle in bergamot-translator?) these copies could have been references.

A quicker hack solution could be something something shared_ptr, possibly entirely transparently (i.e. by hiding it inside the io::Item struct) like Qt does with its objects.

Edit: according to this profile the memcpy caused by cloning io::Item is now the single most expensive part of model loading.

jelmervdl commented 2 years ago

That lambda is not the only place that io::Items are cloned. Throughout many of the functions in scorer.cpp the std::vector<io::Items> vector is being passed as a copy argument instead of a reference. I presume due to ownership questions.

XapaJIaMnu commented 2 years ago

As far as I recall @graemenail fixed the load items being called multiple times back in 2020. Is this an edge case that was skipped or?

graemenail commented 2 years ago

Yes, there's getYamlFromModel which takes in an already processed item-vector. I didn't get rid of any of the other code paths for creating scorers though.

The logic in marian is that models are loaded from file once and then passed to the createrScorer. This is either via io::load, or the mmap path. If bergamot isn't using createScorer(*args, itemVector ) then it will load twice.

XapaJIaMnu commented 2 years ago

This becomes very opaque very fast, as we have a hidden ptr advance performed by the function readItemData. The current version is not particularly readable but with the current refactoring this will turn even more into writeOnly code. I guess we are using this path with non-binarised models?

jelmervdl commented 2 years ago

It was the getYamlFromModel that takes a void pointer that caused the second call to loadItems: https://github.com/browsermt/marian-dev/blob/837f38588c0e36552798cf7ac4bd3a51dd346ea1/src/translator/scorers.cpp#L121

Initially I fixed it using this method, but a better fix would be for bergamot-translator to call the other createScorers function.

graemenail commented 2 years ago

If std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<const void*>& ptrs) loads the model, then everything can go through std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<std::vector<io::Item>> models) which doesn't have the double load.

What is the loading is bergamot calling now?

jelmervdl commented 2 years ago

Currently the one with void pointers or just the config file (for when not loading from MemoryBundle but fileio) https://github.com/browsermt/bergamot-translator/blob/fe3f3982debbd88e94f1909c313d1a611d2ff1c9/src/translator/translation_model.cpp#L75-L81

graemenail commented 2 years ago

Thanks!

The else path should be fine, as from file loads the model and calls the std::vector<std::vector<io::Item>> models variant. I think changing ptr loading to call the same function should be sufficient? Then we can worry about #75

jelmervdl commented 2 years ago

Just this might be sufficient:

diff --git a/src/translator/scorers.cpp b/src/translator/scorers.cpp
index f9180c5e3bdc1ae5ff66d53b3316f90709ee5905..25a7b3e25539c0e7bd46bfd017039e0205451ef8 100644
--- a/src/translator/scorers.cpp
+++ b/src/translator/scorers.cpp
@@ -112,13 +112,14 @@ std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<c

   size_t i = 0;
   for(auto ptr : ptrs) {
+    auto items = io::loadItems(ptr);
     std::string fname = "F" + std::to_string(i);

     // load options specific for the scorer
     auto modelOptions = New<Options>(options->clone());
     if(!options->get<bool>("ignore-model-config")) {
       YAML::Node modelYaml;
-      io::getYamlFromModel(modelYaml, "special:model.yml", ptr);
+      io::getYamlFromModel(modelYaml, "special:model.yml", items);
       if(!modelYaml.IsNull()) {
         LOG(info, "Loaded model config");
         modelOptions->merge(modelYaml, true);
@@ -128,7 +129,7 @@ std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<c
       }
     }

-    scorers.push_back(scorerByType(fname, weights[i], ptr, modelOptions));
+    scorers.push_back(scorerByType(fname, weights[i], items, modelOptions));
     i++;
   }

I was worried that this creates a different type of ScorerWrapper, namely one that will not call encdec->mmap(graph, ptr) but encdec->load(graph, items). But the whole mmap path seems to be disabled in this repository. This is from marian::io::binary::loadItems:

Completely disable MMAP support for any models. We do not use it in bergamot and we hijack this codepath for binary model loading. If this is not set, we trigger node_initializers.cpp:186. This one just assigns the memory ptr to the tensor if set to true, but at the moment. We are preparing some things on demand (the bottom portion of this code). Once we stop doing that, we can use the full mmap codepath. Also when using the full mmap codepath, we need to uncomment expression_graph.h:582

graemenail commented 2 years ago

That change also works. It's probably wise to make a minimal change such as that to avoid any side effects. As you say the mmap bool is hijacked to do some other processing.

I think the mmap path eventually ends up at io::mmapItems() which is just calling binary::loadItems(ptr, items, true);. via ScorerWraper::ScorerWraper calling (I)EncoderDecoder::mmap() calling ExpressionGraph::mmap.. but maybe I've missed something

jelmervdl commented 2 years ago

Closing this in favour of a simpler fix in #77.

browsermt / marian-dev

Implement `getItem` as a specialised version of `loadItems` #74