Closed jelmervdl closed 2 years ago
Maybe worth doing a little reworking on when memory is turned into io::Item
because right now this bit of code, in marian::inits::fromItem(const io::Item &)
creates lambdas that need to copy-construct the item:
https://github.com/browsermt/marian-dev/blob/0173b751289a16c5051328f156367c5a426964cf/src/graph/node_initializers.cpp#L186-L208
If io::Item
is managed a bit higher in the stack, or entirely outside (as part of something like MemoryBundle
in bergamot-translator?) these copies could have been references.
A quicker hack solution could be something something shared_ptr, possibly entirely transparently (i.e. by hiding it inside the io::Item struct) like Qt does with its objects.
Edit: according to this profile the memcpy caused by cloning io::Item is now the single most expensive part of model loading.
That lambda is not the only place that io::Item
s are cloned. Throughout many of the functions in scorer.cpp the std::vector<io::Items>
vector is being passed as a copy argument instead of a reference. I presume due to ownership questions.
As far as I recall @graemenail fixed the load items being called multiple times back in 2020. Is this an edge case that was skipped or?
Yes, there's getYamlFromModel
which takes in an already processed item-vector. I didn't get rid of any of the other code paths for creating scorers though.
The logic in marian is that models are loaded from file once and then passed to the createrScorer. This is either via io::load
, or the mmap path. If bergamot isn't using createScorer(*args, itemVector )
then it will load twice.
This becomes very opaque very fast, as we have a hidden ptr advance performed by the function readItemData
. The current version is not particularly readable but with the current refactoring this will turn even more into writeOnly
code. I guess we are using this path with non-binarised models?
It was the getYamlFromModel that takes a void pointer that caused the second call to loadItems
: https://github.com/browsermt/marian-dev/blob/837f38588c0e36552798cf7ac4bd3a51dd346ea1/src/translator/scorers.cpp#L121
Initially I fixed it using this method, but a better fix would be for bergamot-translator to call the other createScorers
function.
If std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<const void*>& ptrs)
loads the model, then everything can go through std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<std::vector<io::Item>> models)
which doesn't have the double load.
What is the loading is bergamot calling now?
Currently the one with void pointers or just the config file (for when not loading from MemoryBundle
but fileio)
https://github.com/browsermt/bergamot-translator/blob/fe3f3982debbd88e94f1909c313d1a611d2ff1c9/src/translator/translation_model.cpp#L75-L81
Thanks!
The else
path should be fine, as from file loads the model and calls the std::vector<std::vector<io::Item>> models
variant. I think changing ptr
loading to call the same function should be sufficient? Then we can worry about #75
Just this might be sufficient:
diff --git a/src/translator/scorers.cpp b/src/translator/scorers.cpp
index f9180c5e3bdc1ae5ff66d53b3316f90709ee5905..25a7b3e25539c0e7bd46bfd017039e0205451ef8 100644
--- a/src/translator/scorers.cpp
+++ b/src/translator/scorers.cpp
@@ -112,13 +112,14 @@ std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<c
size_t i = 0;
for(auto ptr : ptrs) {
+ auto items = io::loadItems(ptr);
std::string fname = "F" + std::to_string(i);
// load options specific for the scorer
auto modelOptions = New<Options>(options->clone());
if(!options->get<bool>("ignore-model-config")) {
YAML::Node modelYaml;
- io::getYamlFromModel(modelYaml, "special:model.yml", ptr);
+ io::getYamlFromModel(modelYaml, "special:model.yml", items);
if(!modelYaml.IsNull()) {
LOG(info, "Loaded model config");
modelOptions->merge(modelYaml, true);
@@ -128,7 +129,7 @@ std::vector<Ptr<Scorer>> createScorers(Ptr<Options> options, const std::vector<c
}
}
- scorers.push_back(scorerByType(fname, weights[i], ptr, modelOptions));
+ scorers.push_back(scorerByType(fname, weights[i], items, modelOptions));
i++;
}
I was worried that this creates a different type of ScorerWrapper
, namely one that will not call encdec->mmap(graph, ptr)
but encdec->load(graph, items)
. But the whole mmap path seems to be disabled in this repository. This is from marian::io::binary::loadItems
:
Completely disable MMAP support for any models. We do not use it in bergamot and we hijack this codepath for binary model loading. If this is not set, we trigger node_initializers.cpp:186. This one just assigns the memory ptr to the tensor if set to true, but at the moment. We are preparing some things on demand (the bottom portion of this code). Once we stop doing that, we can use the full mmap codepath. Also when using the full mmap codepath, we need to uncomment expression_graph.h:582
That change also works. It's probably wise to make a minimal change such as that to avoid any side effects. As you say the mmap bool is hijacked to do some other processing.
I think the mmap path eventually ends up at io::mmapItems()
which is just calling binary::loadItems(ptr, items, true);
. via ScorerWraper::ScorerWraper calling (I)EncoderDecoder::mmap() calling ExpressionGraph::mmap.. but maybe I've missed something
Closing this in favour of a simpler fix in #77.
marian::createScorer()
callsloadItems()
twice: once for looking for a special:model.yml entry, and once for the actual data. The first call is very expensive since it will decode & dequantize all entries just for a simple lookup of a text file.This change copies some of the logic from
loadItems()
intogetItem()
to only process that one entry that it searches for.The alternative, passing the actual items down through
createScorer
and all the other users of the output ofloadItems()
would be better, but requires a far more intrusive change set that doesn't yield any additional performance benefit in the standard bergamot-translator use case.Shaves off about 1 sec of the loading time on my machine. (Tested with a default (debug?) build of bergamot-translator that uses bundle loading, CPU has turbo-boosting disabled.)
Edit 2: Cold start-up + translation of a single sentence with HTML is 340ms now on my machine with a release build.