XapaJIaMnu / translateLocally

Fast and secure translation on your local machine, powered by marian and Bergamot.
MIT License
501 stars 28 forks

Problems with the Windows build #88

Open jorgtied opened 2 years ago

jorgtied commented 2 years ago

We did a quick test on a Windows 10 machine and hit a problem with downloading models: it says that the cryptographic hash does not match. We tried various language pairs. I also tried my own fork and our Windows build; there, downloading the models works, but the translations are nonsense, just random output that has nothing to do with the input. Has anyone seen similar behavior? I also tested the macOS builds and they work without problems; the translations look reasonable.

XapaJIaMnu commented 2 years ago

The cryptographic hash not matching is likely #79 which is fixed, just requires a redeploy of the online models. I will try to fix this today.

TranslateLocally can recognise models in directories in the same folder as the executable. Try manually extracting the model to see if it works.
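
To rule out the downloader, the workaround can be sketched like this (the directory layout and archive contents here are stand-ins; a real model archive also contains vocab and config files):

```shell
# Stand-in archive so the example is self-contained; with a real model you
# would download e.g. eng-fin.tatoeba.tiny.tar.gz instead.
mkdir -p demo/src
echo "stand-in" > demo/src/model.npz
tar -czf demo/eng-fin.tar.gz -C demo/src model.npz

# The actual workaround: extract the archive into a directory next to the
# translateLocally executable so the model is picked up without downloading.
mkdir -p demo/models/eng-fin
tar -xzf demo/eng-fin.tar.gz -C demo/models/eng-fin
ls demo/models/eng-fin
```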

XapaJIaMnu commented 2 years ago

I just tested it, our models work when downloaded manually, the download issue is just due to #79. Would you like to send us a model that doesn't work?

jorgtied commented 2 years ago

I haven't tried this with your build yet, but on my fork the following model produces nonsense on Windows while working well with the macOS build (English-to-Finnish): https://object.pouta.csc.fi/OPUS-MT-models/app/models/eng-fin.tatoeba.tiny.tar.gz. The same happens with this one: https://object.pouta.csc.fi/OPUS-MT-models/app/models/swe-fin.transformer-tiny11.tar.gz (Swedish-to-Finnish).

jelmervdl commented 2 years ago

Another symptom, probably related: on Mac OS (and Windows as well) when I switch from one of TranslateLocally's own models to the English->Finnish model, it crashes. Same when I switch from the English->Finnish model to another model.

Error when switching to the model:

* thread #34, stop reason = signal SIGABRT
  * frame #0: 0x00007ff81e5cb112 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007ff81e601214 libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007ff81e54dd10 libsystem_c.dylib`abort + 123
    frame #3: 0x00007ff81e5be0b2 libc++abi.dylib`abort_message + 241
    frame #4: 0x00007ff81e5bd4fd libc++abi.dylib`std::__terminate(void (*)()) + 46
    frame #5: 0x00007ff81e5bfd55 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
    frame #6: 0x00007ff81e5bfd1c libc++abi.dylib`__cxa_throw + 116
    frame #7: 0x000000010038ce88 translateLocally`marian::cpu::integer::fetchAlphaFromModelNodeOp::forwardOps()::'lambda'()::operator()() const + 1768
    frame #8: 0x00000001003c7a1f translateLocally`marian::rnn::GRUFastNodeOp::runBackward(std::__1::vector<std::__1::function<void ()>, std::__1::allocator<std::__1::function<void ()> > > const&) + 47
    frame #9: 0x00000001003c5217 translateLocally`marian::Node::forward() + 71
    frame #10: 0x00000001002d15d9 translateLocally`marian::ExpressionGraph::forward(std::__1::list<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, std::__1::allocator<IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > > > >&, bool) + 201
    frame #11: 0x00000001002d1495 translateLocally`marian::ExpressionGraph::forwardNext() + 997
    frame #12: 0x00000001004ee32f translateLocally`marian::BeamSearch::search(std::__1::shared_ptr<marian::ExpressionGraph>, std::__1::shared_ptr<marian::data::CorpusBatch>) + 10959
    frame #13: 0x0000000100103444 translateLocally`marian::bergamot::TranslationModel::translateBatch(unsigned long, marian::bergamot::Batch&) + 308
    frame #14: 0x0000000100133323 translateLocally`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, marian::bergamot::AsyncService::AsyncService(marian::bergamot::AsyncService::Config const&)::$_2> >(void*) + 115
    frame #15: 0x00007ff81e6014f4 libsystem_pthread.dylib`_pthread_start + 125
    frame #16: 0x00007ff81e5fd00f libsystem_pthread.dylib`thread_start + 15

Error when switching away from the model:

* thread #37, stop reason = signal SIGABRT
  * frame #0: 0x00007ff81e5cb112 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007ff81e601214 libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007ff81e54dd10 libsystem_c.dylib`abort + 123
    frame #3: 0x00007ff81e5be0b2 libc++abi.dylib`abort_message + 241
    frame #4: 0x00007ff81e5bd4fd libc++abi.dylib`std::__terminate(void (*)()) + 46
    frame #5: 0x00007ff81e5bfd55 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 27
    frame #6: 0x00007ff81e5bfd1c libc++abi.dylib`__cxa_throw + 116
    frame #7: 0x0000000100394e05 translateLocally`marian::cpu::integer::PrepareBiasForBNodeOp::PrepareBiasForBNodeOp(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >) + 2293
    frame #8: 0x000000010038bd8b translateLocally`IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > > marian::Expression<marian::cpu::integer::PrepareBiasForBNodeOp, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&>(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >&) + 139
    frame #9: 0x000000010030d8d8 translateLocally`IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > > marian::cpu::integer::affine<(marian::Type)257>(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, bool, bool, float, float, bool) + 1016
    frame #10: 0x000000010030be0e translateLocally`marian::affine(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, bool, bool, float) + 1038
    frame #11: 0x0000000100403132 translateLocally`marian::mlp::Output::applyAsLogits(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >)::$_0::operator()(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, bool, bool) const + 82
    frame #12: 0x0000000100402995 translateLocally`marian::mlp::Output::applyAsLogits(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >)::$_1::operator()(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >, bool, bool) const + 261
    frame #13: 0x00000001003fe625 translateLocally`marian::mlp::Output::applyAsLogits(IntrusivePtr<marian::Chainable<IntrusivePtr<marian::TensorBase> > >) + 14597
    frame #14: 0x00000001004bc340 translateLocally`marian::DecoderTransformer::step(std::__1::shared_ptr<marian::DecoderState>) + 9424
    frame #15: 0x00000001004b7c7d translateLocally`marian::DecoderTransformer::step(std::__1::shared_ptr<marian::ExpressionGraph>, std::__1::shared_ptr<marian::DecoderState>) + 109
    frame #16: 0x00000001004e0e80 translateLocally`marian::EncoderDecoder::step(std::__1::shared_ptr<marian::ExpressionGraph>, std::__1::shared_ptr<marian::DecoderState>, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<marian::Word, std::__1::allocator<marian::Word> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, int) + 480
    frame #17: 0x00000001004cee1d translateLocally`marian::models::Stepwise::step(std::__1::shared_ptr<marian::ExpressionGraph>, std::__1::shared_ptr<marian::DecoderState>, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<marian::Word, std::__1::allocator<marian::Word> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, int) + 109
    frame #18: 0x0000000100507135 translateLocally`marian::ScorerWrapper::step(std::__1::shared_ptr<marian::ExpressionGraph>, std::__1::shared_ptr<marian::ScorerState>, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, std::__1::vector<marian::Word, std::__1::allocator<marian::Word> > const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, int) + 245
    frame #19: 0x00000001004edbe8 translateLocally`marian::BeamSearch::search(std::__1::shared_ptr<marian::ExpressionGraph>, std::__1::shared_ptr<marian::data::CorpusBatch>) + 9096
    frame #20: 0x0000000100103444 translateLocally`marian::bergamot::TranslationModel::translateBatch(unsigned long, marian::bergamot::Batch&) + 308
    frame #21: 0x0000000100133323 translateLocally`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, marian::bergamot::AsyncService::AsyncService(marian::bergamot::AsyncService::Config const&)::$_2> >(void*) + 115
    frame #22: 0x00007ff81e6014f4 libsystem_pthread.dylib`_pthread_start + 125
    frame #23: 0x00007ff81e5fd00f libsystem_pthread.dylib`thread_start + 15

The model also doesn't really work for me: [screenshot] (Google Translate says that's "my value is anything here"?)

XapaJIaMnu commented 2 years ago

I was just about to open a new bug report with this. Same behaviour on Linux: the model crashes. I suspect this is due to the dynamic model-swap framework, which doesn't allow models with different configurations to be swapped. Upstream bug, @jerinphilip? Even when deleting all other models so that English-Finnish is loaded first (and therefore there are no swapping issues), I still can't get it to translate anything correctly, even on Linux.

Looking at the model, it has two separate vocabularies. I guess the new bergamot-translator might not support separate vocabularies. @jerinphilip?

XapaJIaMnu commented 2 years ago

As for the problem of downloading models: the Qt version used for Windows has a bug where it doesn't handle redirection well (e.g. http->https). We fixed that by making sure there are no redirects when downloading the model. This will eventually be resolved once the Qt 6 Windows build starts working (vcpkg issues...).

jorgtied commented 2 years ago

Strange, I can translate perfectly well with that English-Finnish model on my Mac laptop using my own build from this fork: https://github.com/Helsinki-NLP/OPUS-MT-app/ Importing a manual download into translateLocally crashes ...

jerinphilip commented 2 years ago

> Even when deleting all other models so that the English-Finnish is loaded first (and therefore there's no swapping issues), I still can't get it to translate anything right, even on Linux.

Clean runs appear to work for me for Swedish-Finnish. I can't read Finnish, but the output looks weird for English-Finnish and is potentially incorrect. Not so sure about the other one.

[screenshot of translation output]

From this notebook.

I'll try to look into the other situation (crashes on swapping models) alongside multiple model improvements, which I expect to take on soon.

> when I switch from one of TranslateLocally's own models to the English->Finnish model, it crashes. Same when I switch from the English->Finnish model to another model.

Is translateLocally using the model swap provided by upstream bergamot-translator now? I was under the impression translateLocally restarts the Service as a whole. Does it work for swaps between browsermt-provided models?

jerinphilip commented 2 years ago
ARVO.ARVO!ARVO!ARVO!
ARVO KE<i>H</i>EN <i></i>KEHY JOKA KEH KEHY .ARVO <i></i><i></i>JOKA <i></i>.ARVO <i></i><i></i>JOKA .ARVO <i></i><i></i>JOKA .ARVO <i></i><i></i>JOKA JOKA .ARVO <i></i>

Symptoms above appear consistent with the wrong vocabulary being keyed in. Could trying to access something out of bounds because of the missing vocabulary be causing the segfault?
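
A hypothetical illustration (not the marian vocab API) of how one mismatch explains both symptoms: ids decoded against the wrong, smaller vocabulary either map to unrelated tokens (nonsense output) or fall out of bounds, and an uncaught throw on a worker thread becomes the SIGABRT in the backtraces above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decode step: token ids were produced against one vocabulary
// but are looked up in another. In-range ids yield unrelated tokens;
// out-of-range ids throw (and an uncaught throw aborts the process).
std::string lookupToken(const std::vector<std::string>& vocab, std::size_t id) {
    if (id >= vocab.size())
        throw std::out_of_range("token id " + std::to_string(id) +
                                " >= vocabulary size " +
                                std::to_string(vocab.size()));
    return vocab[id];
}
```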

XapaJIaMnu commented 2 years ago

@jerinphilip It seems I was wrong; this is not a bergamot-translator issue. The model doesn't work even with browsermt/marian-dev. @jorgtied, could it be that this model was trained with a different marian fork?

XapaJIaMnu commented 2 years ago

@jorgtied it seems that changing gemm-precision: int8shift to gemm-precision: int8 makes the model work. I will have to investigate, but this is a browsermt issue. Since your model doesn't have precomputed alphas, int8 should give you about the same performance as int8shift.
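
For anyone hitting the same symptom, the switch is a one-line change in the model's decoder config YAML (the exact filename varies per model and is an assumption here):

```yaml
# before: produces garbage / crashes with this model
gemm-precision: int8shift
# after: works, at roughly the same speed when no precomputed alphas exist
gemm-precision: int8
```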

jorgtied commented 2 years ago

OK - good to know. The models were trained with the original marian-dev but quantised with the browsermt branch of marian-dev. Is that a problem?

kpu commented 2 years ago

Training with original marian-dev and quantizing with browsermt should be fine and is the recommended path.

XapaJIaMnu commented 2 years ago

I'm working on it, hope to roll out a hotfix tonight.

jorgtied commented 2 years ago

Precomputing alphas only makes sense in connection with finetuned quantisation, right? Or is it also useful for non-tuned models?

kpu commented 2 years ago

> Precomputing alphas makes only sense in connection with finetuned quantisation, right? Or is that also useful for non-tuned models?

These are orthogonal.

Precomputing alphas is just recording the typical range of values of the activations and always using that scaling factor, instead of setting the scaling factor on the fly. It will always damage quality somewhat, in return for not having to compute scaling at runtime.

Finetuning mucks with the floats in an attempt to limit damage from quantization, though, as you have observed, it sometimes makes things worse. The finetuning happens with an emulated quantization (i.e. it uses floats, just with limited values) that I think always determines the scaling factor on the fly.
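
The distinction can be sketched in a few lines (a toy model, not marian's intgemm code): with a precomputed alpha the scale is fixed from recorded activation statistics and out-of-range values get clipped, whereas on-the-fly scaling derives the scale from each tensor's own maximum absolute value, so nothing is clipped but the max must be computed at runtime.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Toy int8 quantization: q = round(x * 127 / scale), clipped to [-127, 127].
std::vector<int8_t> quantize(const std::vector<float>& x, float scale) {
    std::vector<int8_t> q;
    q.reserve(x.size());
    for (float v : x) {
        float s = std::round(v * 127.0f / scale);
        s = std::min(127.0f, std::max(-127.0f, s));  // clip out-of-range values
        q.push_back(static_cast<int8_t>(s));
    }
    return q;
}

// On-the-fly scaling: derive the scale from this tensor's own max |x|.
float onTheFlyScale(const std::vector<float>& x) {
    float m = 0.0f;
    for (float v : x) m = std::max(m, std::fabs(v));
    return m;
}
```

A precomputed alpha would replace onTheFlyScale() with a constant recorded during a calibration pass; activations larger than that constant are then clipped, which is where the slight quality loss comes from.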

jorgtied commented 2 years ago

OK - understood. So computing alphas will already help avoid the model crashing even if I don't do finetuning now, right? I'll keep that in mind ... An additional question: does it make a difference whether alphas are extracted with or without lexical shortlists?

XapaJIaMnu commented 2 years ago

@jorgtied could you share the original model.npz and the training configuration? I tested our models and they work with those config options, whereas yours refuses to, and I don't know why ;/

jorgtied commented 2 years ago

I think everything you need should be in here (besides the data; would you need that as well?): https://object.pouta.csc.fi/OPUS-MT-models/swe-fin/opusTCv20210807+nopar+ft95-2022-01-19.zip (that's the Swedish-Finnish model)

XapaJIaMnu commented 2 years ago

I can't access that link:

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<Error>
<Code>NoSuchKey</Code>
<BucketName>OPUS-MT-models</BucketName>
<RequestId>tx00000000000000024e7c6-0061eea381-26ead993-allas-prod-kaj</RequestId>
<HostId>26ead993-allas-prod-kaj-cpouta-production</HostId>
</Error>

jorgtied commented 2 years ago

Sorry, this is the correct link: https://object.pouta.csc.fi/Tatoeba-MT-models/swe-fin/opusTCv20210807+nopar+ft95-2022-01-19.zip

XapaJIaMnu commented 2 years ago

Sorry, could you give me the English-Finnish one? It's easier for us to work with.

Also, the Swedish-Finnish model doesn't exhibit the issue (I tried several parameter combinations, such as gemm-precision: int8/int8shift/int8shiftAll, and it works with all of them). Only the English-Finnish one is broken with int8shift.

jorgtied commented 2 years ago

I messed up my experiments and couldn't find the original model anymore. Instead, I created a fresh version; maybe we can simply verify that this one works? The quantised version is https://object.pouta.csc.fi/OPUS-MT-models/app/models/eng-fin.transformer-tiny11.tar.gz and the original one is in https://object.pouta.csc.fi/Tatoeba-MT-models/eng-fin/opusTCv20210807+nopar+ft95-sepvoc_transformer-tiny11-align_2022-01-25.zip

XapaJIaMnu commented 2 years ago

I can no longer reproduce the crash with your newer model. It works with int8, int8shift and int8shiftAll (I didn't test alphas, as the model has none as far as I understand).

The issue with the previous model was with the two input and output embedding matrices. Disabling the shifted codepath for them made the model work, but I have no idea what was wrong with them. At any rate, the new model doesn't exhibit this issue. Does it work for you on Windows?

jorgtied commented 2 years ago

No, it doesn't have alphas, but it does have two different vocabularies for source and target. Maybe there was something wrong with my spm files for the old model, but the weird thing is that I could use it without problems on my Mac laptop. I don't have Windows available, but I can ask my daughter to test on her machine again later today. Yesterday we tried the Swedish-Finnish model on her laptop with my fork of translateLocally and it still produced garbage (even when moving to int8). I'll try again and report later ...

XapaJIaMnu commented 2 years ago

I just managed to get to a windows machine and tested your models. They both produce crap on Windows and I'm puzzled... Will update you later.

jorgtied commented 2 years ago

I created another model with a joint vocabulary. Does this have the same problems on Windows? https://object.pouta.csc.fi/OPUS-MT-models/app/models/eng-fin.transformer-tiny11-jointvocab.tar.gz

XapaJIaMnu commented 2 years ago

Still broken on Windows.... Can you share the training data and training script so we can try to reproduce it? Something is very weird.

kpu commented 2 years ago

Do we need to go all the way back to the training data??

jorgtied commented 2 years ago

Here is all the training and validation data for the last model with the joint vocabulary: https://object.pouta.csc.fi/Tatoeba-MT-models/engfin-jointvocab.tar The training command is also part of the tarfile (in the logfile engfintrain-and-eval.out.960712). Something that might be non-standard in my setup is that I segment the data outside of Marian and use regular vocab files to invoke training (instead of using the built-in SentencePiece library). But that shouldn't really cause this strange behavior, should it?