browsermt / bergamot-translator

Cross-platform C++ library focused on efficient machine translation on consumer-grade devices.
http://browser.mt
Mozilla Public License 2.0

JS: Using supervised QE models for available language pairs #378

Closed abhi-agg closed 2 years ago

abhi-agg commented 2 years ago

Fixes: https://github.com/browsermt/bergamot-translator/issues/318

Modified the wasm test page and tested it there. It works. Supervised QE binary models are being used for 3 language pairs.

abhi-agg commented 2 years ago

@jelmervdl @jerinphilip Could one of you review the PR? It is a small one. I have to merge it today to be able to make progress on integrating supervised QE models in the extension.

jerinphilip commented 2 years ago

> It works.

There are no tests accompanying this change. At least put a screenshot. How does the colour coding change? Does the visual output make sense?

> I have to merge it today to be able to make progress on integrating supervised QE models in the extension.

There's nothing really stopping you from bringing the bigger PR which adds the QE feature as a whole to the UI. This would additionally allow experimenting with values etc. (https://github.com/browsermt/bergamot-translator/pull/370 is probably not going to work, for instance. Do we need calibration?). I fail to understand what stops you from dogfooding your own extension: develop the UI experimentation in a branch and bring more complete changes here - look at https://github.com/browsermt/bergamot-translator/pull/312 for example. Alternatively, do it properly on the test page and show application examples (screenshots of the UI, not console logs) - which would perhaps make an import into the extension UI easier. The larger context will help review here.
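(Editor's aside: to make the "colour coding" question concrete, here is a minimal, self-contained sketch of how per-word QE scores in [-1.0, 0] could be bucketed into colours. The function name and thresholds are illustrative assumptions, not the extension's actual implementation.)

```javascript
// Hypothetical colour coding for supervised QE scores in [-1.0, 0].
// Thresholds are illustrative only; the real extension may differ.
function qeColour(score) {
  if (score >= -0.1) return "green";  // high translation confidence
  if (score >= -0.5) return "yellow"; // medium confidence
  return "red";                       // low confidence
}

console.log(qeColour(-0.05)); // "green"
console.log(qeColour(-0.3));  // "yellow"
console.log(qeColour(-0.9));  // "red"
```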

abhi-agg commented 2 years ago

> There's nothing really stopping you from bringing the bigger PR which adds the QE feature as a whole to UI. This will additionally allow experimenting with values etc (#370 is probably not going to work, for instance. Do we need calibration?). I fail to understand what stops you from dogfooding your own extension UI experimentation developing in a branch and bringing changes here more complete - look at #312 for example.

Are you saying that I should complete the whole implementation in the extension first and then bring those complete changes to the test page? That is the complete opposite of the natural workflow here: it makes development in this repo dependent on the extension repo, while it should clearly be the other way around.

I am looking for your review on the bindings, not on the wasm/JS. The QE bindings not being merged here does stop me from making progress on the extension integration, because the extension depends on the artifacts generated by this change, not the other way around.

> Alternatively, do it properly at the test page and show application examples (screenshots of UI, not console logs) - which makes an import into the extension UI perhaps easier. The larger context will help review here.

I can attach a screenshot with console logs showing that the supervised models return scores within the range [-1.0, 0], which will prove that the bindings work. However, it is difficult to show a colour-coding change on the test page: that depends on the model and the sentences used for translation. At least, I couldn't find a sentence for which supervised QE gave visibly better results than the free scores from the translation models.

Anyway, the sanity of the scores returned by the supervised QE models lies in the C++ realm, not in JS.
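(Editor's aside: the range check described above is trivial to express in JS. A small illustrative helper, not part of the bergamot-translator API, that verifies returned QE scores fall in [-1.0, 0]:)

```javascript
// Illustrative only: validate that supervised QE scores fall in [-1.0, 0],
// as the comment above describes. Not part of the bergamot WASM bindings.
function qeScoresInRange(scores) {
  return scores.every((s) => s >= -1.0 && s <= 0.0);
}

console.log(qeScoresInRange([-0.12, -0.5, 0.0])); // true
console.log(qeScoresInRange([-1.2, -0.3]));       // false: -1.2 is out of range
```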

jelmervdl commented 2 years ago

Worth mentioning though: this is an API breaking change for all WASM consumers out there. Which is just Mozilla and my fork of their extension. Consider them all informed, hehe.

abhi-agg commented 2 years ago

> So not sure whether the binding changes are working, but they're not breaking anything.

Actually, it is easy to see that the bindings work: whenever supervised models are available, the QE model size appears in the console log (which indicates that the TranslationModel construction API is using the QE-model-aligned memory).

Attaching screenshot for reference.

[Screenshot: Screenshot 2022-03-15 at 15 52 39]

> Maybe the people working on QE have some reference data, e.g. an example sentence combined with one of our models that they expect a certain score for. Or some easy examples we can use to test it. E.g. I use German <-> English often to test movement. Similarly, there must be example sentences that trigger a response from the QE models we can see.

This sounds like an option for verification.

I am merging this one.