browsermt / bergamot-translator

Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
http://browser.mt
Mozilla Public License 2.0

If input is pure punctuation or numbers, pass it through unmodified #419

Open kpu opened 2 years ago

kpu commented 2 years ago

We're getting a lot of complaints about miscellaneous punctuation being translated into unrelated output. The cause is that we filter pure-punctuation lines out of the training data, so the models never see them.

Example: https://techcrunch.com/2022/06/02/mozilla-brings-free-offline-translation-to-firefox/ complains about the "|" character.

I realize this is the start of a larger component that handles exceptions for translation.
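
For illustration, the pass-through check could look roughly like the sketch below. This is not code from bergamot-translator: the names isPunctuationOrNumeric and translateOrPassThrough are hypothetical, the translate callback stands in for whatever call performs the real translation, and the check only covers ASCII. Handling Unicode punctuation and digits would need proper character classification, which is assumed to be out of scope here.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of bergamot-translator's API.
// Returns true when `text` consists only of ASCII digits, ASCII punctuation,
// and whitespace, and contains at least one non-whitespace character.
bool isPunctuationOrNumeric(std::string_view text) {
  bool sawContent = false;
  for (unsigned char c : text) {
    if (c >= 0x80) return false;  // non-ASCII byte: leave the decision to the model
    if (std::isspace(c)) continue;  // whitespace alone does not disqualify the input
    if (!std::isdigit(c) && !std::ispunct(c)) return false;  // letter or control char
    sawContent = true;
  }
  return sawContent;
}

// Hypothetical wrapper: returns the input unmodified when it is pure
// punctuation or numbers, otherwise forwards it to `translate`.
std::string translateOrPassThrough(
    const std::string &input,
    const std::function<std::string(const std::string &)> &translate) {
  if (isPunctuationOrNumeric(input)) return input;
  return translate(input);
}
```

With this sketch, an input such as "|" or "12,345.67" is returned as-is, while anything containing a letter or a non-ASCII byte still goes through translation. Empty and whitespace-only inputs are deliberately not treated as pass-through, so the existing behaviour for them is unchanged.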

Elaborendum commented 2 years ago

Is https://github.com/mozilla/firefox-translations/issues/365 also related to this?