Open kpu opened 3 years ago
Can you bring the relevant nonbreaking_prefixes.xx into the archive, @XapaJIaMnu? I'll pick this up at BRT to include tests for https://github.com/browsermt/bergamot-translator/pull/172.
Where exactly do we get those from? Is that part of ssplit, @ugermann?
They come from Moses: https://github.com/moses-smt/mosesdecoder/tree/master/scripts/share/nonbreaking_prefixes
They actually ship with the sentence splitter and may diverge from Moses over time, as we add additional prefixes.
Currently bergamot-translator is just not loading non-breaking prefixes (https://github.com/browsermt/bergamot-translator/issues/104). This is bad and should be fixed. I think the clean way to do this is to ship the file for the source language. They're small enough that some copying is probably ok.
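For anyone following along, here is a rough sketch of why the missing file matters. It is illustrative Python, not the ssplit or bergamot-translator code. It assumes the Moses file layout (one prefix per line, `#` comment lines, an optional `#NUMERIC_ONLY#` marker), and the function names `parse_prefixes` and `split_sentences` are made up for this example. The splitter itself is a deliberately naive whitespace-token version.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"


def parse_prefixes(text):
    """Parse Moses-style prefix file content into (always, numeric_only) sets."""
    always, numeric_only = set(), set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if line.endswith(NUMERIC_ONLY):
            # Prefix only blocks a split when a number follows (e.g. "No. 5").
            numeric_only.add(line[: -len(NUMERIC_ONLY)].strip())
        else:
            always.add(line)
    return always, numeric_only


def split_sentences(text, always, numeric_only):
    """Naive splitter: break after any token ending in '.', unless a prefix forbids it."""
    sentences, current = [], []
    tokens = text.split()
    for i, tok in enumerate(tokens):
        current.append(tok)
        if not tok.endswith("."):
            continue
        stem = tok[:-1]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if stem in always:
            continue
        if stem in numeric_only and nxt[:1].isdigit():
            continue
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


sample_file = "# sample prefixes\nMr\nDr\n\nNo #NUMERIC_ONLY#\n"
always, numeric_only = parse_prefixes(sample_file)
text = "Mr. Smith paid No. 5 today. He left."
with_prefixes = split_sentences(text, always, numeric_only)
without_prefixes = split_sentences(text, set(), set())
```

With the prefixes loaded, `with_prefixes` holds two sentences. With empty sets, which is roughly the situation described in issue 104, the same text is cut after "Mr." and "No." as well.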