Helsinki-NLP / Opus-MT

Open neural machine translation models and web services
MIT License

[REQUEST] TMX Generator - Bitextor #40

Open drkhateeb opened 3 years ago

drkhateeb commented 3 years ago

Hello developers, I suggest creating a GUI for this code, as a tool that harvests multilingual websites to create TMX files for training MT systems. Kindly check Bitextor, which generates translation memories from multilingual websites: https://github.com/bitextor/bitextor. You could extract parallel text from this medical website, https://www.mayoclinic.org/ (English-Arabic and other languages), and train NMT on it to increase translation accuracy. For testing: https://webisearch.com/ Regards

jorgtied commented 2 years ago

Integrating Bitextor here would be a major task, and I would rather keep the bitext harvesting procedures outside of the core translation service. It could be an interesting feature, but it sounds very complex to me.