argosopentech / argos-translate

Open-source offline translation library written in Python
https://www.argosopentech.com
MIT License
3.81k stars, 278 forks

produce sourcemap of translation #372

Open. milahu opened 1 year ago

milahu commented 1 year ago

Source-to-source compilers usually produce sourcemaps, so for each output token I can see where that token came from.

Sourcemaps would also be useful in language-to-language translators, for translating rich text formats like html, odt, docx, pdf...

To translate a rich text document, I would remove all markup, feed the plain text of the sentences to the translator, and then use the sourcemap to reconstruct the markup.
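The workflow described here could look roughly like the sketch below. It only uses Python's standard `html.parser`. The helper names (`WordTagger`, `strip_markup`, `reapply_markup`) are made up for illustration, and the translation and the word-level sourcemap are written by hand, because producing that map is exactly the feature being requested.

```python
from html.parser import HTMLParser


class WordTagger(HTMLParser):
    """Collect plain-text words and the inline tags that enclosed each one."""

    def __init__(self):
        super().__init__()
        self.stack = []  # tags currently open
        self.words = []  # plain-text words, in document order
        self.tags = []   # tags[i] = tuple of tags open around words[i]

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-formed input: the closing tag matches the last open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        for word in data.split():
            self.words.append(word)
            self.tags.append(tuple(self.stack))


def strip_markup(html):
    """Return (words, tags) for a small, well-formed HTML fragment."""
    parser = WordTagger()
    parser.feed(html)
    parser.close()
    return parser.words, parser.tags


def reapply_markup(target_words, sourcemap, source_tags):
    """Wrap each target word in the tags of the source word it maps to.

    sourcemap[i] is the index of the source word behind target word i,
    or None if the target word has no source counterpart.
    """
    out = []
    for word, src in zip(target_words, sourcemap):
        tags = source_tags[src] if src is not None else ()
        opening = "".join(f"<{t}>" for t in tags)
        closing = "".join(f"</{t}>" for t in reversed(tags))
        out.append(f"{opening}{word}{closing}")
    return " ".join(out)


words, tags = strip_markup("I like <b>red</b> cars")

# Hand-written translation and sourcemap (hypothetical translator output).
target_words = ["J'aime", "les", "voitures", "rouges"]
sourcemap = [1, None, 3, 2]

rebuilt = reapply_markup(target_words, sourcemap, tags)
print(rebuilt)  # J'aime les voitures <b>rouges</b>
```

This wraps every word separately, so a real implementation would probably want to merge adjacent words that share the same tags, and would need to carry attributes along as well.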

Would this be possible?

Google Translate shows the connection between source and translated sentences. Such a "sourcemap of sentences" would also be useful.
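A sentence-level map like this can be approximated without any translator support by translating one sentence at a time and recording character spans. The sketch below assumes a very naive sentence splitter and takes the translator as a plain callable. The stub dictionary stands in for a real call such as `argostranslate.translate.translate(text, from_code, to_code)`, and the function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: return (start, end) spans ending at '.', '!' or '?'."""
    return [m.span() for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", text)]


def translate_with_sentence_map(text, translate):
    """Translate sentence by sentence and record which span maps to which.

    Returns (translated_text, mapping), where each mapping entry is
    ((src_start, src_end), (dst_start, dst_end)) in character offsets.
    """
    pieces, mapping, pos = [], [], 0
    for start, end in split_sentences(text):
        translated = translate(text[start:end])
        mapping.append(((start, end), (pos, pos + len(translated))))
        pieces.append(translated)
        pos += len(translated) + 1  # +1 for the joining space
    return " ".join(pieces), mapping


# Stub translator for illustration only; swap in a real translate function.
stub = {"Hello world.": "Hallo Welt.", "How are you?": "Wie geht es dir?"}

source = "Hello world. How are you?"
output, sentence_map = translate_with_sentence_map(source, stub.__getitem__)

for (s0, s1), (t0, t1) in sentence_map:
    print(repr(source[s0:s1]), "->", repr(output[t0:t1]))
```

Translating sentences in isolation loses cross-sentence context, so this trades some translation quality for the mapping.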

PJ-Finlay commented 11 months ago

If CTranslate2 has support for sourcemaps then this might be possible.
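One candidate for a token-level sourcemap is the decoder attention. I believe CTranslate2's `Translator.translate_batch` can return attention vectors via `return_attention=True`, but treat that, and the matrix layout used below, as assumptions to check against the CTranslate2 documentation. The sketch takes a hand-made attention matrix and picks, for each target token, the source token with the highest weight. The function name and the numbers are hypothetical.

```python
def alignment_from_attention(attention):
    """Map each target token to the source token with the highest weight.

    attention[t][s] is assumed to be the weight given to source token s
    while target token t was being generated.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


source_tokens = ["red", "cars"]
target_tokens = ["voitures", "rouges"]

# Hand-made weights for illustration, not real model output.
attention = [
    [0.2, 0.8],  # "voitures" attends mostly to "cars"
    [0.9, 0.1],  # "rouges" attends mostly to "red"
]

alignment = alignment_from_attention(attention)
for t, s in enumerate(alignment):
    print(target_tokens[t], "<-", source_tokens[s])
```

Attention weights are only a rough proxy for word alignment, so a map built this way would be a best guess rather than an exact sourcemap.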

argos-translate-files supports translating odt, html, and docx files.

https://github.com/LibreTranslate/argos-translate-files/issues/1