WeblateOrg / weblate

Web based localization tool with tight version control integration.
https://weblate.org/
GNU General Public License v3.0
4.52k stars 998 forks

Document-based markdown translation #11871

Open benmccann opened 3 months ago

benmccann commented 3 months ago

Describe the problem

Most translation tools have historically been geared towards use cases such as inserting strings into application code. Markdown, however, is a fairly different use case because it is meant to be a human-readable document format.

While I'm not overly familiar with the internals, I believe that the current Markdown implementation splits the document into smaller PO strings, translates them, and then outputs a Markdown document from that. This process can be error-prone, makes it harder to see context, and is harder to implement than handling the entire document as a single string. Most importantly, it has a huge caveat highlighted in the docs:

> Unlike most other formats, the changes in the translation files will not be imported to Weblate because it can not be done reliably. The source of truth for the translations is Weblate not the translated file.

Describe the solution you would like

Let translators handle the document in its entirety. Store versions of the document and provide a diff tool so that when a document is updated, translators can more easily see which portions changed.

This could either replace the current implementation or be offered as a second option that users can choose instead.

Describe alternatives you have considered

Possibly improve the current implementation. However, the current approach seems to have fundamental limitations that would make some of these difficulties impossible to solve.

Additional context

See also issues like https://github.com/WeblateOrg/weblate/issues/10008 and https://github.com/WeblateOrg/weblate/issues/9786

nijel commented 3 months ago

Handling the whole document as a single string would need to redo the whole translation when a single sentence changes. Smaller units also make it easier to progress with the translation than translating a full document at once.

benmccann commented 3 months ago

> Handling the whole document as a single string would need to redo the whole translation when a single sentence changes.

You wouldn't need to redo the entire document, only update the part that changed. This is why I suggested integrating a diff tool to show what changed.
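To make the diff idea concrete, here is a rough sketch using Python's standard `difflib`. It compares two versions of a source document paragraph by paragraph and reports only the paragraphs that changed. The function name `changed_paragraphs` and the blank-line paragraph split are my own assumptions for illustration; nothing here reflects how Weblate works internally.

```python
import difflib


def changed_paragraphs(old_source: str, new_source: str) -> list[tuple[str, str, str]]:
    """Return (tag, old_text, new_text) for each paragraph-level change.

    Hypothetical helper: paragraphs are assumed to be separated by blank lines.
    """
    old_paras = [p.strip() for p in old_source.split("\n\n") if p.strip()]
    new_paras = [p.strip() for p in new_source.split("\n\n") if p.strip()]

    # Compare the two documents as sequences of paragraphs.
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged paragraphs need no attention from the translator
        changes.append(
            (tag, "\n\n".join(old_paras[i1:i2]), "\n\n".join(new_paras[j1:j2]))
        )
    return changes


old = "# Title\n\nFirst paragraph.\n\nSecond paragraph.\n"
new = "# Title\n\nFirst paragraph, revised.\n\nSecond paragraph.\n"
for tag, before, after in changed_paragraphs(old, new):
    print(tag, repr(before), "->", repr(after))
```

A translator would then only need to revisit the paragraphs reported as `replace`, `insert`, or `delete`, while the rest of the existing translation stays as it is.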

> Smaller units also make it easier to progress with the translation than translating a full document at once.

True. It probably depends on how large the Markdown files are. It might be reasonable enough if each one has a page or two of content. It would be more difficult if you've got a document that's ten, fifty, or a hundred pages long, though that's not the common case for a website.

Another idea might be to split the document on a heading level configurable by the user, e.g. split on ##. This would break up the document, but in a simpler way that doesn't require a parser and makes the pieces easier to reassemble.
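A minimal sketch of what splitting on a heading level could look like, again in Python. The function name `split_on_heading` and its `level` parameter are hypothetical. The only concession to Markdown syntax is skipping fenced code blocks, so that a line starting with `## ` inside a fence does not start a new section.

```python
def split_on_heading(markdown: str, level: int = 2) -> list[str]:
    """Split a Markdown document into sections at headings of the given level.

    Hypothetical helper: only ATX headings ("## Title") at the start of a line
    are recognised, and lines inside fenced code blocks are ignored.
    """
    prefix = "#" * level + " "
    sections: list[list[str]] = [[]]
    in_fence = False

    for line in markdown.splitlines(keepends=True):
        # Toggle fence state on lines that open or close a fenced code block.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        # Start a new section at each matching heading outside a fence.
        if not in_fence and line.startswith(prefix) and sections[-1]:
            sections.append([])
        sections[-1].append(line)

    return ["".join(section) for section in sections if section]


doc = (
    "# Title\n\nIntro.\n\n"
    "## One\n\nText.\n\n```\n## not a heading\n```\n\n"
    "## Two\n\nMore.\n"
)
parts = split_on_heading(doc, level=2)
print(len(parts))
print("".join(parts) == doc)
```

Because every line is kept with its line ending, joining the sections back together reproduces the original document exactly, which is the "easily reconstituted" property mentioned above.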

github-actions[bot] commented 3 months ago

This issue has been put aside. It is currently unclear if it will ever be implemented as it seems to cover too narrow of a use case or doesn't seem to fit into Weblate.

Please try to clarify the use case or consider proposing something more generic to make it useful to more users.

nijel commented 3 months ago

Most users are happy with how Markdown is currently handled, so that is not going to change.

I'm not opposed to having another option for Markdown translating in Weblate, but I don't intend to push that myself.