ethereum / solidity

Solidity, the Smart Contract Programming Language
https://soliditylang.org
GNU General Public License v3.0

[DOCS] Translation Workflow for Docs #10119

Closed franzihei closed 3 years ago

franzihei commented 3 years ago

Problem

We have a lot of rather outdated (and often incomplete) translations of the Solidity docs, which are a great start but require quite a few updates and additions. Currently, these translations are not actively maintained, and some of them are not integrated into the readthedocs page.

Here are the rather outdated issues that tried to keep track of the translations:

(Maybe we can consider closing these once we have agreed on a new approach.)

Solution Approach / Ideas 💡

The ethereum.org page has a vast amount of translations which seem to be working great.

The ethereum.org team is using Crowdin to keep track of translations. You can see their project here: https://crowdin.com/project/ethereumfoundation.

They advertise the translation program here: https://ethereum.org/en/contributing/translation-program/ And here is a recent blog post on their milestone of reaching 30 languages: https://blog.ethereum.org/2020/07/29/ethdotorg-translation-milestone/

Next steps

I will set up a call with the guys from ethereum.org to learn more about how they do it. :)

franzihei commented 3 years ago

Update

This is how the translation workflow works for the ethereum.org translations:

[Translation]->[Review]->[Sync]->[Upload online]

  1. Uploading the content files on Crowdin (manually). The Progress bar shows 0%.
  2. People come to the Crowdin Project page through various channels. (Ethereum.org Translation Program page, Github, etc).
  3. People choose languages and files to translate.
  4. When the translation rate reaches 100%, we ask our professional translation company (paid reviewers) to verify the translation quality, spelling and grammar, etc.
  5. When the review is completed, download the translated file and make a GH PR.
  6. A GitHub developer who speaks the language synchronizes the translation file with the original website.
  7. When synchronization is complete, finally upload it online.
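
The four stages above can be sketched as a small state machine. This is only an illustration of the described flow, assuming progress is tracked as a fraction between 0.0 and 1.0; the names `Stage` and `next_stage` are made up for this sketch and are not part of Crowdin or the ethereum.org tooling.

```python
from enum import Enum


class Stage(Enum):
    # Stage names follow the diagram: Translation -> Review -> Sync -> Upload online
    TRANSLATION = "Translation"
    REVIEW = "Review"
    SYNC = "Sync"
    UPLOAD = "Upload online"


ORDER = list(Stage)  # Enum members iterate in definition order


def next_stage(stage: Stage, progress: float) -> Stage:
    """Return the stage a file is in after reporting `progress` (0.0 to 1.0)."""
    # A file only leaves a stage once that stage is 100% complete,
    # and "Upload online" is the final stage.
    if progress < 1.0 or stage is Stage.UPLOAD:
        return stage
    return ORDER[ORDER.index(stage) + 1]
```

For example, `next_stage(Stage.TRANSLATION, 1.0)` moves a fully translated file on to `Stage.REVIEW`, while a file at 40% stays in `Stage.TRANSLATION`.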

The questions / considerations for us are:

Once we agree on whether to use Crowdin, we can define the next steps.

franzihei commented 3 years ago

Questions from the standup:

franzihei commented 3 years ago

Insights into language usage:

franzihei commented 3 years ago

Read the Docs Workflow

Localizing the content would mean that the different translations would be included in the original documentation and not in separate repos. One can then select the desired language in the bottom flyout menu.

Karocyt commented 3 years ago

French Doc State

I started the French Docs during my first year as an IT student.

I translated half of it in one go in late 2018 and only got it up to date last June (synced with release v0.6.8), but still in a partially translated state.

Do we need it

I have seen firsthand the English skills of French IT engineers, who sometimes give up on a bare README.
I do believe that having French documentation pushes adoption (including in some parts of Africa) and will help bring great projects to life.
So do we (as the Ethereum ecosystem) need it? I don't know, but we (as French developers) definitely do.

Usage

I recently heard that my school was working on certifying degrees on the Ethereum blockchain, and that they were using... "the official French documentation"! I also learned that some of our students referred to my doc on a regular basis.
It doesn't sound like much, but it's crazy rewarding to know that people are indeed using it. This is part of what pushed me to bring it up to date last June, even if it created "translation holes" and involved deleting outdated French parts.

Surprisingly, no one ever complained about the "half translated" side of things.
I'd personally rather have a partially translated version than an outdated one, but I'm quite biased as I'm comfortable with English.

Hosting

Two people expressed their wish to contribute (#5250), but even though I added them as contributors, neither of them submitted a single commit. They never gave any sign of life, but I suspect that they were disappointed to be told to contribute to some fork owned by a random student...
So whatever the solution chosen, it would be nice to bring it all back under the Ethereum Organisation umbrella to boost both involvement and visibility.

But as long as there are not enough translators on a version, we would either need a good version management system to accommodate inconsistent updates, or keep such initiatives separated/isolated as long as they can't keep up.

Workflow

Readthedocs is unclear about this, but from what I understand, the two ways of managing localization are mutually exclusive.

Keeping things on multiple repos

The centralized way of things

Crowdin

New translations

chriseth commented 3 years ago

Thanks a lot for your insights, @Karocyt! You are probably right that incomplete translations are not a problem. I'm currently leaning towards having translations in their own repositories. Those could probably be forks of the main repository, linked to it via GitHub Actions or other automation tools. I'm not sure if the proposed way through .po files is really useful: a first trial showed that Solidity source examples are not exported, but that may just be a setting.

Do you think that the following approach could work:

This workflow might even be compatible with using .po files or other means to do the translation in the future (machine translation etc).

If the auto-merged version cannot be compiled, this could create a pull request instead.
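
The fallback described in the last sentence can be sketched roughly as follows. This is a minimal illustration, not an actual GitHub Actions setup: the four callables (`merge_upstream`, `build_docs`, `push`, `open_pull_request`) are hypothetical hooks that a real automation would have to implement, for example with git commands and a documentation build.

```python
from typing import Callable


def sync_translation(
    merge_upstream: Callable[[], None],
    build_docs: Callable[[], bool],
    push: Callable[[], None],
    open_pull_request: Callable[[str], None],
) -> str:
    """Merge upstream changes, then push directly or fall back to a pull request."""
    # Hypothetical hook: bring the latest upstream docs into the translation repo.
    merge_upstream()
    # Hypothetical hook: returns True if the merged docs still build.
    if build_docs():
        push()
        return "pushed"
    # The merged version does not build, so a human has to look at it.
    open_pull_request("Auto-merged docs do not build; manual fix needed")
    return "pull-request"
```

Passing the steps in as callables keeps the decision logic separate from whichever automation tool ends up running it.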

franzihei commented 3 years ago

Thanks for your comments and insights @Karocyt! Please note that we're still in the exploration phase on how to improve the translation process. We'll try out different approaches and once we're settled on a rough process we'll update the community about it. :)

With these efforts I'm trying to streamline the translation work done, ensure high quality and consistency while also being able to provide more guidelines and structure to the community translators.


Process Update

I've been trying out Transifex. This is what the interface looks like:

[image]

Translation is done step by step / paragraph by paragraph. This is handy for actual human translators, but not so much if we want to create a first draft using machine translations.

I'd be curious how to translate a different version using .pot files instead of .po files. Afaik those should highlight changes done between the versions?

Karocyt commented 3 years ago

@chriseth: The thing with force-accepting the remote changes is that you lose a line on each typo fix or rephrasing, which doesn't necessarily need to be corrected in the translated version.
There is also the problem of section permutations: when I did my big version bump, the merge was a huge mess, but most of it was just about sections moving around.
Both issues could be mitigated by requiring a review/correction of the merge commit by the involved team.
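
To illustrate the difference between moved and changed text, here is a rough sketch that compares two versions of a source file paragraph by paragraph instead of line by line. It assumes paragraphs are separated by blank lines; the function name `classify_paragraphs` is made up for this example, and it is not how git or any translation platform actually works.

```python
def classify_paragraphs(old: str, new: str) -> dict[str, list[str]]:
    """Compare two versions of a source document paragraph by paragraph."""
    # Assumption: paragraphs are separated by one blank line.
    old_paras = [p.strip() for p in old.split("\n\n") if p.strip()]
    new_paras = [p.strip() for p in new.split("\n\n") if p.strip()]
    old_set = set(old_paras)
    new_set = set(new_paras)
    return {
        # Same text as before, possibly at a different position in the file.
        "unchanged_or_moved": [p for p in new_paras if p in old_set],
        # New or edited text: the existing translation cannot be reused as-is.
        "needs_translation": [p for p in new_paras if p not in old_set],
        # Text that disappeared upstream.
        "removed": [p for p in old_paras if p not in new_set],
    }
```

With this kind of comparison, a section that only moved does not show up as a change. A typo fix still lands in `needs_translation`, so a review of the result would still be needed.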

If there is no way to export them, the unexported examples might be a feature more than a problem. It might be better to cut the full contract examples (Solidity by example) into smaller snippets interspersed with the current comment text. Comments are not pleasant to translate, and the long/insightful ones could be better off as their own note and/or warning insert (see the Enums one for instance).
Short informational comments could stay in English by convention, as I don't think you end up reading the Solidity documentation as a complete neophyte developer.

Manual machine translation

I leaned heavily on DeepL translator for my first draft, really amazing with contextual translations and in many cases better than what I could have done on my own.
It internally loses context on every line break, so I had to get rid of those everywhere (as in here, but the .po export seems to fix it), keep only the paragraphs and fix the reST notation and examples manually afterwards.
Most of it was proofreading then.
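
Removing the line breaks could be scripted along these lines. This is a simplified sketch, not the script I actually used: it joins consecutive unindented lines into one line per paragraph, and leaves blank lines, indented blocks (code examples, list continuations), directive lines starting with `..` and title underlines alone. The name `unwrap_paragraphs` and the exact rules are assumptions, and real reST has more cases (tables, for instance) that this ignores.

```python
import re

# Title underlines/overlines such as "=====" or "-----" (three or more repeats).
ADORNMENT = re.compile(r"^([=\-~^#*+])\1{2,}\s*$")
# Unindented bullet or enumerated list items.
LIST_ITEM = re.compile(r"^([-*+]|\d+\.|#\.)\s")


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped paragraph lines so each paragraph sits on one line."""
    lines = text.split("\n")
    out: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        if buffer:
            out.append(" ".join(buffer))
            buffer.clear()

    for i, line in enumerate(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        is_plain = (
            bool(line.strip())
            and not line[0].isspace()        # indented blocks stay untouched
            and not line.startswith("..")    # directives and comments
            and not ADORNMENT.match(line)    # title underlines
        )
        is_title = bool(ADORNMENT.match(following))
        if is_plain and not is_title:
            if LIST_ITEM.match(line):
                flush()  # each list item starts its own line
            buffer.append(line.strip())
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)
```

The result can be pasted into a translator paragraph by paragraph; the reST notation and code examples would still need a manual pass afterwards.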

Transifex

To try the interface in a similar case, they auto-accept contributors on the readthedocs project.
However, I'm not a huge fan of the sentence-by-sentence basis. It lets you translate a sentence occurring multiple times in the project only once, but doesn't seem to let you work one file at a time (in the case of a partial/ongoing translation, I tend to focus on basics first).

axic commented 3 years ago

Screenshot

On readthedocs we also have a translation feature turned on. Clicking on zh above (for "Simplified Chinese") goes to https://solidity.readthedocs.io/zh/latest/, which does not seem to have any translation. It is apparently this repository: https://readthedocs.org/projects/solidity-zh/

I think we should remove it from the readthedocs settings: Screenshot

franzihei commented 3 years ago

Dropping new learnings here so we have them all in one place:

Example for a purely Github-driven workflow (ReactJS)

The React website and docs are translated into several languages by community contributors. To streamline the process, ensure quality, and solve the versioning/updating issue, they came up with the following workflow. (Read the full story here.)

Process followed by React (copied from this issue):

Example for including machine translations into the workflow (CPP Reference)

cppreference.com uses Machine Translations for their localized content, which can then be proof-read / corrected by community contributors in a second step: https://en.cppreference.com/w/Cppreference:MachineTranslations.

Conclusions

I've been playing with Transifex and I don't think it's a good fit; it feels like overhead to me. In the optimal scenario, we could set up a purely GitHub-driven, semi-automated process.

Could anybody check the open-source React bot and evaluate if it makes sense to re-use some functionality?

franzihei commented 3 years ago

🤖

@chriseth I think the scripts for the bot can be found here: https://github.com/reactjs/reactjs.org-translation/tree/master/scripts

The React Translation bot is based on this bot: https://github.com/vuejs-jp/che-tsumi/tree/master/lib

benjioh5 commented 3 years ago

The Korean translation has not been updated since 2019-01-11, and there has been no response at https://github.com/solidity-korea/solidity-docs-kr/issues/72

Could you check or ping @solidity-korea and the org members?

franzihei commented 3 years ago

Hey @benjioh5 - We are currently re-organising the translation process and unfortunately most of the currently existing translations are very outdated. We will create a new process for community translations and will let you know as soon as there is an update.

franzihei commented 3 years ago

Kicking off the new Translation Workflow! 🎉

Hello everybody!

After quite some time I'm happy to be kicking off the new translation workflow for the Solidity documentation with you!

I'm tagging all the people that seem to have been involved in translations in the past here: @kcyang @dongsam @PhyrexTsai @GoodVincentTu @bartkim0426 @fkysly @zhuangjinxin @hongbinzuo @mrblocktw @altuntasfatih @cagataycali @mevlanaayas @msusur @onursabitsalman @denizozzgur @Karocyt @clemlak @damianoazzolini @V0nMis3s @AdrianClv @christina938 @akira-19 @NoCtrlZ @taizo-kato

It would be amazing if some of you were still interested in contributing. :)

So where do we go from here?

Updates

Automation - Help Needed!

Optimally, we would like to have a bot similar to the reactjs-translation-bot, which would create PRs with new content that needs to be translated every time the original documentation is updated.
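
As a very rough sketch of the core of such a bot: compare the documentation files of two versions and build a pull request description listing what needs attention. This is not the reactjs-translation-bot's implementation; the function names, the file-name-to-content dictionaries and the PR text are all assumptions for illustration, and a real bot would additionally need to talk to git and the GitHub API.

```python
def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return the names of docs files that are new or differ between two versions."""
    # `old` and `new` map file names to file contents (an assumption of this sketch).
    return sorted(
        name for name, text in new.items()
        if name not in old or old[name] != text
    )


def pull_request_body(version: str, files: list[str]) -> str:
    """Build a PR description with one checkbox per file to re-translate."""
    lines = [f"Sync with upstream docs {version}", ""]
    lines += [f"- [ ] {name}" for name in files]
    return "\n".join(lines)
```

Translators could then tick off the files in the pull request as they update them.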

Currently we have nobody who can set this up, so if you have experience in setting something like this up and want to help, please reach out! :)

Your Input

If you have any questions or input, feel free to let me know by creating a topic in the Documentation category of the Solidity forum or by opening an issue in the new solidity-docs organization.

Looking forward to kicking this off with many of you, starting new translations and revamping the existing ones! 🚀 🤗