facebook / docusaurus

Easy to maintain open source documentation websites.
https://docusaurus.io
MIT License

Translation workflow unclear #8703

Open Zenahr opened 1 year ago

Zenahr commented 1 year ago

Description

The docs on translation: https://docusaurus.io/docs/api/plugins/@docusaurus/plugin-content-docs#i18n are a bit unclear on what to do with markdown files.

Reading through it, you'd expect that docusaurus write-translations generates not just JSON translation files, but also copies the Markdown files into the corresponding locale folder under /i18n.

EDIT: just found the relevant piece but only after having it pointed out to me: https://docusaurus.io/docs/i18n/tutorial#translate-markdown-files

At least for me this happened because when reading the translation docs top to bottom, you'd assume the command yarn write-translations --locale fr doesn't just create translations for code, but also markdown. Neither the command name (think something like write-translations-json) nor the docs at that point state that you need to run a separate command to handle markdown files.

Two suggestions:

  1. Add a How-To Guide for previewing and generating translations for a language, showing only the steps to take. If we take the https://diataxis.fr/ framework as reference, a guide lists the steps to take, whereas a tutorial educates a reader through a scenario, which IMO the current tutorial does really well.

  2. Add a callout right after the yarn write-translations --locale fr command is mentioned here: https://docusaurus.io/docs/i18n/tutorial#translate-plugin-data, stating something like: "This only creates JSON translation files. To handle Markdown, keep reading."

And I know that it should be understood just based on the heading "Translate plugin data". When I read that however, in my head I was thinking "docs plugin data... of course this entails markdown too!".

I'll gladly take this upon myself and create a PR if this is something that you're not entirely opposed to. The guide would be small, so it would add only a small maintenance cost for both updating and translation.

slorber commented 1 year ago

> Reading through it, you'd expect that docusaurus write-translations generates not just JSON translation files, but also copies the Markdown files into the corresponding locale folder under /i18n.

I can understand how the DX of translating things in Docusaurus is not ideal, and we'll try to improve

Yes the CLI could create the initial copy of Markdown files, but there are things to consider:

Eventually, we could prompt users to help them make a choice, and provide many extra options for those that want no prompt? (a bit like the swizzle CLI)

> Add a How-To Guide for previewing and generating translations for a language, showing only the steps to take. If we take the diataxis.fr framework as reference, a guide lists the steps to take, whereas a tutorial educates a reader through a scenario, which IMO the current tutorial does really well.

The tutorial is a step by step how-to guide that I actually executed multiple times myself to be sure that each step is valid when executed sequentially. Make sure to not skip any step. The markdown copy instructions are present: https://docusaurus.io/docs/i18n/tutorial#translate-markdown-files

I'm not sure I understand what exactly should be in this how-to guide and where it would be located. If you have any ideas, please submit a draft PR so that I understand better what you'd like to see in our docs.

Note we already have an i18n intro describing an overview of the translation workflow: https://docusaurus.io/docs/i18n/introduction#translation-workflow

[Screenshot attached to the original comment]

> Add a callout right after the yarn write-translations --locale fr command is mentioned here: docusaurus.io/docs/i18n/tutorial#translate-plugin-data, stating something like: "This only creates JSON translation files. To handle Markdown, keep reading."

Anywhere this CLI is mentioned in the docs, the text only mentions JSON files already. If you read the docs again, more carefully this time, do you still think it's needed?

I'm not against an extra callout, but I just feel that if users do not read the docs, they might as well avoid reading the callout and open an issue here.


For me, the real problem is probably that users have the wrong intuition that the CLI is supposed to handle Markdown files as well (which is not totally wrong, we should support that).

Maybe it is time to implement the missing CLI features, would you want to work on it?

Until then, we could print a big message after running the CLI to say that the CLI does not copy Markdown files yet.

Zenahr commented 1 year ago

@slorber The idea of displaying a CLI message when running the write-translations command is IMO even better than what I proposed initially.

I think you're spot on with the intuition being that markdown files would also be managed by the command that generates just the JSON files.

Yes, I'd happily give the CLI expansion a try. Can't promise much as my programming days are a bit behind me, but I could use the refresher 😄.

Is there an RFC I could build upon? I was thinking a bit about the merging strategy when the source files for already translated Markdown change. We might be able to hook into git version control for that and look for diffs. Still lots to explore.

Maybe we could aim to get this extension released for Docusaurus v3. My feeling is that improving the translation like that will be a bigger undertaking...

slorber commented 1 year ago

Thanks, let me know if you can (or not) submit a PR

> Is there an RFC I could build upon

No, but we can discuss the details here and create an issue later once we are ready to start working on this. I doubt this kind of RFC would receive much feedback though but we'll see.

> I was thinking a bit about the merging strategy when the source files for already translated Markdown change. We might be able to hook into git version control for that and look for diffs. Still lots to explore.

I don't know if git is needed here. What I would do is prompt the user and ask if they prefer merge or override strategy. Merge means you just copy new files but do not modify existing ones at all (this also means that you'll have to maintain the existing markdown files up to date yourself, hard to automate this unfortunately).
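To make the two strategies concrete, here is a minimal sketch of how such a prompt's answer could drive the copy step. Everything in it (`CopyStrategy`, `CopyPlan`, `planMarkdownCopy`) is a hypothetical name invented for illustration, not an existing Docusaurus API, and it assumes "override" means that existing translated files get replaced by a fresh copy of the source.

```typescript
// Hypothetical sketch: none of these names exist in Docusaurus.
type CopyStrategy = "merge" | "override";

interface CopyPlan {
  toCopy: string[]; // relative paths that would be written into the locale folder
  skipped: string[]; // existing translations that are left untouched
}

// Decide which source Markdown files to copy, given the files that are
// already present in the locale folder and the strategy chosen by the user.
function planMarkdownCopy(
  sourceFiles: string[],
  existingTranslations: string[],
  strategy: CopyStrategy,
): CopyPlan {
  const plan: CopyPlan = { toCopy: [], skipped: [] };
  for (const file of sourceFiles) {
    const alreadyTranslated = existingTranslations.indexOf(file) !== -1;
    if (alreadyTranslated && strategy === "merge") {
      // merge: never modify an existing translation
      plan.skipped.push(file);
    } else {
      // new file, or override (assumed here to replace the existing copy)
      plan.toCopy.push(file);
    }
  }
  return plan;
}
```

With "merge", only files that have no translation yet end up in `toCopy`, which is why keeping the existing translations up to date stays a manual task.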

Due to the modular nature of Docusaurus, maybe implementing the copy thing would require a new lifecycle? That seems annoyingly complex, but the docs Markdown content is not in a single folder when using versioning, it's probably not that simple to do the copying directly in core.

> Maybe we could aim to get this extension released for Docusaurus v3. My feeling is that improving the translation like that will be a bigger undertaking...

We'll see after discussing this here.

What we can already do in v2.x is to add the console message giving better explanations to the users in case they are confused.


Important note: I really want to allow docs/myDoc.mdx + docs/myDoc.<locale>.mdx in the future. Maybe adding Markdown support now would become quite useless once we have this new feature considering it would be easier to create localized Markdown copies in the original folder?

Zenahr commented 1 year ago

We're talking about 3 distinct topics in this issue. I suggested an RFC for designing an "intelligent" translation CLI that does exactly this:

> I was thinking a bit about the merging strategy when the source files for already translated Markdown change. We might be able to hook into git version control for that and look for diffs. Still lots to explore.

Meaning in practice: I have already translated an md file, made changes on the source md file and now I want to translate the new changes -> the CLI would then check if the corresponding file exists and if it does, approximate what lines need to be updated and keep the translated bits in place but put whatever was added in the source file into the translation file.

> (this also means that you'll have to maintain the existing markdown files up to date yourself, hard to automate this unfortunately).

I do believe we can do better than that. Although this will be a non-trivial undertaking I believe, hence the separate RFC. This would make Docusaurus even more appealing too however.

> Important note: I really want to allow docs/myDoc.mdx + docs/myDoc.<locale>.mdx in the future. Maybe adding Markdown support now would become quite useless once we have this new feature considering it would be easier to create localized Markdown copies in the original folder?

As a tech writer, I don't think I would use that. Imagine you've got a folder with around 30 doc files. And now you'd add translations for 3 languages = 120 files. That's way too crowded IMO. I do appreciate the current /i18n folder design. But I guess that's up for discussion and exploration. Thanks for the heads up on that.

> What we can already do in v2.x is to add the console message giving better explanations to the users in case they are confused.

Agreed, I'll see about making a PR for that. Any pointers on what file(s) to look at? It's probably self-evident in case the CLI command logs are handled in just one place. I'll take a look around 😏

> Due to the modular nature of Docusaurus, maybe implementing the copy thing would require a new lifecycle? That seems annoyingly complex, but the docs Markdown content is not in a single folder when using versioning, it's probably not that simple to do the copying directly in core.

Not sure, I would imagine we could extend the implementation of the write-translations CLI command to also handle markdown, or add options via flags, e.g., --handle-markdown or something to that effect. Adding Markdown into the mix, we'd have to think about both multi-instance setups and versioning.

On versioning: Might make sense to allow usage via something like docusaurus write-translations -version "1.1.2"

slorber commented 1 year ago

> Meaning in practice: I have already translated an md file, made changes on the source md file and now I want to translate the new changes -> the CLI would then check if the corresponding file exists and if it does, approximate what lines need to be updated and keep the translated bits in place but put whatever was added in the source file into the translation file.

> I do believe we can do better than that. Although this will be a non-trivial undertaking I believe, hence the separate RFC. This would make Docusaurus even more appealing too however.

I think you underestimate the complexity of making it work reliably for a variety of source/target languages. I don't know how to build this and it's unlikely to become a priority for me in the future.

It would require that we save somewhere each source/translated docs version to be able to know when a sentence is inserted/removed and sync the related docs: this requires a storage system. Dedicated software like Crowdin exists to solve this problem already, and we can already see that for a monetized company dedicated to this problem space, solving this correctly is complex and we often find bugs and shortcomings.
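To illustrate why storage is needed at all, here is a deliberately simplified sketch that only detects, at file level, that a source changed since it was translated. It does not attempt the sentence-level syncing discussed above. All names (`TranslationRecord`, `fingerprint`, `findStaleTranslations`) are hypothetical, and the idea of persisting one fingerprint per translated file is an assumption made for this example, not something Docusaurus or Crowdin is known to do this way.

```typescript
// Hypothetical sketch: a stored record of what the source looked like
// when a given file was last translated.
interface TranslationRecord {
  file: string; // path relative to the docs folder
  sourceFingerprint: string; // fingerprint of the source at translation time
}

// Small non-cryptographic string hash (djb2 variant), enough for a sketch.
function fingerprint(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h << 5) + h + text.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

// Return the translated files whose source changed, or disappeared,
// since the fingerprint was recorded.
function findStaleTranslations(
  currentSources: Record<string, string>,
  records: TranslationRecord[],
): string[] {
  return records
    .filter((record) => {
      const source = currentSources[record.file];
      return source === undefined || fingerprint(source) !== record.sourceFingerprint;
    })
    .map((record) => record.file);
}
```

Even this coarse check needs the records to be saved somewhere between runs, and it only says that a file changed, not which sentences to update in the translation.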

A tool like this would require a lot of work and be valuable for many other git-based docs projects using Markdown, not just Docusaurus. It seems very unlikely that we'll work on it on our own: if you want to see it happen anytime soon, the best option is probably to work on it yourself unfortunately 🤪.

> As a tech writer, I don't think I would use that. Imagine you've got a folder with around 30 doc files. And now you'd add translations for 3 languages = 120 files. That's way too crowded IMO.

No need to have one giant folder with all the source/translations: you can have 1 doc inside each folder and have index.mdx + index.fr.mdx in the same folder.

> Agreed, I'll see about making a PR for that. Any pointers on what file(s) to look at? It's probably self-evident in case the CLI command logs are handled in just one place. I'll take a look around 😏

The CLI entrypoint is here: https://github.com/facebook/docusaurus/blob/main/packages/docusaurus/src/commands/writeTranslations.ts

> Due to the modular nature of Docusaurus, maybe implementing the copy thing would require a new lifecycle? That seems annoyingly complex, but the docs Markdown content is not in a single folder when using versioning, it's probably not that simple to do the copying directly in core.

> Not sure, I would imagine we could extend the implementation of the write-translations CLI command to also handle markdown, or add options via flags, e.g., --handle-markdown or something to that effect. Adding Markdown into the mix, we'd have to think about both multi-instance setups and versioning.

What does "handle Markdown" even mean?

Docusaurus core can do a copy/paste, but which from/to folder should it use for the copy/paste operations?

Docusaurus core only sees a list of plugins with lifecycle functions that can be called. Some plugins might not even need or support Markdown, and there's no unique file-system convention across core and third-party plugins.

> On versioning: Might make sense to allow usage via something like docusaurus write-translations -version "1.1.2"

Docusaurus write-translations is a core CLI: it can only call plugin lifecycles, it does not know about the existence of docs, so it does not know about versioning.

The only CLI command that knows about the concept of version is not a Docusaurus core CLI command, but is a CLI command registered by the docs plugin (plugins can do that!): yarn docusaurus docs:version 1.1.0

What I mean is we have 2 choices:

1) add a new plugin lifecycle and call it when we run docusaurus write-translations. The lifecycle allows each plugin to decide what to do with Markdown files using its own local conventions.

2) each content plugin handling Markdown files registers its own dedicated CLI command and can execute the code it wants.

yarn docusaurus docs:i18n:syncMarkdown --locale fr
yarn docusaurus blog:i18n:syncMarkdown --locale fr
yarn docusaurus pages:i18n:syncMarkdown --locale fr
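As a rough sketch of option 2: plugins can already register commands through the extendCli lifecycle (that is how docs:version exists). The fragment below shows what registering a docs:i18n:syncMarkdown command could look like. The command itself, the plugin name, `createDocsI18nPlugin`, and the `syncMarkdown` callback are hypothetical, and the `Cli` and `CliCommand` interfaces are minimal stand-ins written for this example rather than the real commander types.

```typescript
// Minimal stand-ins for the commander-style object a plugin receives in
// extendCli. Assumption: only the few methods used below are modelled.
interface CliCommand {
  description(text: string): CliCommand;
  option(flags: string, description: string): CliCommand;
  action(handler: (options: { locale?: string }) => void): CliCommand;
}

interface Cli {
  command(name: string): CliCommand;
}

// Hypothetical plugin fragment: registers its own Markdown sync command and
// delegates the actual copying to a callback that knows the plugin's folders.
function createDocsI18nPlugin(syncMarkdown: (locale: string) => void) {
  return {
    name: "docs-i18n-sync-sketch",
    extendCli(cli: Cli): void {
      cli
        .command("docs:i18n:syncMarkdown")
        .description("Copy docs Markdown files into the locale folder (sketch)")
        .option("--locale <locale>", "locale to sync, for example fr")
        .action((options) => {
          if (!options.locale) {
            throw new Error("--locale is required");
          }
          syncMarkdown(options.locale);
        });
    },
  };
}
```

The point of this design is that the copy logic lives inside the plugin, which is the only place that knows about versions and its own folder conventions. Core never has to learn about them.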

I hope this explains better why it's not so simple, due to the modular nature.

ronilaukkarinen commented 1 year ago

I do not understand how to translate index.md to Finnish. I have tried placing index.fi.md in the same src/pages dir and i18n/fi/index.md etc., but to no avail. Any tips appreciated.

The tip "put the translation files at the correct filesystem location" is not exactly clear in the docs. I've been banging my head against the wall for an hour now.

slorber commented 1 year ago

> I do not understand how to translate index.md to Finnish. I have tried placing index.fi.md in the same src/pages dir and i18n/fi/index.md etc., but to no avail. Any tips appreciated.

Where is it documented to do so? We don't support that (yet). Please read the official i18n docs to understand how to translate content.

> The tip "put the translation files at the correct filesystem location" is not exactly clear in the docs. I've been banging my head against the wall for an hour now.

If you don't succeed please create a repro showing what you tried exactly, and I'll let you know your mistake. Without a repro it's impossible to know what you missed.

ronilaukkarinen commented 1 year ago

Sorry, I already gave up and moved on to GitBook. If it's not possible to translate index/main pages then Docusaurus is not for me.

slorber commented 1 year ago

> If it's not possible to translate index/main pages then Docusaurus is not for me.

It is possible @ronilaukkarinen.

But we never documented docs/index.fi.md, so why did you try that?

And we never documented i18n/fi/index.md, so why did you try that?

Yes, to make it work you have to read the docs carefully and not try random things. It may not be the most intuitive, but if you follow the instructions you should be able to make it work.

The example file-system location on the link you provided seems clear enough to me:

[Screenshot attached to the original comment]

The path you were probably looking for is: website/i18n/fi/docusaurus-plugin-content-docs/current/index.md or website/i18n/fi/docusaurus-plugin-content-pages/index.md
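For illustration, the two paths above can be derived mechanically. The helpers below are a hypothetical sketch, not part of Docusaurus: `localizedDocPath` and `localizedPagePath` are invented names, and the naming used for versioned docs (`version-<name>`) and for extra docs plugin instances (a `-<pluginId>` suffix) is an assumption based on the documented i18n folder conventions, so check it against the official docs before relying on it.

```typescript
// Hypothetical helpers, written only to show how the localized path is built.
interface LocalizedDocPathOptions {
  siteDir: string; // for example "website"
  locale: string; // for example "fi"
  relativePath: string; // path relative to the docs folder, for example "index.md"
  pluginId?: string; // docs plugin instance id, "default" when omitted
  version?: string; // version name, "current" when omitted
}

function localizedDocPath(options: LocalizedDocPathOptions): string {
  const pluginId = options.pluginId ?? "default";
  // Assumption: non-default plugin instances get a "-<pluginId>" suffix.
  const pluginDir =
    pluginId === "default"
      ? "docusaurus-plugin-content-docs"
      : `docusaurus-plugin-content-docs-${pluginId}`;
  const version = options.version ?? "current";
  // Assumption: released versions live in "version-<name>" folders.
  const versionDir = version === "current" ? "current" : `version-${version}`;
  return [options.siteDir, "i18n", options.locale, pluginDir, versionDir, options.relativePath].join("/");
}

// Pages have no versions, so the path is shorter.
function localizedPagePath(siteDir: string, locale: string, relativePath: string): string {
  return [siteDir, "i18n", locale, "docusaurus-plugin-content-pages", relativePath].join("/");
}

const finnishDoc = localizedDocPath({ siteDir: "website", locale: "fi", relativePath: "index.md" });
const finnishPage = localizedPagePath("website", "fi", "index.md");
```

Here `finnishDoc` and `finnishPage` evaluate to the two paths quoted above.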

I understand how this path may feel overly complicated, but there's a reason for that:

That's also why I would like to support index.fi.md in the future: because it would be simpler/more intuitive for those that are ok to put translations on Git (the i18n path remains useful for those using external SaaS translation tools like Crowdin IMHO)

Good luck with GitBook