Open matthiasrohmer opened 4 years ago
The following are common errors I've stumbled over when merging the localization PRs so far:
Non-content frontmatter keys do not need to be localized
amp.dev has various configuration keys in its documents, like `$order`, `formats`, `toc` and others. See Formatting Guides & Tutorials for a more detailed explanation.
Only the keys that hold content (`$title`, `$titles`, `teaser`) and their children need to be localized. Keys like `formats` or `author`, for example, can be removed from the translated document, as they will fall back to the values in the original document.
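As a rough illustration of the rule above, here is a minimal Python sketch that keeps only the content keys of a frontmatter block. It assumes the frontmatter has already been parsed into a plain dict; the function name and the sample values are hypothetical and not part of the amp.dev tooling.

```python
# Keys that hold translatable content, taken from the list above.
CONTENT_KEYS = {"$title", "$titles", "teaser"}


def strip_non_content_keys(frontmatter: dict) -> dict:
    """Return only the top-level keys that hold content.

    Children of a content key are kept untouched, since they are
    translatable as well.
    """
    return {key: value for key, value in frontmatter.items() if key in CONTENT_KEYS}


# Hypothetical frontmatter, already parsed into a dict.
original = {
    "$title": "Create your first AMP page",
    "$order": 1,
    "formats": ["websites"],
    "author": "someone",
    "teaser": {"text": "Learn the basics."},
}

localized = strip_non_content_keys(original)
print(sorted(localized))
```

Everything that is dropped here (`$order`, `formats`, `author`) is exactly what falls back to the original document.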
Translated texts may not contain YAML syntax
Teaser texts often have a colon (`:`) in them. If that's the case, the whole value needs to be enclosed in `"` or `'`, as it will break the YAML syntax otherwise. The same goes for other YAML special characters like `*`.
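A small helper along these lines could be run over translated teaser values before they are written back. This is a deliberately conservative heuristic, not a full implementation of the YAML rules: it quotes more often than strictly required, and the function name is made up for this sketch.

```python
# Characters that can change the meaning of a YAML plain scalar when they
# appear at its start (a conservative subset, not the full specification).
LEADING_INDICATORS = set("*&!|>%@`'\"[]{}#,?-:")


def quote_if_needed(value: str) -> str:
    """Wrap value in double quotes if it might be misread as YAML syntax."""
    risky = (
        value == ""
        or value != value.strip()
        or value[0] in LEADING_INDICATORS
        or ":" in value
        or "#" in value
    )
    if not risky:
        return value
    # Inside double quotes, backslashes and double quotes must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_if_needed("Learn AMP: the basics"))
print(quote_if_needed("Learn the basics"))
```

Quoting a value that did not need it is harmless, which is why the check errs on the side of adding quotes.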
Special tags need to be left intact
Please make sure that `[tip]...[/tip]`, `[example]...[/example]` and other custom tag pairs are left intact and are not transformed. This example shows a purposefully "broken" tag, as it illustrates the syntax for documentation contributors. If the escaped form of `[` used there gets replaced by a literal `[`, the build sadly breaks.
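One way to catch this before the build does is to compare the custom tags of the source document with those of the translation. The sketch below is only an assumption of how such a check could look: the tag list contains just the two names mentioned above and would need extending, and the helper names are hypothetical.

```python
import re

# Tag names are an assumption: only `tip` and `example` are named above.
TAG_NAMES = ("tip", "example")

# An opening or closing tag, captured verbatim including its attributes.
TAG_PATTERN = re.compile(r"\[/?(?:%s)\b[^\]\n]*\]" % "|".join(TAG_NAMES))


def extract_tags(text: str) -> list:
    """Return every custom tag in document order, exactly as written."""
    return TAG_PATTERN.findall(text)


def tags_intact(source: str, translation: str) -> bool:
    """True if the translation contains exactly the tags of the source."""
    return extract_tags(source) == extract_tags(translation)


source = '[tip type="note"]Hello[/tip]'
print(tags_intact(source, '[tip type="note"]Ciao[/tip]'))
print(tags_intact(source, '[tip type = "note"]Ciao[/ tip]'))
```

Because the tags are compared verbatim, the same check also flags translated tag names, translated attribute values and added whitespace.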
Will add more examples as soon as they occur. If there are questions, feel free to ping me here. Sorry, I wish the project were more forgiving, but every one of these issues actually broke the build and is tedious to track down (and even to spot in the PR review), so let's work together to prevent them in the first place.
@matthiasrohmer Thanks for the insights, Matthias. We have already implemented a check (manual for now) for the frontmatter YAML syntax, and I've asked the team to be extra careful with frontmatter keys. As for things like the escaped `[` getting replaced by a literal `[`: is there any way CI checks can be added for that? I'm just worried that if the community gets involved in supporting the translations at a later stage, we may run into issues like that.
You're welcome! There are definitely some points that can be improved... Also discussed this with @caroqliu yesterday. There are two tasks that I'll take on for next week: make the tag parsing more forgiving (allow arbitrary whitespace, for example, so that `[tip type = 'note']` or `[/ tip]` work) and also introduce a check that only builds the changed subset of pages, so that errors already surface during PR review without the need to check manually.
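To make the whitespace idea concrete, here is a hedged sketch of a normalizer that rewrites loosely formatted tags into the compact form. It is not the amp.dev parser: the tag list is an assumption, the function names are made up, and the original quote character is kept as written.

```python
import re

# A tag with optional whitespace around the slash, the name and the
# attributes. The tag names are an assumption for this sketch.
LOOSE_TAG = re.compile(
    r"""\[\s*(/?)\s*(tip|example)((?:\s+[\w-]+\s*=\s*(?:"[^"\n]*"|'[^'\n]*'))*)\s*\]"""
)
# key = 'value' or key = "value" with optional whitespace around '='.
LOOSE_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")


def _normalize_tag(match):
    slash, name, rest = match.groups()
    attrs = [
        f"{key}={quote}{value}{quote}"
        for key, quote, value in LOOSE_ATTR.findall(rest)
    ]
    return "[" + slash + " ".join([name] + attrs) + "]"


def normalize_tags(text: str) -> str:
    """Collapse extra whitespace inside known custom tags."""
    return LOOSE_TAG.sub(_normalize_tag, text)


print(normalize_tags("[tip type = 'note']Hi[/ tip]"))
```

Requiring the tag body to consist only of `key=value` pairs keeps ordinary Markdown links such as `[tip of the day](...)` from being rewritten.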
The following is a quick summary of issues that occurred in the previous PRs, fixed with https://github.com/ampproject/amp.dev/pull/4804:
Make sure to correctly format headlines
Even if it makes sense, headlines ending in a `:` may not be merged with the following paragraphs, as this breaks both the formatting and the Table of Contents, as can be seen in this example.
Do not add more whitespace to special tags
Currently only `[tag key="value"]` without any additional whitespace is valid. I'll work to improve this, but until then make sure tags don't get malformed to something like `key = "value"`.
Jinja2 special characters need to be escaped
That's something we cannot fix, as it's Jinja syntax. For example, in Italian documents:
{{ image('/static/img/docs/guides/cse/cse6.jpg', 264, 261, align='', layout='intrinsic', alt='Entrambe la chiavi sono memorizzate all\'interno del documento AMP.') }}
The `'` after `all` needs to be escaped, as it's enclosed in a string which would otherwise come to an early end.
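For translated strings that end up inside a single-quoted Jinja2 literal, the escaping can be done mechanically. The helper below is a minimal sketch with a hypothetical name; it assumes the input is raw, not yet escaped text, since already escaped input would be escaped twice.

```python
def escape_for_single_quoted_jinja(text: str) -> str:
    """Escape text so it can sit inside a '...' Jinja2 string literal."""
    # Backslashes first, so the ones added for the quotes are not doubled.
    return text.replace("\\", "\\\\").replace("'", "\\'")


alt = "Entrambe la chiavi sono memorizzate all'interno del documento AMP."
print("alt='" + escape_for_single_quoted_jinja(alt) + "'")
```

Switching the literal to double quotes would avoid the apostrophe problem as well, but then any `"` in the translated text needs the same treatment.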
Do not translate custom tags
Also found in an Italian document: `[filtri formats="siti web, annunci, storie“]`. Tags like those, including their values, must not be translated.
CC: @svasilenkov
@matthiasrohmer Just a heads-up that we are working on all of these, but I wanted to comment on the headline formatting issue. This is basically caused by the fact that in some source files (like the one in the example above) headers are not formatted per the Markdown specification. Please take a look at how GitHub renders this: https://github.com/ampproject/amp.dev/blob/future/pages/content/amp-dev/documentation/guides-and-tutorials/develop/advertise_amp_stories.md In this case there is no space between `##` and the headline, and there is no double line break between the headline and the paragraph. The latter is not necessary, but either of the two would make things better, as our Markdown parser is basically treating that as a single paragraph (just the same as GitHub does). We are about to finish working on a pre-processor to address this, though. CC: @patrickkettner
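The missing space after `##` is easy to detect mechanically. The following is a rough sketch of such a check, not the pre-processor mentioned above; the function name is hypothetical, and it only skips triple-backtick fences, so other code-like blocks could still produce false positives.

```python
import re

# One to six '#' followed directly by a character that is neither '#'
# nor whitespace, i.e. a heading that is missing its space.
MISSING_SPACE = re.compile(r"^(#{1,6})([^#\s].*)$")


def find_unspaced_headings(markdown: str) -> list:
    """Return (line_number, line) pairs for headings missing the space."""
    findings = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            # Toggle on every fence line so code such as '#include' is skipped.
            in_fence = not in_fence
            continue
        if not in_fence and MISSING_SPACE.match(line):
            findings.append((number, line))
    return findings


doc = "##Heading:\nParagraph\n\n## Good heading\n```\n#include <x>\n```\n"
print(find_unspaced_headings(doc))
```

Running this on the source files rather than the translations would surface the problem before it is carried into every language.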
To support the localization efforts and make them land on amp.dev as smoothly as possible, I thought it best to create a dedicated tracking issue so that things do not get lost in the various PRs and/or overly specific issues. My feeling also was that GitHub is a simple and async channel for quick questions.
/cc @patrickkettner @ilyaspiridonov