andrewdavidwong opened 7 years ago
From #3547:
@marmarek is going to make (a) separate submodule(s) for the actual translated content (#2925).
Please let us know the name of that submodule when it's created.
> A more general question: Currently, I see multiple repos like qubesos.github.io, qubes-attachment, qubes-posts, qubes-hcl, qubes-doc, qubes-manager, etc. My questions here are:
@marmarek @andrewdavidwong I haven't designed everything in detail yet, but I think that having two repos would be fine: a working repo for the translation process, and a live repo whose content has passed the _site directory without any problems and is provided to the public. The main reason is that I don't want to break the live repo in case of script errors etc.

Concerning the naming of the new repos: According to the Wikipedia article about internationalization and localization, it seems to be okay to name the working repo qubesos-website-translation or qubes-website-i18n-and-l10n (with an optional suffix -working-repo) and the live repo qubesos-website-translated or qubesos-website-localized. Any suggestions?
IMO the translated repo should contain translated MD files, not generated HTML. I'm not sure how this fits into the repo layout above, but I guess it's the first one. There is no need to manually call Jekyll - GitHub Pages does it for us.
As for scripts - I think we can keep them in the qubesos.github.io repo - there is already a _utils directory. Such a script should download new content from Transifex, prepend and/or validate the front matter (layout, URLs, redirects, etc. - to make sure it won't hijack content from another language, especially English), then commit to the repository with the translated content.
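As a rough illustration of the validation part of such a script, a front-matter check could reject any translated page whose permalink escapes its language's URL subtree. This is only a sketch under assumed conventions (a YAML `permalink:` field and a `/lang/` prefix); the actual script and schema don't exist yet:

```python
import re

# Hypothetical validator for the front matter of a downloaded translation.
# The schema (a "permalink" field that must stay inside the language's
# own URL subtree) is an assumption for illustration.
FRONT_MATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

def validate_front_matter(text, lang):
    """Return the front-matter block if it is safe for `lang`, else raise."""
    match = FRONT_MATTER_RE.match(text)
    if match is None:
        raise ValueError("missing YAML front matter")
    block = match.group(1)
    for line in block.splitlines():
        if line.startswith("permalink:"):
            permalink = line.split(":", 1)[1].strip()
            # A translated page must never shadow content of another
            # language (especially the base English content).
            if not permalink.startswith("/" + lang + "/"):
                raise ValueError("permalink escapes /%s/: %s" % (lang, permalink))
    return block
```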
The majority of the content to translate is in the qubes-doc repo. There is some in qubesos.github.io and qubes-posts, but I'm not sure if/how we want to handle it. Definitely, qubesos.github.io needs to have some translation-related files - layouts, the language switcher, etc. But IMO that can be handled manually (regardless of the script mentioned above). So, I think we need qubes-doc-translated, or maybe even a separate one for each language. Any opinion?
> IMO translated repo should contain translated md files, not generated HTML. Not sure how this fits into above repo layout, but I guess it's the first one.
To give an overview of how I want to use the "working repo": I plan to automate translation as much as possible, both to aid the translator (e.g. translating links, translating the YAML front matter) and to ensure integrity (e.g. see the problem of translating internal links). My current process of translating an MD file, let's call it example.md, looks roughly like this (with fictional file names and locations):

1. Pre-process example.md, resulting in example_ready-for-transifex.md and example_yaml-front-matter.yml.
2. Upload example_ready-for-transifex.md to Transifex.
3. Download the translation as de-DE/example_from-transifex.md.
4. Validate de-DE/example_from-transifex.md to ensure integrity.
5. Translate example_yaml-front-matter.yml using de-DE/example_from-transifex.md, resulting in de-DE/example_yaml-front-matter.yml.
6. Merge de-DE/example_from-transifex.md together with de-DE/example_yaml-front-matter.yml, resulting in de-DE/example_ready-for-jekyll.md.
7. Let de-DE/example_ready-for-jekyll.md be processed by Jekyll, resulting in de-DE/example_from-jekyll.html.
8. Post-process de-DE/example_from-jekyll.html, resulting in de-DE/example_ready-for-going-live.html.
9. de-DE/example_ready-for-going-live.html may go live now.

So, after being processed by Jekyll, the file de-DE/example_from-jekyll.html is not yet ready to go live. E.g. see the problem of translating fragments.
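The first step of the process above, splitting a source file into a Transifex-ready body and a separate YAML file, could look roughly like this. The `---` front-matter delimiter is standard Jekyll; the function name and return shape are illustrative assumptions:

```python
import re

# Minimal sketch of the pre-processing step: split example.md into the
# body that goes to Transifex and the YAML front matter that is
# translated separately.
def split_for_transifex(source_text):
    match = re.match(r"\A---\n(.*?)\n---\n(.*)\Z", source_text, re.DOTALL)
    if match is None:
        # No front matter: the whole file is translatable body text.
        return source_text, ""
    yaml_part, body = match.groups()
    # body would be written to example_ready-for-transifex.md,
    # yaml_part to example_yaml-front-matter.yml.
    return body, yaml_part
```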
I just tested whether the files generated by Jekyll can be modified manually without being modified by Jekyll again: it's possible, so, yes, in theory, only one repo seems to be sufficient. But there might be a race condition if Jekyll re-generates a file while step 8 is in progress.
Also, I recently read this warning on the Jekyll homepage, which encouraged me to use a second repo for the ready-for-going-live files:
> Destination folders are cleaned on site builds [...] Do not use an important location for &lt;destination&gt;; instead, use it as a staging area and copy files from there to your web server.
In addition, having only the files that go live in a second repo will keep it free of noise.
> There is no need to manually call Jekyll - GitHub Pages does it for us.
That's okay if the problem of the race condition mentioned above can be solved.
> As for scripts - I think we can keep them in qubesos.github.io repo - there is already a _utils directory. Such a script should download new content from Transifex, prepend and/or validate the front matter (layout, URLs, redirects, etc. - to make sure it won't hijack content from another language, especially English), then commit to the repository with the translated content.
IMHO, since these new scripts are related to translation, they might be better stored in a translation repo. (Okay, there is the problem of the language switcher, which could be solved in the canonical repo and copied to a translation repo, or vice versa.)
The hijacking problem should be solved by steps 1 and 5 of the process described above.
> The majority of content to translate is in qubes-doc repo. There is some in qubesos.github.io
For these, my solution is almost complete.
> and qubes-posts, but not sure if/how we want to handle it.
For qubes-posts, I don't have a solution yet.
> Definitely qubesos.github.io needs to have some translation related files - layouts, language switcher etc. But IMO it can be handled manually (regardless of the script mentioned above).
Almost everything can be automated - the language switcher, too.
> So, I think we need qubes-doc-translated, or maybe even separate one for each language. Any opinion?
I thought about one repo for each language, and I think it's cumbersome, inconvenient, and hard to maintain. But it might make sense if there are trusted people who sign the commits of the translations.
I also think that one translation repo per canonical repo is inconvenient in a similar way. But, to be honest, I don't know the reason for having the already existing canonical sub-repos.
I think that one repo for all languages and all files will fit our needs. Note that all content produced during translation is not trusted - at least not yet - so this characteristic keeps the untrusted translated files together. But maybe there are good reasons against it concerning the use of Git or the separation of concerns.
> I also think that one translation repo per canonical repo is inconvenient in a similar way. But, to be honest, I don't know the reason for having the already existing canonical sub-repos.
The idea is to cleanly separate "website stuff" from "documentation stuff", mostly for #1019.
> I think that one repo for both all languages and all files will fit our needs. Note that all content produced during translation is not trusted - at least not yet - so this characteristic "keeps the translated files together". But maybe there are good reasons concerning the use of Git or the separation of concerns.
This is why I propose keeping the scripts in a separate repo (not necessarily a new one - that's why I propose qubesos.github.io): those scripts need to be trusted, because they implement "sandboxing" of the translated content. One repo for all languages is fine for me.
> So, after being processed by Jekyll, the file de-DE/example_from-jekyll.html is not yet ready to go live. E.g. see the problem of translating fragments.
Can those post-processing changes be applied back to de-DE/example_ready-for-jekyll.md? If it's only about various links, it should be easy (maybe even easier to parse and process Markdown than HTML?).
>> I also think that one translation repo per canonical repo is inconvenient in a similar way. But, to be honest, I don't know the reason for having the already existing canonical sub-repos.
>
> The idea is to cleanly separate "website stuff" from "documentation stuff", mostly for #1019.
Thank you.
>> I think that one repo for both all languages and all files will fit our needs. Note that all content produced during translation is not trusted - at least not yet - so this characteristic "keeps the translated files together". But maybe there are good reasons concerning the use of Git or the separation of concerns.
>
> This is why I propose keeping scripts in separate repo (not necessary new one - that's why I propose qubesos.github.io). Because those scripts need to be trusted - they implement "sandboxing" translated content.
I see that these scripts (and also new ones) need to be trusted, and creating a new repo might be too much, so placing new ones into qubesos.github.io is fine for me. But I'm not sure what the existing scripts do and what you mean by 'they implement "sandboxing" translated content'.
>> So, after being processed by Jekyll, the file de-DE/example_from-jekyll.html is not yet ready to go live. E.g. see the problem of translating fragments.
>
> Can those post-processing changes be applied back to de-DE/example_ready-for-jekyll.md? If that's only about various links, it should be easy (maybe even easier to parse and process markdown than html?)
Applying them back sounds like closing a loop, resulting in infinite processing. Not so good. Even if we could get it to work this way, it would be a hack rather than a clean solution IMHO, since the mentioned problem with fragments cannot be solved on an abstract "MD level" but only on the "HTML level". Think of this: before assigning an ID to an HTML element, it should be checked whether that ID is already in use - which cannot be done before the HTML file has been generated by a Markdown processor.

Another reason against it: How can I (or a machine) decide whether de-DE/example_ready-for-jekyll.md comes from step X or step Y? If an additional flag is needed, then we could also use another directory instead.
Last but not least, HTML headings - at least those generated by the current version of kramdown - have a pattern that seems to be easily detectable by a regex:

grep -re '^ *<h[0-9]\+ id="[^"]\+">.*</h[0-9]\+>$' _site/

where .* belongs to the heading in the MD file. Okay, it's a little bit more complicated than that (think of hard-coded HTML headings in MD files), but I've already got an idea to solve even that.
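The duplicate-ID check argued for above could be sketched like this: collect all kramdown-style heading IDs from the generated HTML and report any ID used more than once. The heading pattern follows the grep expression in this comment; the function itself is only an illustration:

```python
import re
from collections import Counter

# Matches kramdown-style headings, e.g. <h2 id="intro">Intro</h2>,
# mirroring the grep pattern above.
HEADING_RE = re.compile(r'<h[0-9]+ id="([^"]+)">.*?</h[0-9]+>')

def duplicate_heading_ids(html_text):
    """Return the sorted list of heading IDs that occur more than once."""
    counts = Counter(HEADING_RE.findall(html_text))
    return sorted(hid for hid, n in counts.items() if n > 1)
```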
Is it possible to have GitHub run scripts, say, daily? Or do I have to run those scripts daily and manually on my machine?
> I see that these scripts (and also new ones) need to be trusted and creating a new repo might be too much, so, placing new ones into qubesos.github.io is fine for me. But I'm not sure what the existing scripts do and what you mean with 'they implement "sandboxing" translated content'.
What I meant by "sandboxing" is that the script (among other things) will ensure that translated pages live in the /lang/ subtree and do not interfere with other languages (especially the base English one). Mostly the process you describe in #3547.
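As a toy illustration of this sandboxing rule, every output path could be checked against its language subtree before publishing. The directory layout and function name are assumptions, not the actual implementation:

```python
import os

# A translated file may only be published if its path stays inside the
# language's own subtree; os.path.normpath guards against "../" tricks.
def is_sandboxed(output_path, lang):
    normalized = os.path.normpath(output_path)
    return normalized.split(os.sep)[0] == lang
```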
Do you plan to keep de-DE/example_ready-for-jekyll.md committed in some repository? IMO it's worth keeping what was downloaded from Transifex, to be able to adjust and re-apply the post-processing without downloading the files from Transifex again.
> What I meant by "sandboxing" is the script (among other things) will ensure that translated pages will live in /lang/ subtree and will not interfere with other languages (especially base English one). Mostly the process you describe in #3547.
Okay, I see. Translated pages have to reside in a separate namespace (i.e. URL path space).
> Do you plan to keep de-DE/example_ready-for-jekyll.md committed in some repository? IMO it worth keeping something downloaded from transifex to be able to adjust and re-apply postprocessing, without downloading files from transifex again.
Yes. To be more precise: even though it's not fully elaborated yet, I plan to construct a "working pipeline" by creating a directory for each stage described above. The MD file of a new/updated page has to pass all stages. We say that a file is in a certain stage if it's in the corresponding stage directory and there is no related file in the next stage's directory.
There will be a script (plus helper scripts if needed) for each stage transition. The files shall remain in each directory for tracking purposes. In the last stage are the final files, ready to go live.
Besides the files to process, there will also be a log for tracking smaller working steps and a separate one for notifications in case of errors or ambiguities.
Aside from the "working pipeline", there shall be a "tracking pipeline" with the same internal structure. The difference is that the "working pipeline" is for new and updated pages which aren't yet ready to go live, while the "tracking pipeline" is for saving the transition history of the current live pages.
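The stage rule stated above ("a file is in a stage if it's in that stage's directory and has no counterpart in the next one") could be sketched like this. The stage directory names are illustrative, not the final layout:

```python
import os

# Hypothetical stage directories of the working pipeline, in order.
STAGES = [
    "1-ready-for-transifex",
    "2-from-transifex",
    "3-ready-for-jekyll",
    "4-from-jekyll",
    "5-ready-for-going-live",
]

def current_stage(pipeline_root, relative_name):
    """Return the stage the file is currently in, or None if absent.

    A file is "in" a stage when it exists in that stage's directory and
    there is no related file in the next stage's directory.
    """
    for stage, next_stage in zip(STAGES, STAGES[1:] + [None]):
        here = os.path.join(pipeline_root, stage, relative_name)
        if not os.path.exists(here):
            continue
        if next_stage is None:
            return stage
        if not os.path.exists(os.path.join(pipeline_root, next_stage, relative_name)):
            return stage
    return None
```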
I'm not sure whether this question has been overlooked, so I'll ask again: Is it possible to let GitHub (or anything else) run the translation scripts, e.g., daily? Or do I have to run these scripts daily on my private machine?
GitHub can't do it, but we can probably use Travis (there is an option for scheduled runs) or another similar service. We also have a little of our own infrastructure, but that is the least preferable option.
Anyway, once we have the scripts, we'll find a way to run them periodically.
[Branched from #2652.]
Create separate untrusted submodule for translated files (i.e., no signed commits or tags).