Closed: andrewdavidwong closed this issue 3 years ago
Ideally, we could just de-slugify the filename to form the title automatically, for example.
This may not be possible, since slugifying is, by its nature, a lossy conversion. If not feasible, perhaps we could have a script that automatically renames every file with its slugified page title. (In cases in which the name is already correct, the rename would be a NOP.)
I'm still going to try it, though, since we might get sufficiently good results from simply replacing hyphens with spaces and capitalizing the first letter, for example.
Update: Tested it out locally. I think we need more control over capitalization than this would allow. For example, "XSA" in a title needs to be in all caps, which this would not support. Scrapping this specific idea, but it's still better to reduce the duplication from three things down to two. Filenames are ineliminable, and having the title in the YAML frontmatter is useful, so let's get rid of h1 headings in the body text.
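For illustration, the naive de-slugify rule tested above (replace hyphens with spaces, capitalize the first letter) can be sketched in Python as follows. This is only a sketch of that rule, not the code that was actually tested, and the slug `xsa-tracker` is a hypothetical filename chosen to show the capitalization problem.

```python
def naive_deslugify(slug):
    """Turn a filename slug into a title: hyphens become spaces,
    and only the first letter is capitalized."""
    text = slug.replace("-", " ")
    return text[:1].upper() + text[1:]


# Works for ordinary words.
print(naive_deslugify("copy-paste"))

# Fails for acronyms: the slug no longer records that "XSA" was all caps,
# so the original capitalization cannot be recovered from the filename.
print(naive_deslugify("xsa-tracker"))
```

The second call yields "Xsa tracker" rather than "XSA tracker", which is the loss of control over capitalization described above.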
@tokideveloper, @maiska: Would this interfere with your website localization work?
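Here are my thoughts (IMHO):
Redundancy
Permalink
- Currently, our localization scripts heavily rely on explicit permalinks. Making them implicit means that we have to re-calc them which is not so nice and error-prone.
- An explicit permalink is good for translators since they can find the page they are translating relatively easily by entering the permalink in their browser's address bar. This is in a sense necessary since the directory path is lost (in a sense) when uploading the related MD file to Transifex.
- A file with an explicit permalink can be sent via email without losing the context. Note that the permalink (and also the directory path) is in a sense a context hint.
- Note that it's a little bit harder to move a page to another URL with implicit permalinks. The old implicit permalink must be generated (or manually copied from the address bar) and put into redirect_from.
- Does the implicit permalink contain the .md file ending? If yes then it's not so nice.
Breadcrumb link trail
- Breadcrumbs are possible to implement even with permalink being explicitly specified. Independently from the permalink, we could use the page.relative_path Jekyll variable and split it up to generate breadcrumbs. (*)
Title
YAML front matter in general
- How many times does someone have to touch the YAML front matter of an MD file during its life cycle so that it could get out of sync? Has this happened in the past?
Doc index
Having every link and title for the doc index in a YAML file is redundant. We should just generate this from the directory structure and filenames.
- This should be feasible. See (*) above.
Summary
Ideas
- Concerning the permalink: Would it be feasible to move all *.md files to their own directories? Like moving /doc/user/common-tasks/copy-paste.md to /doc/user/common-tasks/copy-paste/index.md such that the directory path equals the URL path? Maybe this could help making the permalink redundant. (For translation purposes, index.md is not a good name since all files uploaded to Transifex would look the same.)
I agree with @tokideveloper regarding permalinks: while they may look like a duplication, they are useful. Similarly for the title: slugifying is lossy. Keeping them in sync the other way (automatically renaming the file based on the title) may make reading git history a bit harder, especially with the web interface (not a huge issue, but still an issue). But having both title: in YAML and an h1 should not be needed; the latter should be generated from the former.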
I think the biggest issue is the doc index - it duplicates page titles (in a separate file, so very easy to desynchronize), needs to be updated each time some page is added/moved/removed and generally sounds like something that should be generated, not hand-crafted.
Permalink
- Currently, our localization scripts heavily rely on explicit permalinks. Making them implicit means that we have to re-calc them which is not so nice and error-prone.
It shouldn't be error-prone, since the re-calc is deterministic and allows no room for human error.
- An explicit permalink is good for translators since they can find the page they are translating relatively easily by entering the permalink in their browser's address bar. This is in a sense necessary since the directory path is lost (in a sense) when uploading the related MD file to Transifex.
Oof. That sounds like a shortcoming of the translation workflow. I don't think it's a good idea for translators to rely on something like a permalink in a YAML header. Most other projects don't have that. How do they keep things organized?
- A file with an explicit permalink can be sent via email without losing the context. Note that the permalink (and also the directory path) is in a sense a context hint.
Except the permalink almost never provides any context. Most of our permalinks are just /doc/<title>.
- Note that it's a little bit harder to move a page to another URL with implicit permalinks. The old implicit permalink must be generated (or manually copied from the address bar) and put into redirect_from.
True.
- Does the implicit permalink contain the .md file ending? If yes then it's not so nice.
No.
Breadcrumb link trail
- Breadcrumbs are possible to implement even with permalink being explicitly specified. Independently from the permalink, we could use the page.relative_path Jekyll variable and split it up to generate breadcrumbs. (*)
But then it's yet another thing to keep manually in sync, or else it will make no sense. It makes no sense for the breadcrumb trail not to match the URL path, and it's definitely not worth it if I have to keep both in sync manually.
page.relative_path Jekyll variable
No such variable found. Maybe you meant page.path? But then I don't understand the suggestion.
YAML front matter in general
- How many times does someone have to touch the YAML front matter of an MD file during its life cycle so that it could get out of sync? Has this happened in the past?
It happens all the time, and there is a lot of desync in qubes-doc right now. It's not just about touching the YAML frontmatter. That's just one part that can get desynced from other parts.
Doc index
Having every link and title for the doc index in a YAML file is redundant. We should just generate this from the directory structure and filenames.
- This should be feasible. See (*) above.
See above. I don't understand this suggestion.
Ideas
- Concerning the permalink: Would it be feasible to move all *.md files to their own directories? Like moving /doc/user/common-tasks/copy-paste.md to /doc/user/common-tasks/copy-paste/index.md such that the directory path equals the URL path? Maybe this could help making the permalink redundant. (For translation purposes, index.md is not a good name since all files uploaded to Transifex would look the same.)
Aren't you contradicting your own earlier points? I don't see how this would help.
I agree with @tokideveloper regarding permalinks - while they may look like a duplication, they are useful.
I don't deny that they are useful. I'm just pointing out that they make maintenance and contributing burdensome.
How do you propose we keep permalinks in sync with the directory structure?
Or, if we can't/won't do that, then what if we just have two directories in qubes-doc -- user and developer -- and dump all *.md files into those two directories without any further subdirectories?
(I think we can simply delete external after a while. There is no need to maintain those redirects forever. They were only intended to be temporary anyway.)
I think the biggest issue is the doc index - it duplicates page titles (in a separate file, so very easy to desynchronize), needs to be updated each time some page is added/moved/removed and generally sounds like something that should be generated, not hand-crafted.
I've spent many, many hours trying to figure out a way to do this, and I haven't been able to come up with anything that works well enough. The main problem is that expressions like {% if page.path contains {{ section.dir }} %} simply don't work. They generate nothing, presumably because contains requires a string and won't accept a variable there. So we'd have to hard-code each section/directory (and then we could at least auto-gen the list of pages under it), but that's still too much hardcoding, unless we radically flatten the doc structure (as mentioned above).
It also doesn't help that I don't understand most of the new localization code.
Another idea would be to have only links in the YAML file.
Good idea. It's only halfway but would still be an improvement. I'll see if I can get it to work.
Permalink
- Currently, our localization scripts heavily rely on explicit permalinks. Making them implicit means that we have to re-calc them which is not so nice and error-prone.
It shouldn't be error-prone, since the re-calc is deterministic and allows no room for human error.
It's deterministic but I thought of the function "slugify". Slugifying the title would mean that we had to know how exactly the "slugify" function works. It's not clear to me yet. But if we don't need to slugify anything then it should be feasible and not error-prone.
- An explicit permalink is good for translators since they can find the page they are translating relatively easily by entering the permalink in their browser's address bar. This is in a sense necessary since the directory path is lost (in a sense) when uploading the related MD file to Transifex.
Oof. That sounds like a shortcoming of the translation workflow. I don't think it's a good idea for translators to rely on something like a permalink in a YAML header. Most other projects don't have that. How do they keep things organized?
Okay, there is the Transifex config file, where the mapping is listed. But here, I was thinking of an easy way for translators to get to the web page they are translating. It's not mandatory to have such a nice feature (and there may be other ways), but I like it.
- A file with an explicit permalink can be sent via email without losing the context. Note that the permalink (and also the directory path) is in a sense a context hint.
Except the permalink almost never provides any context. Most of our permalinks are just /doc/<title>.
I agree.
- Note that it's a little bit harder to move a page to another URL with implicit permalinks. The old implicit permalink must be generated (or manually copied from the address bar) and put into redirect_from.
True.
- Does the implicit permalink contain the .md file ending? If yes then it's not so nice.
No.
Breadcrumb link trail
- Breadcrumbs are possible to implement even with permalink being explicitly specified. Independently from the permalink, we could use the page.relative_path Jekyll variable and split it up to generate breadcrumbs. (*)
But then it's yet another thing to keep manually in sync, or else it will make no sense. It makes no sense for the breadcrumb trail not to match the URL path, and it's definitely not worth it if I have to keep both in sync manually.
Oh, sorry, I thought of the case when the permalink matches the file path but I didn't write it. Sorry!
Actually, the suggestion is trivial: If the permalinks match the file paths then the variables page.url and page.path are almost the same and thus can be used interchangeably. This is what I wanted to say.
page.relative_path Jekyll variable
No such variable found. Maybe you meant page.path? But then I don't understand the suggestion.
I took it from here.
YAML front matter in general
- How many times does someone have to touch the YAML front matter of an MD file during its life cycle so that it could get out of sync? Has this happened in the past?
It happens all the time, and there is a lot of desync in qubes-doc right now. It's not just about touching the YAML frontmatter. That's just one part that can get desynced from other parts.
Okay. This should be solved.
Doc index
Having every link and title for the doc index in a YAML file is redundant. We should just generate this from the directory structure and filenames.
- This should be feasible. See (*) above.
See above. I don't understand this suggestion.
See my sorry above.
Ideas
- Concerning the permalink: Would it be feasible to move all *.md files to their own directories? Like moving /doc/user/common-tasks/copy-paste.md to /doc/user/common-tasks/copy-paste/index.md such that the directory path equals the URL path? Maybe this could help making the permalink redundant. (For translation purposes, index.md is not a good name since all files uploaded to Transifex would look the same.)
Aren't you contradicting your own earlier points? I don't see how this would help.
In a sense, I'm contradicting my earlier points, yes. Actually, I tried to make a step towards you and find a solution.
Describe the solution you'd like
Some specific examples:
- Using permalink: in the YAML frontmatter is worse than just letting Jekyll generate the permalink based on the directory path and filename. Our Jekyll config is already set to do this. We just need to delete all the permalink: lines and set up redirects from the existing URLs. This would also allow us to auto-generate a breadcrumb link trail at the top of each doc page, which would make navigation easier.
Instead of letting Jekyll generate the implicit permalinks, we could write an external (Python?) script that does the work and produces explicit permalinks based on the directory path and filename (and puts obsolete permalinks into redirect_from). This way, we could combine the best of both worlds. The script should then be run by Travis.
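A minimal sketch of what the core of such a script might look like, assuming the front matter has already been parsed into a dict, that file paths are given relative to the doc repo root, that redirect_from is stored as a list, and that the URL prefix is /doc. The function names and the example title and permalink are hypothetical; reading and writing the actual Markdown files is left out.

```python
from pathlib import PurePosixPath


def expected_permalink(relative_path, prefix="/doc"):
    """Derive a permalink from a doc file's repo-relative path,
    dropping the .md suffix and adding a trailing slash."""
    path = PurePosixPath(relative_path).with_suffix("")
    return f"{prefix}/{path.as_posix()}/"


def sync_front_matter(front_matter, relative_path):
    """Return a copy of the front matter whose permalink matches the path.

    If the permalink changes, the old one is appended to redirect_from
    so that existing links keep working.
    """
    updated = dict(front_matter)
    wanted = expected_permalink(relative_path)
    current = front_matter.get("permalink")
    if current == wanted:
        return updated  # already in sync, nothing to do
    redirects = list(front_matter.get("redirect_from", []))
    if current and current not in redirects:
        redirects.append(current)
    updated["permalink"] = wanted
    if redirects:
        updated["redirect_from"] = redirects
    return updated


# Hypothetical page that currently uses a flat /doc/<title> permalink.
page = {"title": "Copy and Paste", "permalink": "/doc/copy-paste/"}
print(sync_front_matter(page, "user/common-tasks/copy-paste.md"))
```

Running the function a second time on its own output changes nothing, so a CI job could also use it purely as a check and report any file for which the result differs from what is committed.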
It's deterministic but I thought of the function "slugify". Slugifying the title would mean that we had to know how exactly the "slugify" function works. It's not clear to me yet. But if we don't need to slugify anything then it should be feasible and not error-prone.
Oh, slugifying is a pretty common and straightforward thing. Have a look here: https://jekyllrb.com/docs/liquid/filters/
But then it's yet another thing to keep manually in sync, or else it will make no sense. It makes no sense for the breadcrumb trail not to match the URL path, and it's definitely not worth it if I have to keep both in sync manually.
Oh, sorry, I thought of the case when the permalink matches the file path but I didn't write it. Sorry!
Actually, the suggestion is trivial: If the permalinks match the file paths then the variables page.url and page.path are almost the same and thus can be used interchangeably. This is what I wanted to say.
The problem is that they currently almost never match, but it would be more organized and consistent if they did.
page.relative_path Jekyll variable
No such variable found. Maybe you meant page.path? But then I don't understand the suggestion.
I took it from here.
FWIW, page.path and page.relative_path return the same result for our doc pages (just tested).
In any case, this only works well when the URL and directory path are the same.
Ideas
- Concerning the permalink: Would it be feasible to move all *.md files to their own directories? Like moving /doc/user/common-tasks/copy-paste.md to /doc/user/common-tasks/copy-paste/index.md such that the directory path equals the URL path? Maybe this could help making the permalink redundant. (For translation purposes, index.md is not a good name since all files uploaded to Transifex would look the same.)
Aren't you contradicting your own earlier points? I don't see how this would help.
In a sense, I'm contradicting my earlier points, yes. Actually, I tried to make a step towards you and find a solution.
Thank you, but I don't think this would be necessary, because auto-generating the URL from the path works without doing this.
Instead of letting Jekyll generate the implicit permalinks, we could write an external (Python?) script that does the work and produces explicit permalinks based on the directory path and filename (and puts obsolete permalinks into redirect_from). This way, we could combine the best of both worlds. The script should then be run by Travis.
Yes, I think something like this would be very helpful.
I think the biggest issue is the doc index - it duplicates page titles (in a separate file, so very easy to desynchronize), needs to be updated each time some page is added/moved/removed and generally sounds like something that should be generated, not hand-crafted.
I've spent many, many hours trying to figure out a way to do this, and I haven't been able to come up with anything that works well enough. The main problem is that expressions like {% if page.path contains {{ section.dir }} %} simply don't work. They generate nothing, presumably because contains requires a string and won't accept a variable there. So we'd have to hard-code each section/directory (and then we could at least auto-gen the list of pages under it), but that's still too much hardcoding, unless we radically flatten the doc structure (as mentioned above).
Sadly, the Liquid Template Language is very limited. Maybe the doc index could be produced via an external script (run by Travis)?
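As a rough idea of what the grouping step of such an external script could look like, here is a sketch in Python. It assumes page paths and section directories are plain repo-relative strings and that sections do not nest; the function name and the example paths are hypothetical, and the path-prefix test is a stricter stand-in for the Liquid contains check discussed above.

```python
def group_by_section(page_paths, section_dirs):
    """Map each section directory to the pages whose path lies under it."""
    index = {section: [] for section in section_dirs}
    for path in sorted(page_paths):
        for section in section_dirs:
            # Compare whole path components so that "user/common" does not
            # accidentally match "user/common-tasks/...".
            if path.startswith(section.rstrip("/") + "/"):
                index[section].append(path)
                break
    return index


# Hypothetical layout, for illustration only.
pages = [
    "user/troubleshooting/disk-troubleshooting.md",
    "user/common-tasks/copy-paste.md",
    "developer/general/doc-guidelines.md",
]
sections = ["user/common-tasks", "user/troubleshooting", "developer/general"]

for section, members in group_by_section(pages, sections).items():
    print(section, members)
```

The section list is still written by hand here, which keeps the ordering of sections under manual control while the page lists are generated.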
Sadly, the Liquid Template Language is very limited. Maybe the doc index could be produced via an external script (run by Travis)?
I was able to make some decent progress using your suggestion (remove the titles; use only the URLs). It's partial automation at least.
Instead of letting Jekyll generate the implicit permalinks, we could write an external (Python?) script that does the work and produces explicit permalinks based on the directory path and filename (and puts obsolete permalinks into redirect_from). This way, we could combine the best of both worlds. The script should then be run by Travis.
Yes, I think something like this would be very helpful.
Then let's write one! But first, let's ask @marmarek if it's fine.
Sadly, the Liquid Template Language is very limited. Maybe the doc index could be produced via an external script (run by Travis)?
I was able to make some decent progress using your suggestion (remove the titles; use only the URLs). It's partial automation at least.
If you are pleased with the result then we could use it. If not then: Let's write an external script! ;-)
I was able to make some decent progress using your suggestion (remove the titles; use only the URLs). It's partial automation at least.
If you are pleased with the result then we could use it. If not then: Let's write an external script! ;-)
They would be complementary, not exclusive. ;)
I was able to make some decent progress using your suggestion (remove the titles; use only the URLs). It's partial automation at least.
If you are pleased with the result then we could use it. If not then: Let's write an external script! ;-)
They would be complementary, not exclusive. ;)
Ah, okay. I had an external script in mind which produces a plain Markdown file showing the index without magic in it. I didn't think of the YAML file containing only the URLs. Sorry, I should be more expressive.
I was able to make some decent progress using your suggestion (remove the titles; use only the URLs). It's partial automation at least.
If you are pleased with the result then we could use it. If not then: Let's write an external script! ;-)
They would be complementary, not exclusive. ;)
Ah, okay. I had an external script in mind which produces a plain Markdown file showing the index without magic in it.
Not sure what you mean. What magic?
(BTW, the current index is a YAML file, not a Markdown file.)
I didn't think of the YAML file containing only the URLs.
The idea is that the index contains the URLs of doc pages but not the titles. The title of each page is grabbed from the title: in the YAML frontmatter of each doc file.
However, this means we have to manually edit the permalink of every single doc file. If we ever wanted to change the permalinks to match the directory structure, for example, this means we would have to edit every single one. This is where your python script would come in handy!
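The URLs-only index idea can be sketched outside of Liquid as follows. This is only an illustration in Python, not the Liquid layout that was actually committed; it assumes each page's front matter is available as a dict with permalink and title keys, and the function name, titles, and URLs are hypothetical.

```python
def build_index(ordered_urls, pages):
    """Pair each hand-ordered URL with the title from that page's front matter."""
    titles = {page["permalink"]: page["title"] for page in pages}
    missing = [url for url in ordered_urls if url not in titles]
    if missing:
        # A URL in the index with no matching page is exactly the kind of
        # desync this approach is meant to surface.
        raise KeyError(f"index lists URLs with no matching page: {missing}")
    return [(url, titles[url]) for url in ordered_urls]


# Hypothetical front matter and index order, for illustration only.
pages = [
    {"title": "Disk Troubleshooting", "permalink": "/doc/disk-troubleshooting/"},
    {"title": "Copy and Paste", "permalink": "/doc/copy-paste/"},
]
index_urls = ["/doc/copy-paste/", "/doc/disk-troubleshooting/"]

for url, title in build_index(index_urls, pages):
    print(f"- [{title}]({url})")
```

The order of the links stays hand-crafted, while each title has a single source (the page itself), so a renamed page cannot drift away from its index entry.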
However, now that I think about it, I'm not sure if we really want to change all the permalinks to match the directory structure, since we often do not have a page for each intermediate step. For example, consider this hypothetical URL:
/doc/user/troubleshooting/disk-troubleshooting/
A visitor might expect something at each of these URLs:
/doc/user/
/doc/user/troubleshooting/
But there's nothing at either of those, because they're simply sections on the /doc/ page, and we probably don't want to bother to make them. So maybe the breadcrumb and change-permalinks-to-match-directory-structure ideas aren't worth it.
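To make the intermediate-page problem concrete, here is a small Python sketch that splits a URL into its breadcrumb prefixes and reports which of them have no page behind them. The function names are hypothetical, and the set of existing URLs is an assumption made up for the example.

```python
def breadcrumb_urls(url):
    """Return every prefix of a URL path, from the top level down to the page."""
    parts = [part for part in url.strip("/").split("/") if part]
    return ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts) + 1)]


def dangling_crumbs(url, existing_urls):
    """Return the intermediate breadcrumb URLs that do not exist as pages."""
    return [crumb for crumb in breadcrumb_urls(url)[:-1] if crumb not in existing_urls]


# Assumed site content: only the doc index and the page itself exist.
existing = {"/doc/", "/doc/user/troubleshooting/disk-troubleshooting/"}
print(dangling_crumbs("/doc/user/troubleshooting/disk-troubleshooting/", existing))
```

With these assumptions, the two intermediate URLs named above are the ones reported as missing, so a breadcrumb trail built from the URL would contain two dead links.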
Note that it's a little bit harder to move a page to another URL with implicit permalinks. The old implicit permalink must be generated (or manually copied from the address bar) and put into redirect_from.
That is actually something I worry about quite a lot. If renaming a file or changing its title (depending on how the title will be related to the file name) invalidates its original URL, it will be very easy to break links (our CI will detect internal issues, but we can't possibly find all the links from outside of our website).
The idea of having an explicit permalink that is kept in sync with the directory structure by a script (and that script also takes care of adding the relevant redirect_from) may indeed work. One issue with that is that Travis^WGitlab-CI can't commit things. But I think we can make it post a review with the required changes as "suggestions". This may not be entirely trivial; for example, it needs to avoid posting the same suggestion over and over... The alternative is just to complain when things are broken, suggesting a change only in the job log.
As for the general direction of permalinks matching the directory structure, I'm not really sure. On one hand, they will make it easier to find the actual source page by just looking at the URL (something that I personally find hard with the current state), and will also make it clearer which documentation section a page is in (most useful for the "user" / "developer" distinction, IMO less so for more detailed categories). On the other hand, they will be longer, and as Andrew just pointed out, the intermediate parts don't always make sense. Plus, attempting to keep them in sync with the directory structure (with whatever method we choose) will take some effort, even if it is just a one-time effort to write a script.
As for the index, a per-section generated index, like you did in https://github.com/QubesOS/qubes-doc/commit/68f6f96220038cd6330f4fdb7608b78ce8f2cb51, would be a massive improvement already. The only remaining manual work would be adjusting the sections themselves (reordering them, changing their titles, etc.). That is a lot less frequent work. I would call it good enough. Anyway, you can try {% if page.path contains section.dir %} (without the inner {{ ... }}).
Anyway, you can try {% if page.path contains section.dir %} (without inner {{ ... }}).
I also tried that. Didn't work, presumably because it only accepts a string.
But it's fine. I have a good-enough solution for the time being.
I was able to make some decent progress using your suggestion (remove the titles; use only the URLs). It's partial automation at least.
If you are pleased with the result then we could use it. If not then: Let's write an external script! ;-)
They would be complementary, not exclusive. ;)
Ah, okay. I had an external script in mind which produces a plain Markdown file showing the index without magic in it.
Not sure what you mean. What magic?
Liquid code.
Note that it's a little bit harder to move a page to another URL with implicit permalinks. The old implicit permalink must be generated (or manually copied from the address bar) and put into redirect_from.
That is actually something I worry about quite a lot. If renaming a file or changing its title (depending on how the title will be related to the file name) invalidates its original URL, it will be very easy to break links (our CI will detect internal issues, but we can't possibly find all the links from outside of our website).
Now I am also thinking of our translated *.md files. Their permalinks look like /de/doc/... and /pl/doc/.... But the files don't reside at /de/doc/... or /pl/doc/... in the doc repo. So permalinks can't reflect the directory structure of our translated files, which would be inconsistent with the general idea of permalinks matching the directory structure.
The idea of having an explicit permalink that is kept in sync with the directory structure by a script (and that script also takes care of adding the relevant redirect_from) may indeed work. One issue with that is that Travis^WGitlab-CI can't commit things. But I think we can make it post a review with the required changes as "suggestions". This may not be entirely trivial; for example, it needs to avoid posting the same suggestion over and over... The alternative is just to complain when things are broken, suggesting a change only in the job log.
Sounds complicated and like another downside.
As for the index, per-section generated index, like you did in QubesOS/qubes-doc@68f6f96 would be a massive improvement already. The only remaining manual work would be adjusting sections themselves (reordering them, changing their titles etc). A lot less frequent work. I would call it good enough.
This is something I still didn't get yet. If we have Liquid code that generates the index items then the result exists only at Jekyll runtime and thus we can't change the ordering manually afterwards. Or am I wrong? Can someone explain it, please?
Liquid code.
Not allowed to use liquid inside of YAML anyway.
This is something I still didn't get yet. If we have Liquid code that generates the index items then the result exists only at Jekyll runtime and thus we can't change the ordering manually afterwards. Or am I wrong? Can someone explain it, please?
Correct. This is a downside of a fully-auto-generated doc index: no control over the order of the links. My current approach (inspired by you) solves this by only partially automating it (hand-crafted list of links, in order, with the rest automated).
Liquid code.
Not allowed to use liquid inside of YAML anyway.
No, no, no, you got me wrong. I was thinking of an external script that produces the doc index page in plain Markdown without Liquid code. Currently, the doc index page is generated via the layout doc-index, which contains Liquid code. I instead thought of moving that Liquid code to an external (Python?) script because of the contains issue you mentioned. But since you found a solution to that issue, it's no longer necessary.
This is something I still didn't get yet. If we have Liquid code that generates the index items then the result exists only at Jekyll runtime and thus we can't change the ordering manually afterwards. Or am I wrong? Can someone explain it, please?
Correct. This is a downside of a fully-auto-generated doc index: no control over the order of the links. My current approach (inspired by you) solves this by only partially automating it (hand-crafted list of links, in order, with the rest automated).
Thank you! :+1:
I thought of a better way to create breadcrumb navigation trails: https://github.com/QubesOS/qubesos.github.io/commit/07dbc013e425ff2fafd26a1a51fbcd0a45f6b8f2.
Having title: in the YAML frontmatter and having an h1 heading in the body and having the title in the filename is triply redundant and often gets out of sync. Better to use just one. Ideally, we could just de-slugify the filename to form the title automatically, for example.
It's not redundant. I totally respect the need for easy maintenance. Styleguides exist, to guide how people should name things. There is no writing styleguide to guide the docs, that I know of. One should really exist, to contain the chaos you're trying to avoid.
Having title: in the YAML frontmatter and having an h1 heading in the body and having the title in the filename is triply redundant and often gets out of sync. Better to use just one. Ideally, we could just de-slugify the filename to form the title automatically, for example.
It's not redundant. I totally respect the need for easy maintenance. Styleguides exist, to guide how people should name things. There is no writing styleguide to guide the docs, that I know of. One should really exist, to contain the chaos you're trying to avoid.
Of course there is. It's right here:
https://www.qubes-os.org/doc/doc-guidelines/
Unfortunately, many people don't read it, which is why it's not sufficient. For better or worse, we don't have the time or workforce to write all of the documentation ourselves, so it's a community volunteer effort. Hence, our general policy regarding doc PRs is: "If accepting a PR would have a net positive effect, then accept it, even if it doesn't follow all the rules or is flawed." If we required PRs to follow all the rules before accepting them, many contributors would not be able or willing to ever fix them correctly, and we'd just lose out on those contributions, meaning we'd forgo a net benefit each time. This is just another example of not allowing the perfect to be the enemy of the good. Since there are countless little things that can go wrong with a doc PR that aren't enough to merit rejection, the cumulative benefit of automating maintenance is enormous.
There is no writing styleguide to guide the docs, that I know of.
Of course there is. It's right here:
Reorganized doc guidelines to address this (https://github.com/QubesOS/qubes-doc/commit/c29cf40910ff68a2c6c9585ca39d73584e48b31b, https://github.com/QubesOS/qubesos.github.io/commit/a540575f0e1df720254ab1951530864a77676b29).
Old URL above now redirects to this new URL:
The problem you're addressing (if any)
Doc maintenance is a painful burden, and it's only getting worse. There are many places where things have to be kept in sync manually, and there's no good reason for this when it could be done automatically.
Describe the solution you'd like
Some specific examples:
- Using permalink: in the YAML frontmatter is worse than just letting Jekyll generate the permalink based on the directory path and filename. Our Jekyll config is already set to do this. We just need to delete all the permalink: lines and set up redirects from the existing URLs. This would also allow us to auto-generate a breadcrumb link trail at the top of each doc page, which would make navigation easier.
- Having title: in the YAML frontmatter and having an h1 heading in the body and having the title in the filename is triply redundant and often gets out of sync. Better to use just one. Ideally, we could just de-slugify the filename to form the title automatically, for example.
Where is the value to a user, and who might that user be?
This is mainly of value to people who work on the docs, including contributors. It would greatly simplify things to handle this stuff automatically. Readers would also benefit from a more robust doc system with fewer errors (due to less room for human error to introduce problems).
Describe alternatives you've considered
There are existing doc management systems that have already solved these problems, but we're probably beyond that point.
Relevant documentation you've consulted
N/A
Related, non-duplicate issues
https://github.com/QubesOS/qubes-issues/issues/5308