Closed roberth closed 1 year ago
It would be nice but there was opposition to including pandoc in the NixOS system closure. So we would either need to decouple the manual from the system, choose some lighter parser (maybe lowdown + HTML→DocBook via XSL), or cache pre-generated DocBook like we currently do with the NixOS manual.
opposition [to increase in build closure size because of pandoc]
Sure! If we don't want pandoc in the build closure for NixOS, we can choose to implement step 5. NixOS' dependency on pandoc will be a temporary affair.
lowdown + HTML→DocBook via XSL
We'll need a tool that can convert tons of snippets at pace. Pandoc can do this (as linked before). Shelling out will not be an option, so I suspect that this would involve a custom tool. Could be worth it.
cache pre-generated DocBook
This is terrible developer UX. If we want people to write less docs, this is one way to do it. It will be even more complicated, because or custom tooling can't process this file-by-file like we can in the manual.
Maybe XML + XSLT should be JSON + Python. @pennae has already converted some things to python as part of their eval performance work.
probably, yeah. there's not that much literal docbook in nixos modules that's properly labelled as such, and among the great many docbook-using descriptions that are not properly labelled by far the most use for docbook uses only a few tags (literal, option, filename, programlisting, ...). parsing descriptions as xml fragments and running a custom transform over them actually looks quite feasible!
Hmm yeah, that makes me think that perhaps my migration path is overly correct. We could just do it and fix the occurrences of the small set of tags used after switching.
parsing descriptions as xml fragments and running a custom transform over them actually looks quite feasible!
I would avoid writing any and all parsing logic, and only go from semi-structured data to {markdown + pandoc raw attribute} in our own tooling. It's not worth writing and maintaining a complex solution just to have ok-looking docs during the migration. Of course this is subjective, but the temporary nature is evident. If such a custom transform is a migration tool rather than an integral part of the docs pipeline, I'm all for it. That also takes the pressure off its correctness, as we can fix a few things up by hand.
I bet it's possible to introduce and get rid of pandoc until the next release. It would be a shame to block the refactor for a temporary dependency, as long as it doesn't stay around forever.
We do not have a time frame, so it might take longer than the next release. Especially when we have not even settled on a toolchain yet (currently the candidates are https://github.com/NixOS/nixpkgs/pull/105036 and https://github.com/NixOS/nixpkgs/pull/108063). And people who would be annoyed by the closure do often use unstable.
So, while deciding that having them choose between having pandoc in their system closure and disabling local docs temporarily can be a reasonable trade off for the long term goal of improving the docs infrastructure, it might still turn out few sed
invocations is actually a better interim solution.
I would keep it simple and just automatically convert all description
strings to Markdown in one commit, so without introducing docMD
, and drop support for Docbook descriptions immediately. That way we don't need pandoc in the system closure.
P.S. https://github.com/NixOS/nix/pull/6583 already assumes that option descriptions are in markdown. Supporting other markup languages would just make life more complicated.
we've made an attempt at producing a script to migrate docs.
perl -pi -e 's,<literal>([^`]*?)</literal>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<replaceable>([^»]*?)</replaceable>,«$1»,g' nixos/modules/**/*.nix
perl -pi -e 's,<filename>([^`]*?)</filename>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<code>([^`]*?)</code>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<option>([^`]*?)</option>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<command>([^`]*?)</command>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<link xlink:href="(.+?)" ?/>,$1,g' nixos/modules/**/*.nix
perl -pi -e 's,<link xlink:href="(.+?)">(.*?)</link>,[$2]($1),g' nixos/modules/**/*.nix
perl -pi -e 's,<package>([^`]*?)</package>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<emphasis>([^*]*?)</emphasis>,*$1*,g' nixos/modules/**/*.nix
perl -pi -e 'BEGIN{undef $/;} s,<citerefentry>\s*<refentrytitle>\s*(.*?)\s*</refentrytitle>\s*<manvolnum>\s*(.*?)\s*</manvolnum>\s*</citerefentry>,`$1 ($2)`,smg' nixos/modules/**/*.nix
does a reasonable job of replacing most docbook tag with markdown constructs. around 2000 matches that look like xml tags remain after that, but a lot of that is either actual xml than can be safely ignored (eg fontconfig) or extended lists that have to converted with more thought anyway.
having spent a couple hours trying to hack together something to process the docbook xml snippets, it doesn't quite seem to be worth the trouble over using regexes. descriptions are not well-formed xml to begin with, interpolations split them up even further (and are a mess to track), some descriptions aren't even valid xml fragments—and the main benefit of automatic translation (being able to translate eg varlists) requires all of that to not be like it is. and we'll need sapient inspection of the results anyway. :/
descriptions are not well-formed xml to begin with, interpolations split them up even further (and are a mess to track), some descriptions aren't even valid xml fragments
This is why I proposed doing it as part of the existing pipeline, where we do have quality data.
Another option is to write a script that takes writes out a long markdown document with markers to delineate the individual description
s, etc, split that up, and merge it back into the XML. This way, we don't need pandoc in the build closure.
I guess though, docs quality during the transition is not a priority, so then I suppose leaving some descriptions in a broken state is more acceptable than having pandoc in the closure?
Let me also mention literalDocBook
, which occurs in example
, defaultText
. We'll want to have the same for markdown. At least these are easy to spot for the migration.
maybe we should do something that's a mix of all of the above: temporarily extend/augment the doc merging script we already have to render markdown to docbook, add literalMarkdown
and (eg) mdDoc
markers, and convert descriptions over time. once the conversion is done we could drop mdDoc
and the docbook markers/converters. not going from docbook to markdown but from markdown to docbook seems easier, since docbook is somewhat richer.
(as we've learned from the split docs changes, a flag day event really doesn't work very well. the split docs builds at least broke to call attention to breakages, docbook somehow still making it into the MD future might not be as loud)
while trying some things out we've noticed we can't change some docbook tags without changing eg the manpages. one we noticed in particular was <command>
, which gets rendered specially both in manpages and in the html manual. not quite sure what to do about that. presumably we'll need a new filter like already exists for manpages, eg {command}`stuff`
?
@pennae has found a great solution for the migration. It appears to be ready for use by other contributors. I'd like to merge it in a couple of days, but perhaps more of us could give it a try?
interest seems low, but that might be a discoverability issue. perhaps a round through discourse would be useful, what do you think?
Forgot to share my thoughts: this is great. There are currently at least three places where the "NixOS module system option documentation rendering logic" is defined: nixpkgs, nmd (used by Home Manager, ping @rycee), and nixos-search. It would be nice to unify the three. Merging this now would break some options at nixos-search. Maybe I'll find the energy to port the changes there soon.
nixpkgs
No problems here.
nmd
Home Manager doesn't have to use mdDoc
until they've updated their tooling, or they could make an instantaneous switch if they like. They control both their modules and their tooling, so I don't see a problem here.
nixos-search
Flakes are experimental anyway, so I don't think this is a blocker?
a round through discourse
I don't think the PR needs more review. If anything pops up, we can fix it as we go. It'd be a nicer experience for contributors if their changes are actually mergeable. I'd say merge and then announce on discourse.
A tracking issue for the conversion project would also be helpful for discoverability.
Home Manager doesn't have to use
mdDoc
until they've updated their tooling, or they could make an instantaneous switch if they like. They control both their modules and their tooling, so I don't see a problem here.
I think this situation is not optimal: since Home Manager inherits e.g. lib.literalMD
from nixpkgs it should probably also inherit the associated logic. I feel like there's no good reason to have more than one way to render option documentation. But indeed this is tangential to this issue.
Flakes are experimental anyway, so I don't think this is a blocker?
The confusingly-named flake-info
tool also indexes nixpkgs options and packages, BUT I just remembered that since https://github.com/NixOS/nixos-search/pull/460 we use options.json
directly, and the MarkDown translation here happens before options.json
, so we should be fine.
FWIW, nmd has been moving towards asciidoc for some time and much of the HM manual is written in asciidoc. So it is likely that option descriptions in HM will go that way as well. I've been meaning to switch over but have never found the time to do so. I'll have more spare time sometime in July so might make some moves then.
the HM manual is written in asciidoc
that's a point, actually. nixos does have an asciidoc exporter for module docs, but it's kind of not useful. same goes for the MD exporter at this point, but that will become more useful in the future at least. the asciidoc exporter will likely have to go away though at some point, unless we want to write multiple renderers.
Arion's docs are also asciidoc based and it's the first consumer of the Nixpkgs asciidoc renderer. I'm not aware of any others.
Asciidoc was a good contender before markdown was chosen for the manuals, but I'm avoiding it for new projects. Migrating arion's docs to markdown wouldn't be bad.
Do I read correctly that we don't have any blockers for #176146?
don't think there are any, especially since the scope is still rather limited and changes to the rendering tool are easy enough should problems pop up.
@pennae, do you think the script you wrote in https://github.com/NixOS/nixpkgs/issues/175586#issuecomment-1144040425 is worth promoting to a maintainers/
script, so people can use it as a helper?
It may create a better first iteration than I would, because I still forget to add mdDoc
when writing docs.
we'd have to add another regex s/$( +description =)/\1 lib.mdDoc/
, but sure why not. the resulting changes are untrustworthy without sapient review anyway, as long as it's clear that this is not a perfect process it might be pretty useful
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/ideas-to-make-it-easier-to-contribute-to-the-documentation/20312/1
I am not going to waste my time prefixing every documentation string in my modules with lib.docMD
. There is no incentive for me to add extra processing to strings that need no processing when nixos is already expensive to evaluate.
@ehmry the markers were added to facilitate a smooth transition, taking into account 3rd party repos. The "extra processing" is very minimal, because conversion happens at build time, not eval time. The markers will be removed in due time.
Home Manager doesn't have to use
mdDoc
until they've updated their tooling, or they could make an instantaneous switch if they like. They control both their modules and their tooling, so I don't see a problem here.
Except for stuff like ~mkEnableOption
,~ mkAliasOptionModule
, ~etc~: https://github.com/nix-community/home-manager/issues/3543#issuecomment-1368008765
I guess nmd could just use (pkgs.nixosOptionsDoc { ... }).optionsJSON
.
mkEnableOption
should be able to handle both and emit options in kind. if mkAliasOptionModule
(or others) can't do that during the transitional period that could well be regarded as a bug. nobody should be forced to support markdown yet
AFAICT that leaves ~two~ three such bugs: _module.args
, mkPackageOption
and mkAliasOptionModule
(other places where mdDoc
is used in the module system are invisible or internal).
what's wrong with _module.args
? the other two are easy enough to fix with ${fn}MD
variants.
It uses mdDoc
so its description also does not show up in the HM manual.
ah, well. that we probably can't fix without either adding special cases to the "no docbook descriptions in nixpkgs" rule, or dropping the rule altogether. neither of those sound like a good path forward. :slightly_frowning_face: maybe this (very special) option could be hidden for now and moved to the manual instead?
Yes, we could make it invisible/internal (can't remember the distinction) for now and add a { options._module.args = mkOption { internal = false; }; }
module to the main options doc.
unfortunately that doesn't seem to be possible. we can either hide it completely or not hide it at all. (we also tried other hacks, like setting a special module argument in the docs build. no dice.)
FWIW I've put together a preliminary PR to support AsciiDoc descriptions in nmd. See https://gitlab.com/rycee/nmd/-/merge_requests/7/diffs
Since CommonMark to some extent is similar I also use this for the mdDoc
description type and it doesn't look entirely awful. I'll see about improving it a bit more and then make use of that in Home Manager.
Just to justify the use of AsciiDoc a bit. The primary reason that HM uses that instead of Markdown is that it simplifies the build pipeline a great deal and reduces the build closure size quite a lot. It also integrates well with the existing DocBook setup. For example, you can see that it was reasonably simple to support AsciiDoc module descriptions in the nmd merge request above without noticeably expanding the closure or build time.
The build closure and build time is more important for HM documentation since it will almost never be fetched from a central binary cache while the Nixpkgs/NixOS documentation will nearly always be available from cache.nixos.org
.
That said, nmd uses the legacy Python version of the asciidoc tooling, which does add a dependency on Python and does not support recent AsciiDoc features. The more up-to-date Asciidoctor pulls in Ruby, which is even heavier so I would be very reluctant to move to that. With the recent resurgence of AsciiDoc development (see https://asciidoc.org/) perhaps there will be some more light-weight tool that can do AsciiDoc → DocBook in the future, perhaps based on libasciidoc.
Of course, if somebody knows of a light-weight Markdown → DocBook converter (that supports description lists) then we could equally well use that in nmd. It's just that I haven't found anything like that in my, admittedly shallow, investigations.
I would highly recommend to follow the module system's migration to markdown, so that all markers can be removed and the contributor experience is consistent.
@rycee did you have a look at the script we use in nixpkgs? It depends on Python + mistune, would that be small enough for your purposes?
I also think we should keep in mind the goal to make consumers able to use pkgs.nixosOptionsDoc
(even for non-NixOS-specific modules) (which uses the above script to handle the MD conversion) to transform a bunch of modules into a standalone documentation snippet. As far as I can tell, the only thing left that would prevent NMD from using it (thereby removing a pile of duplicated code) is its handling of declaration locations, which I've started working on in https://github.com/NixOS/nixpkgs/pull/203994 (but don't look at that PR yet, there are more things I need to think about).
+1 on strongly recommending not adding a third adDoc
format in the wild, things are complicated enough.
@ncfavier Ah yeah, the mergeJSON.py
file looks promising! I'll see if I can integrate that one. It doesn't seem to support definition lists, if I understand correctly, but I think that's OK for now, can always use plain DocBook for that.
It doesn't seem to support definition lists, if I understand correctly
correct. we tried to add those, but the mistune plugin API seems to not be powerful enough to support the syntax nixpkgs already uses. since these lists were rather rare in nixos options we decided to convert them to regular lists instead, but if something pops up that lets us support them: all for it! :)
for the record: option docs now support all features of nixpkgs-flavored markdown, including definition lists.
can this be considered done, or only once docbook in option docs is no longer allowed?
It's done when "mdDoc" is the default behavior of the description
field. I don't think we're already in a position where we can do that, are we? literalDocBook
hasn't been deprecated in the latest release yet, so the next release should still support it, but considering that the alternative, lib.mdDoc
, is already available in all releases, we can remove proper support for docbook in master
after the 23.05 branch-off.
4. Switch the default to markdown, remove the
lib.docMD
calls.
I don't think we're already in a position where we can do that, are we?
technically we can do it, but that would go against the deprecation process for docbook descriptions that we agreed on elsewhere. so let's keep this open until after 23.05, when we can disable docbook support. :)
we can remove proper support for docbook in
master
after the 23.05 branch-off.
Project description
Migrate module docs to markdown.
Pandoc should be able to mix and match :crossed_fingers:
Suggested pipeline
Suggested migration path
description = lib.docMD "..."
, etcdescription
(and other such attrs)lib.docMD
calls.From the broader ecosystem perspective, it would be good to keep the flexible pipeline, as we can use it to dismantle the cottage industry of custom module docs generators. It's a matter of wrapping it with the right pandoc syntax.
Technical details
(just to get a feel for it)
->
->
Some XML
->
Markdown: