NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.07k stars 14.13k forks source link

Migrate modules docs to markdown #175586

Closed roberth closed 1 year ago

roberth commented 2 years ago

Project description

Migrate module docs to markdown.

Pandoc should be able to mix and match :crossed_fingers:

Suggested pipeline

Suggested migration path

  1. Implement the pipeline, so we support multiple languages
  2. Start the conversion to description = lib.docMD "...", etc
  3. Forbid non-wrapped description (and other such attrs)
  4. Switch the default to markdown, remove the lib.docMD calls.
  5. Optionally, simplify the pipeline

From the broader ecosystem perspective, it would be good to keep the flexible pipeline, as we can use it to dismantle the cottage industry of custom module docs generators. It's a matter of wrapping it with the right pandoc syntax.

Technical details

(just to get a feel for it)

description = lib.docDB "This is <literal>docbook</literal>.";

->

description = { type = "doc"; lang = "docbook"; text = "This is <literal>docbook</literal>."; };

->

Some XML

->

Markdown:

    Description:

    ```{=docbook}
    This is <literal>docbook</literal>.


***Alternatives***

 - Maybe XML + XSLT should be JSON + Python. @pennae has already converted some things to python as part of their eval performance work.

**Metadata**

* homepage URL: _module system does not have a dedicated homepage_
* source URL: `lib/`, `nixos/`, `pkgs/`
* license: mit, bsd, gpl2+ , ...
* platforms: unix, linux, darwin, ...
jtojnar commented 2 years ago

It would be nice but there was opposition to including pandoc in the NixOS system closure. So we would either need to decouple the manual from the system, choose some lighter parser (maybe lowdown + HTML→DocBook via XSL), or cache pre-generated DocBook like we currently do with the NixOS manual.

roberth commented 2 years ago

opposition [to increase in build closure size because of pandoc]

Sure! If we don't want pandoc in the build closure for NixOS, we can choose to implement step 5. NixOS' dependency on pandoc will be a temporary affair.

lowdown + HTML→DocBook via XSL

We'll need a tool that can convert tons of snippets at pace. Pandoc can do this (as linked before). Shelling out will not be an option, so I suspect that this would involve a custom tool. Could be worth it.

cache pre-generated DocBook

This is terrible developer UX. If we want people to write less docs, this is one way to do it. It will be even more complicated, because or custom tooling can't process this file-by-file like we can in the manual.

pennae commented 2 years ago

Maybe XML + XSLT should be JSON + Python. @pennae has already converted some things to python as part of their eval performance work.

probably, yeah. there's not that much literal docbook in nixos modules that's properly labelled as such, and among the great many docbook-using descriptions that are not properly labelled by far the most use for docbook uses only a few tags (literal, option, filename, programlisting, ...). parsing descriptions as xml fragments and running a custom transform over them actually looks quite feasible!

roberth commented 2 years ago

Hmm yeah, that makes me think that perhaps my migration path is overly correct. We could just do it and fix the occurrences of the small set of tags used after switching.

parsing descriptions as xml fragments and running a custom transform over them actually looks quite feasible!

I would avoid writing any and all parsing logic, and only go from semi-structured data to {markdown + pandoc raw attribute} in our own tooling. It's not worth writing and maintaining a complex solution just to have ok-looking docs during the migration. Of course this is subjective, but the temporary nature is evident. If such a custom transform is a migration tool rather than an integral part of the docs pipeline, I'm all for it. That also takes the pressure off its correctness, as we can fix a few things up by hand.

zimbatm commented 2 years ago

I bet it's possible to introduce and get rid of pandoc until the next release. It would be a shame to block the refactor for a temporary dependency, as long as it doesn't stay around forever.

jtojnar commented 2 years ago

We do not have a time frame, so it might take longer than the next release. Especially when we have not even settled on a toolchain yet (currently the candidates are https://github.com/NixOS/nixpkgs/pull/105036 and https://github.com/NixOS/nixpkgs/pull/108063). And people who would be annoyed by the closure do often use unstable.

So, while deciding that having them choose between having pandoc in their system closure and disabling local docs temporarily can be a reasonable trade off for the long term goal of improving the docs infrastructure, it might still turn out few sed invocations is actually a better interim solution.

edolstra commented 2 years ago

I would keep it simple and just automatically convert all description strings to Markdown in one commit, so without introducing docMD, and drop support for Docbook descriptions immediately. That way we don't need pandoc in the system closure.

P.S. https://github.com/NixOS/nix/pull/6583 already assumes that option descriptions are in markdown. Supporting other markup languages would just make life more complicated.

pennae commented 2 years ago

we've made an attempt at producing a script to migrate docs.

perl -pi -e 's,<literal>([^`]*?)</literal>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<replaceable>([^»]*?)</replaceable>,«$1»,g' nixos/modules/**/*.nix
perl -pi -e 's,<filename>([^`]*?)</filename>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<code>([^`]*?)</code>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<option>([^`]*?)</option>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<command>([^`]*?)</command>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<link xlink:href="(.+?)" ?/>,$1,g' nixos/modules/**/*.nix
perl -pi -e 's,<link xlink:href="(.+?)">(.*?)</link>,[$2]($1),g' nixos/modules/**/*.nix
perl -pi -e 's,<package>([^`]*?)</package>,`$1`,g' nixos/modules/**/*.nix
perl -pi -e 's,<emphasis>([^*]*?)</emphasis>,*$1*,g' nixos/modules/**/*.nix

perl -pi -e 'BEGIN{undef $/;} s,<citerefentry>\s*<refentrytitle>\s*(.*?)\s*</refentrytitle>\s*<manvolnum>\s*(.*?)\s*</manvolnum>\s*</citerefentry>,`$1 ($2)`,smg' nixos/modules/**/*.nix

does a reasonable job of replacing most docbook tag with markdown constructs. around 2000 matches that look like xml tags remain after that, but a lot of that is either actual xml than can be safely ignored (eg fontconfig) or extended lists that have to converted with more thought anyway.

pennae commented 2 years ago

having spent a couple hours trying to hack together something to process the docbook xml snippets, it doesn't quite seem to be worth the trouble over using regexes. descriptions are not well-formed xml to begin with, interpolations split them up even further (and are a mess to track), some descriptions aren't even valid xml fragments—and the main benefit of automatic translation (being able to translate eg varlists) requires all of that to not be like it is. and we'll need sapient inspection of the results anyway. :/

roberth commented 2 years ago

descriptions are not well-formed xml to begin with, interpolations split them up even further (and are a mess to track), some descriptions aren't even valid xml fragments

This is why I proposed doing it as part of the existing pipeline, where we do have quality data.

Another option is to write a script that takes writes out a long markdown document with markers to delineate the individual descriptions, etc, split that up, and merge it back into the XML. This way, we don't need pandoc in the build closure.

I guess though, docs quality during the transition is not a priority, so then I suppose leaving some descriptions in a broken state is more acceptable than having pandoc in the closure?


Let me also mention literalDocBook, which occurs in example, defaultText. We'll want to have the same for markdown. At least these are easy to spot for the migration.

pennae commented 2 years ago

maybe we should do something that's a mix of all of the above: temporarily extend/augment the doc merging script we already have to render markdown to docbook, add literalMarkdown and (eg) mdDoc markers, and convert descriptions over time. once the conversion is done we could drop mdDoc and the docbook markers/converters. not going from docbook to markdown but from markdown to docbook seems easier, since docbook is somewhat richer.

(as we've learned from the split docs changes, a flag day event really doesn't work very well. the split docs builds at least broke to call attention to breakages, docbook somehow still making it into the MD future might not be as loud)

pennae commented 2 years ago

while trying some things out we've noticed we can't change some docbook tags without changing eg the manpages. one we noticed in particular was <command>, which gets rendered specially both in manpages and in the html manual. not quite sure what to do about that. presumably we'll need a new filter like already exists for manpages, eg {command}`stuff`‍?

roberth commented 2 years ago

@pennae has found a great solution for the migration. It appears to be ready for use by other contributors. I'd like to merge it in a couple of days, but perhaps more of us could give it a try?

https://github.com/NixOS/nixpkgs/pull/176146

pennae commented 2 years ago

interest seems low, but that might be a discoverability issue. perhaps a round through discourse would be useful, what do you think?

ncfavier commented 2 years ago

Forgot to share my thoughts: this is great. There are currently at least three places where the "NixOS module system option documentation rendering logic" is defined: nixpkgs, nmd (used by Home Manager, ping @rycee), and nixos-search. It would be nice to unify the three. Merging this now would break some options at nixos-search. Maybe I'll find the energy to port the changes there soon.

roberth commented 2 years ago

nixpkgs

No problems here.

nmd

Home Manager doesn't have to use mdDoc until they've updated their tooling, or they could make an instantaneous switch if they like. They control both their modules and their tooling, so I don't see a problem here.

nixos-search

Flakes are experimental anyway, so I don't think this is a blocker?


a round through discourse

I don't think the PR needs more review. If anything pops up, we can fix it as we go. It'd be a nicer experience for contributors if their changes are actually mergeable. I'd say merge and then announce on discourse.

roberth commented 2 years ago

A tracking issue for the conversion project would also be helpful for discoverability.

ncfavier commented 2 years ago

Home Manager doesn't have to use mdDoc until they've updated their tooling, or they could make an instantaneous switch if they like. They control both their modules and their tooling, so I don't see a problem here.

I think this situation is not optimal: since Home Manager inherits e.g. lib.literalMD from nixpkgs it should probably also inherit the associated logic. I feel like there's no good reason to have more than one way to render option documentation. But indeed this is tangential to this issue.

Flakes are experimental anyway, so I don't think this is a blocker?

The confusingly-named flake-info tool also indexes nixpkgs options and packages, BUT I just remembered that since https://github.com/NixOS/nixos-search/pull/460 we use options.json directly, and the MarkDown translation here happens before options.json, so we should be fine.

rycee commented 2 years ago

FWIW, nmd has been moving towards asciidoc for some time and much of the HM manual is written in asciidoc. So it is likely that option descriptions in HM will go that way as well. I've been meaning to switch over but have never found the time to do so. I'll have more spare time sometime in July so might make some moves then.

pennae commented 2 years ago

the HM manual is written in asciidoc

that's a point, actually. nixos does have an asciidoc exporter for module docs, but it's kind of not useful. same goes for the MD exporter at this point, but that will become more useful in the future at least. the asciidoc exporter will likely have to go away though at some point, unless we want to write multiple renderers.

roberth commented 2 years ago

Arion's docs are also asciidoc based and it's the first consumer of the Nixpkgs asciidoc renderer. I'm not aware of any others.

Asciidoc was a good contender before markdown was chosen for the manuals, but I'm avoiding it for new projects. Migrating arion's docs to markdown wouldn't be bad.

roberth commented 2 years ago

Do I read correctly that we don't have any blockers for #176146?

pennae commented 2 years ago

don't think there are any, especially since the scope is still rather limited and changes to the rendering tool are easy enough should problems pop up.

roberth commented 2 years ago

@pennae, do you think the script you wrote in https://github.com/NixOS/nixpkgs/issues/175586#issuecomment-1144040425 is worth promoting to a maintainers/ script, so people can use it as a helper?

It may create a better first iteration than I would, because I still forget to add mdDoc when writing docs.

pennae commented 2 years ago

we'd have to add another regex s/$( +description =)/\1 lib.mdDoc/, but sure why not. the resulting changes are untrustworthy without sapient review anyway, as long as it's clear that this is not a perfect process it might be pretty useful

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/ideas-to-make-it-easier-to-contribute-to-the-documentation/20312/1

ehmry commented 2 years ago

I am not going to waste my time prefixing every documentation string in my modules with lib.docMD. There is no incentive for me to add extra processing to strings that need no processing when nixos is already expensive to evaluate.

roberth commented 2 years ago

@ehmry the markers were added to facilitate a smooth transition, taking into account 3rd party repos. The "extra processing" is very minimal, because conversion happens at build time, not eval time. The markers will be removed in due time.

ncfavier commented 1 year ago

Home Manager doesn't have to use mdDoc until they've updated their tooling, or they could make an instantaneous switch if they like. They control both their modules and their tooling, so I don't see a problem here.

Except for stuff like ~mkEnableOption,~ mkAliasOptionModule, ~etc~: https://github.com/nix-community/home-manager/issues/3543#issuecomment-1368008765

I guess nmd could just use (pkgs.nixosOptionsDoc { ... }).optionsJSON.

pennae commented 1 year ago

mkEnableOption should be able to handle both and emit options in kind. if mkAliasOptionModule (or others) can't do that during the transitional period that could well be regarded as a bug. nobody should be forced to support markdown yet

ncfavier commented 1 year ago

AFAICT that leaves ~two~ three such bugs: _module.args, mkPackageOption and mkAliasOptionModule (other places where mdDoc is used in the module system are invisible or internal).

pennae commented 1 year ago

what's wrong with _module.args? the other two are easy enough to fix with ${fn}MD variants.

ncfavier commented 1 year ago

It uses mdDoc so its description also does not show up in the HM manual.

pennae commented 1 year ago

ah, well. that we probably can't fix without either adding special cases to the "no docbook descriptions in nixpkgs" rule, or dropping the rule altogether. neither of those sound like a good path forward. :slightly_frowning_face: maybe this (very special) option could be hidden for now and moved to the manual instead?

ncfavier commented 1 year ago

Yes, we could make it invisible/internal (can't remember the distinction) for now and add a { options._module.args = mkOption { internal = false; }; } module to the main options doc.

pennae commented 1 year ago

unfortunately that doesn't seem to be possible. we can either hide it completely or not hide it at all. (we also tried other hacks, like setting a special module argument in the docs build. no dice.)

rycee commented 1 year ago

FWIW I've put together a preliminary PR to support AsciiDoc descriptions in nmd. See https://gitlab.com/rycee/nmd/-/merge_requests/7/diffs

Since CommonMark to some extent is similar I also use this for the mdDoc description type and it doesn't look entirely awful. I'll see about improving it a bit more and then make use of that in Home Manager.

rycee commented 1 year ago

Just to justify the use of AsciiDoc a bit. The primary reason that HM uses that instead of Markdown is that it simplifies the build pipeline a great deal and reduces the build closure size quite a lot. It also integrates well with the existing DocBook setup. For example, you can see that it was reasonably simple to support AsciiDoc module descriptions in the nmd merge request above without noticeably expanding the closure or build time.

The build closure and build time is more important for HM documentation since it will almost never be fetched from a central binary cache while the Nixpkgs/NixOS documentation will nearly always be available from cache.nixos.org.

That said, nmd uses the legacy Python version of the asciidoc tooling, which does add a dependency on Python and does not support recent AsciiDoc features. The more up-to-date Asciidoctor pulls in Ruby, which is even heavier so I would be very reluctant to move to that. With the recent resurgence of AsciiDoc development (see https://asciidoc.org/) perhaps there will be some more light-weight tool that can do AsciiDoc → DocBook in the future, perhaps based on libasciidoc.

Of course, if somebody knows of a light-weight Markdown → DocBook converter (that supports description lists) then we could equally well use that in nmd. It's just that I haven't found anything like that in my, admittedly shallow, investigations.

roberth commented 1 year ago

I would highly recommend to follow the module system's migration to markdown, so that all markers can be removed and the contributor experience is consistent.

ncfavier commented 1 year ago

@rycee did you have a look at the script we use in nixpkgs? It depends on Python + mistune, would that be small enough for your purposes?

I also think we should keep in mind the goal to make consumers able to use pkgs.nixosOptionsDoc (even for non-NixOS-specific modules) (which uses the above script to handle the MD conversion) to transform a bunch of modules into a standalone documentation snippet. As far as I can tell, the only thing left that would prevent NMD from using it (thereby removing a pile of duplicated code) is its handling of declaration locations, which I've started working on in https://github.com/NixOS/nixpkgs/pull/203994 (but don't look at that PR yet, there are more things I need to think about).

+1 on strongly recommending not adding a third adDoc format in the wild, things are complicated enough.

rycee commented 1 year ago

@ncfavier Ah yeah, the mergeJSON.py file looks promising! I'll see if I can integrate that one. It doesn't seem to support definition lists, if I understand correctly, but I think that's OK for now, can always use plain DocBook for that.

pennae commented 1 year ago

It doesn't seem to support definition lists, if I understand correctly

correct. we tried to add those, but the mistune plugin API seems to not be powerful enough to support the syntax nixpkgs already uses. since these lists were rather rare in nixos options we decided to convert them to regular lists instead, but if something pops up that lets us support them: all for it! :)

pennae commented 1 year ago

for the record: option docs now support all features of nixpkgs-flavored markdown, including definition lists.

pennae commented 1 year ago

can this be considered done, or only once docbook in option docs is no longer allowed?

roberth commented 1 year ago

It's done when "mdDoc" is the default behavior of the description field. I don't think we're already in a position where we can do that, are we? literalDocBook hasn't been deprecated in the latest release yet, so the next release should still support it, but considering that the alternative, lib.mdDoc, is already available in all releases, we can remove proper support for docbook in master after the 23.05 branch-off.

4. Switch the default to markdown, remove the lib.docMD calls.

pennae commented 1 year ago

I don't think we're already in a position where we can do that, are we?

technically we can do it, but that would go against the deprecation process for docbook descriptions that we agreed on elsewhere. so let's keep this open until after 23.05, when we can disable docbook support. :)

pennae commented 1 year ago

we can remove proper support for docbook in master after the 23.05 branch-off.

239636 removed the last vestiges of the docbook toolchain in our manual builds, and mdDoc behavior has been the default since #237557. removing mdDoc should probably wait for another release to keep backporting effort reasonable both within and without the nixpkgs repo, but the migration is now finally done.