Debian / debiman

debiman generates a static manpage HTML repository out of a Debian archive
Apache License 2.0

Bullet list rendered as multiple <dl>s #67

Open ilmari opened 7 years ago

ilmari commented 7 years ago

Each bullet in the lists in capabilities(7) is rendered as a separate definition list, like this:

<dl>
    <dt>*</dt>
    <dd>…</dd>
</dl>

Instead each list should be a single unordered list:

<ul>
    <li>…</li>
    …
</ul>

stapelberg commented 7 years ago

This seems to be an upstream issue with mandoc:

$ mandoc -Thtml /usr/share/man/man7/capabilities.7.gz
[…]
<dl class="Bl-tag">
  <dt class="It-tag"><b>CAP_AUDIT_CONTROL</b> (since Linux 2.6.11)</dt>
  <dd class="It-tag">Enable and disable kernel auditing; change auditing filter
      rules; retrieve auditing status and filtering rules.</dd>
</dl>

Could you report it at http://mdocml.bsd.lv/contact.html please, or would you prefer if I relayed the report?

lahwaacz commented 7 years ago

This is an inherent problem of converting the old man(7) language to HTML. The snippet from capabilities(7) is written with the .IP macro as follows:

.IP * 2
Bypass file read permission checks and
directory read and execute permission checks;
.IP *
Invoke
.BR open_by_handle_at (2).

Literally speaking, the mandoc output is correct, because the .IP macro is intended for definition lists. The catch is that with * as the tag, the plain-text output looks exactly like a bullet-point list, with the * flushed into the left margin. The semantically correct solution is provided by the mdoc(7) language and its .Bl and .It macros.

To make existing manuals written in the man language more visually pleasing, I think it would be best to modify mandoc's HTML formatting to treat .IP *, .IP - etc. as unordered lists and produce <ul> tags instead of <dl> tags. Alternatively, you could do it in a post-processing phase, or even style the dt and dd tags to appear on the same line, but you would still need to distinguish the bullet definitions from other definitions.
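
As a rough illustration of the post-processing option, here is a minimal Python sketch. It assumes the per-bullet lists have the simple shape shown earlier in this thread (one <dt> holding only a bullet marker, followed by one <dd>). The function name bullets_to_ul and the set of markers are hypothetical, and regex matching like this is not robust against arbitrary HTML.

```python
import re

# Markers treated as bullets (an assumption for this sketch).
BULLETS = ("*", "-", "\u2022", "&#x2022;")
_ALT = "|".join(re.escape(b) for b in BULLETS)

# One <dl> holding exactly one bullet <dt> and one <dd>.
# The body may not contain nested dl/dt/dd tags.
_ONE = (
    r"<dl[^>]*>\s*<dt[^>]*>\s*(?:" + _ALT + r")\s*</dt>\s*"
    r"<dd[^>]*>((?:(?!</?d[dlt]\b).)*)</dd>\s*</dl>"
)
ONE_DL = re.compile(_ONE, re.DOTALL)
# A run of such lists separated only by whitespace.
RUN = re.compile(_ONE + r"(?:\s*" + _ONE + r")*", re.DOTALL)


def bullets_to_ul(html):
    """Merge each run of single-bullet <dl> lists into one <ul>."""
    def repl(match):
        bodies = ONE_DL.findall(match.group(0))
        lines = ["<ul>"]
        lines += ["  <li>%s</li>" % body.strip() for body in bodies]
        lines.append("</ul>")
        return "\n".join(lines)

    return RUN.sub(repl, html)
```

Definition lists whose tags are not bullet markers are left untouched, which is the "distinguish bullets from other definitions" step mentioned above.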

(As for contacting mandoc upstream, their contact page says that messages on all three mailing lists are publicly visible, but there is no link to an archive viewer. Do you know of one? I'd like to read some existing bug reports or discussions.)

stapelberg commented 7 years ago

cc @ischwarze

ischwarze commented 7 years ago

I agree there is room for improvement in mandoc, so i added an entry to my TODO list:

format ".IP *" etc. as <ul> rather than <dl>

I suspect that is feasible with a bit of heuristic inspection, but it's not completely trivial, so i'm not doing it right away, but i did mark it as relatively high priority: the impact is cosmetic, but the resulting ugliness is above average for a cosmetic issue.
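
To illustrate what such heuristic inspection might look like, here is a small Python sketch that works on man(7) source lines. It is not how mandoc does or will do it (mandoc is written in C). The names BULLET_TAGS, ip_tag and list_kind, and the choice of markers, are assumptions for illustration only.

```python
# Tags that look like bullets when used as the first .IP argument
# (assumed set, taken from the examples in this thread).
BULLET_TAGS = {"*", "-", "\\(bu"}


def ip_tag(line):
    """Return the tag argument of an .IP line, or None for other lines."""
    parts = line.split()
    if not parts or parts[0] != ".IP":
        return None
    return parts[1] if len(parts) > 1 else ""


def list_kind(lines):
    """Guess 'ul' if every .IP tag is a bullet marker, 'dl' otherwise.

    Returns None when the block contains no .IP line at all.
    """
    tags = [t for t in map(ip_tag, lines) if t is not None]
    if not tags:
        return None
    return "ul" if all(t in BULLET_TAGS for t in tags) else "dl"
```

For the capabilities(7) snippet quoted earlier, every .IP tag is *, so this sketch would classify the block as an unordered list.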

In general, man(7) HTML formatting is less refined than mdoc(7) HTML formatting, and harder to implement nicely, but that's no excuse for not trying.

That said, i see a cosmetic issue with debiman as well. The upstream mandoc.css contains detailed CSS code to nicely format class "Bl-tag" lists, in particular to make sure that tags appear left of bodies if they fit, or above bodies otherwise - in fact, that's the part of mandoc.css that was hardest to tune. While debiman produces large amounts of CSS code - more than i would deem reasonable - this specific detail seems to be missing, resulting in ugly display of "Bl-tag" lists in general. In particular, the tags never seem to appear to the left of the respective body, not even if they are short.

lahwaacz commented 7 years ago

A very similar case is in systemd.environment-generator(7), where the list is written as

.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Generators are executed sequentially in the alphanumerical order of the final component of their name\&. The output of each generator output is immediately parsed and used to update the environment for generators that run after that\&. Thus, later generators can use and/or modify the output of earlier generators\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Generators are run by every manager instance, their output can be different for each user\&.
.RE
.PP

and the HTML version is

<div style="margin-left: 4.00ex;">•Generators are executed sequentially in
  the alphanumerical order of the final component of their name. The output of
  each generator output is immediately parsed and used to update the environment
  for generators that run after that. Thus, later generators can use and/or
  modify the output of earlier generators.</div>
<div style="height: 1.00em;"> </div>
<div style="margin-left: 4.00ex;">•Generators are run by every manager
  instance, their output can be different for each user.</div>
<div class="Pp"></div>

which still looks rather ugly.

ischwarze commented 5 years ago

I finally implemented this feature request in:

http://mandoc.bsd.lv/cgi-bin/cvsweb/man_html.c#rev1.173

The change will be contained in the next release, which will likely be called mandoc-1.14.5.

Here is an example with mandoc(1) from CVS HEAD:

$ mandoc -Thtml /co/linux-man-pages/man7/capabilities.7
<div class="Bd-indent">
<ul class="Bl-bullet">
  <li>Bypass file read permission checks and directory read and execute
      permission checks;</li>
  <li>invoke <b>open_by_handle_at</b>(2);</li>
  <li>use the <b>linkat</b>(2) <b>AT_EMPTY_PATH</b> flag to create a link to a
      file referred to by a file descriptor.</li>
</ul>
</div>

ischwarze commented 5 years ago

> A very similar case is in systemd.environment-generator(7)

That isn't similar at all and i think putting it into the same bugtracking ticket is very misleading.

> where the list is written as

.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Generators are executed sequentially in the alphanumerical order of the final component of their name\&. The output of each generator output is immediately parsed and used to update the environment for generators that run after that\&. Thus, later generators can use and/or modify the output of earlier generators\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Generators are run by every manager instance, their output can be different for each user\&.
.RE
.PP

That is man(7) code of such low quality that it is kind of a stretch to even call it "man(7)"; calling it "low-level roff(7) trickery" would be more to the point. Such low-level stuff definitely has no place in a manual page. People can't really expect to get input semantically translated to HTML when they rely on manual horizontal movements, moving left and right on the printing paper. HTML simply contains no facilities to represent such manual printing head movements, and a formatter has very little chance to guess what the semantic intention of the author might be.

Please report the manual page as broken upstream and tell upstream to properly use .IP macros and to not use \h escapes.

Mandoc rendering still is:

<div class="Bd-indent">&#x2022;Generators are executed sequentially in the
  alphanumerical order of the final component of their name. The output of each
  generator output is immediately parsed and used to update the environment for
  generators that run after that. Thus, later generators can use and/or modify
  the output of earlier generators.</div>

I don't see any reasonable way this could be improved.

lahwaacz commented 5 years ago

> That isn't similar at all

There is .IP \(bu, a sequence approximately as common as .IP *, which can be found even in GNU's roff(7) itself. If you handle .IP * specially, you might as well handle .IP \(bu. The other macros/escapes that are ignored in the HTML conversion don't make this case dissimilar.

ischwarze commented 5 years ago

Hi @lahwaacz ,

i agree that ".IP *" and ".IP \(bu" are similar, and that there is nothing wrong with using "\(bu" in manual pages, and indeed the patch i recently committed - see the "bob-beck pushed a commit to openbsd/src" right above - handles both.

What i meant with "isn't similar at all" was this horrible code from systemd.environment-generator(7):

\h'-04'\(bu\h'+03'\c

If you look closely, you will see that the ".IP \(bu 2.3" in that manual page is in an inactive .el clause: "ie n" is always true for manual pages (except when formatting with a real typesetter for PostScript or PDF output), so the .el clause is never entered.
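
The branch selection described above can be sketched in Python. This is a deliberately simplified model, not a roff interpreter: it only understands the exact ".ie n \{\ ... .\}" / ".el \{\ ... .\}" shape used in this manual page, and the function name active_lines is hypothetical.

```python
def active_lines(lines, nroff=True):
    """Return the lines a formatter would actually process.

    nroff=True models terminal/HTML output (condition 'n' is true);
    nroff=False models a typesetter (condition 't' is true).
    """
    out = []
    taking = True    # are we currently inside an active branch?
    last_ie = None   # result of the most recent .ie, consumed by .el
    for line in lines:
        s = line.strip()
        if s.startswith(".ie "):
            cond = s.split()[1]              # 'n' or 't'
            last_ie = (cond == "n") == nroff
            taking = last_ie
            continue
        if s.startswith(".el"):
            taking = not last_ie             # .el is the opposite branch
            continue
        if s == ".\\}":                      # end of a \{ ... \} group
            taking = True
            continue
        if taking:
            out.append(line)
    return out
```

With nroff=True, the "\h'-04'\(bu\h'+03'\c" line survives and the ".IP \(bu 2.3" line is dropped, which matches the observation that the .el clause is never entered for ordinary manual page formatting.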

lahwaacz commented 5 years ago

Oh, in that case you're right. On closer look, systemd seems to use xsltproc to generate their man pages from XML.

ischwarze commented 5 years ago

Hi @lahwaacz ,

> On closer look, systemd seems to use xsltproc to generate their man pages from XML

... and more specifically, from DocBook 4.2:

https://github.com/systemd/systemd/blob/master/man/systemd-environment-d-generator.xml

So no wonder the output is crap. DocBook is by far the worst and lowest quality file format you can pick for documentation. It is absolutely notorious for generating abysmal man(7) output as well as for being full of bugs and almost unmaintained.