aligrudi / neatroff_make

Neatroff top-level makefile

post.url is unsafe for long strings #27

Open rhaberkorn opened 1 year ago

rhaberkorn commented 1 year ago

post.url cannot handle links that span multiple lines, as it assumes that the link text fits into a single rectangle. For instance:

\*[post.url 1 "sdfklsdflksdfjsdfkljsdflk dfgkgkldfgjdflgjdflkgjdfglkjdfglkjsaf sdfsdfkljsdfklg sdfsdfjksdfj sdfsdfhjsdhfjsdf sdfsdfsdfhjsdfhj"]

Build with:

roff -mpost -ms url-bug.ms | pdf >url-bug.pdf

That's enough to reproduce the "bug". Effectively, links are only safe on single non-breakable words or at the beginning of a line.

Is there a way to fix that? I see that the ms .BX macro behaves similarly.

aligrudi commented 1 year ago

I cannot think of an easy solution; in addition to font, size, and colour, storing other attributes (just links?) is probably necessary (see wb.c). I am not sure if it is worth the added complexity.

By the way, you can use non-breakable stretchable space (\~) to prevent the URL from being broken into multiple lines.
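
As a minimal sketch of that workaround, reusing the target `1` from the report with made-up link text (the text is an assumption for illustration, and this has not been run here), the plain spaces inside the link text are replaced with \~:

```roff
.\" Hypothetical link text; \~ replaces each plain space so that
.\" the whole link stays on one output line.
\*[post.url 1 "a\~short\~link\~text"]
```

Since the text can then no longer be broken, this is probably only practical for short link text; a long one may overrun the line.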

Ali

rhaberkorn commented 1 year ago

Groff/MOM somehow does it right:

.PRINTSTYLE TYPESET
.START
.PP
.PDF_WWW_LINK http://www.google.de "dfjsdf sdf sdfksdjfdslkf fdsfksldfjsd lfsldflsdkf  dsfksldfjdsflds  sd fsdf sdf sdf dsfs df kljsdflkjdsflksdjf sdf"

Actually, I wanted to use Heirloom and/or Neatroff to get away from Groff/mom. But the deeper I dive into it, the more basic features I find missing or broken. Mom really does something for its money, although it needs 23500 lines of hard to maintain code to achieve that and has plenty of bugs as well. Let's face it, Troff in reality sucks at least for complex scientific documents. The language is too low level, putting too much burden on the macro package to get things right.

aligrudi commented 1 year ago

Robin Haberkorn wrote:

Groff/MOM somehow does it right:

.PRINTSTYLE TYPESET
.START
.PP
.PDF_WWW_LINK http://www.google.de "dfjsdf sdf sdfksdjfdslkf fdsfksldfjsd lfsldflsdkf  dsfksldfjdsflds  sd fsdf sdf sdf dsfs df kljsdflkjdsflksdjf sdf"

There may be cleaner ways of handling this (compared to changing Neatroff to maintain the value of attributes like links for rendered characters). I do not know what Groff does.

Actually, I wanted to use Heirloom and/or Neatroff to get away from Groff/mom. But the deeper I dive into it, the more basic features I find missing or broken. Mom really does something for its money, although it needs 23500 lines of hard to maintain code to achieve that and has plenty of bugs as well. Let's face it, Troff in reality sucks at least for complex scientific documents. The language is too low level, putting too much burden on the macro package to get things right.

You may be right. I personally think macro packages in Troff are much simpler and more readable than TeX's; it is usually easy to write a new macro package or extend an existing one. This is one reason I like Troff. We do not need a macro package that can be used to produce all kinds of documents.

There are features that are admittedly more difficult to implement in Troff macro packages. Also, the separation of post-processors from Troff makes implementing some post-processor-specific features, like PDF links, challenging. Nevertheless, clean solutions to such shortcomings usually emerge if they are widely needed.

Ali

rhaberkorn commented 1 year ago

I personally think macro packages in Troff are much simpler and more readable than TeX's

I have no serious experience with TeX, but yes, usually they are easy to hack. The 1970s macro packages like ms, however, are hard to understand in their entirety because they are restricted to two-letter names and are largely uncommented. Newer macro packages like Utroff lack support for most of the classic preprocessors, though, and I am reluctant to try my hand at implementing them myself. Even if I disable links for the time being, I'd have a bunch of other glitches in ms. For instance, when using floats (for figures and the like), you can easily end up with a footnote appearing before its reference in the text. Debugging that will probably take me days. I had the unfounded expectation that the 1970s stuff should be rock solid by now. Turns out, it isn't at all. Perhaps I will give the Plan 9 mm and me a try as well.