asciidoctor / asciidoctor-epub3

Asciidoctor EPUB3 is a set of Asciidoctor extensions for converting AsciiDoc to EPUB3
https://asciidoctor.org
MIT License

Losing end of link text when using commas #483

Closed niklucas closed 3 months ago

niklucas commented 3 months ago

[Bug] When link text includes a comma, everything after the comma is removed from the link text when using asciidoctor-epub3. This doesn't happen when generating standard HTML / XHTML with Asciidoctor.

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone[Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`]

becomes this out of asciidoctor-epub3:

<p><a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone" class="link">Well-known Labels</a></p>

Regular Asciidoctor generates:

<div class="paragraph">
<p><a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone">Well-known Labels, Annotations and Taints, <code>topology.kubernetes.io/zone</code></a></p>
</div>

slonopotamus commented 3 months ago

Hmm... The text is getting swallowed here: https://github.com/asciidoctor/asciidoctor/blob/v2.0.23/lib/asciidoctor/substitutors.rb#L591

slonopotamus commented 3 months ago

@mojavelinux this looks like a total bug to me in core... It is doing regexes on converted text (???). When we arrive at that line, link_text is "Well-known Labels, Annotations and Taints, <code class="literal">topology.kubernetes.io/zone</code>".

mojavelinux commented 3 months ago

The link text needs to be enclosed in double quotes. Otherwise, this is expected behavior. Please consult the docs.

On Wed, Jul 17, 2024, 10:29 Marat Radchenko wrote:

> Hmm... The text is getting swallowed here https://github.com/asciidoctor/asciidoctor/blob/v2.0.23/lib/asciidoctor/substitutors.rb#L591 ...


slonopotamus commented 3 months ago

Found docs, thanks: https://docs.asciidoctor.org/asciidoc/latest/macros/link-macro-attribute-parsing/#link-text-alongside-named-attributes

niklucas commented 3 months ago

1. Using double quotes

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone["Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`"]

becomes:

<p><a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone" class="link">"Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`"</a></p>

Here the code markup isn't converted, and the double quotes are kept in the output.

2. Using double quotes only around the text itself, not around the `topology.kubernetes.io/zone` part

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone["Well-known Labels, Annotations and Taints," `topology.kubernetes.io/zone`]

Or using a combo with a ^:

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone["Well-known Labels, Annotations and Taints,^" `topology.kubernetes.io/zone`]

Yields:

<p><a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone" class="link">Well-known Labels, Annotations and Taints,</a></p>

3. Using the caret at the end:

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone["Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`^"]

Yields:

<p><a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone" class="link">Well-known Labels, Annotations and Taints, <code class=</a></p>

4. Placing just a caret at the end of the link text:

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone[Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`^]

Is the same as the original problem:

<p><a href="https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone" class="link">Well-known Labels</a></p>

All that to say, combining commas with backticks that generate literal / code markup may just be too much. I'm surprised that it works fine in Asciidoctor's generation of HTML / XHTML but not here in the EPUB3-generated XHTML.

slonopotamus commented 3 months ago

I'm not convinced by the docs though... There is no = in the document itself. Substitutions operate on converted content, where the = was injected by the converter. The document author cannot predict which symbols will be injected during the conversion process.

mojavelinux commented 3 months ago

> It is doing regexes on converted text (???).

This is extremely well known and won't be changed until the AsciiDoc Language specification is completed (as it is one of the key reasons why we are doing the specification).

slonopotamus commented 3 months ago

> I'm surprised that it works fine in Asciidoctor's generation of HTML / XHTML but not here in the EPUB3-generated XHTML.

That happens because the HTML converter inserts <code>...</code>, while the EPUB converter inserts <code class="literal">...</code>, and that = breaks the parsing. I guess we could stop adding class="literal", which would fix your particular case, but I believe there are numerous other ways to end up with = in markup inside the link.
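
The behavior described in this thread can be modeled with a small sketch. This is not Asciidoctor's code: `resolve_link_text` is a hypothetical function, written in Python purely for illustration. It assumes, based on the comments above, that attribute parsing is only attempted when the already-converted link text contains `=`, and that the first positional value (a leading double-quoted string, or everything before the first comma) then becomes the link text.

```python
def resolve_link_text(converted: str) -> str:
    """Simplified, hypothetical model of how the link text gets truncated.

    `converted` is the link text after inline markup such as backticks has
    already been turned into HTML by the active converter.
    """
    # Without '=', the text is never treated as an attribute list.
    if "=" not in converted:
        return converted

    # A leading double quote starts a quoted value that ends at the next
    # double quote, even if that quote was injected by the converter.
    if converted.startswith('"'):
        end = converted.find('"', 1)
        if end != -1:
            return converted[1:end]

    # Otherwise the first comma-separated chunk is the first positional value.
    head = converted.split(",", 1)[0].strip()
    if "=" in head:
        # Named attributes before the first comma are outside this sketch.
        raise ValueError("named attribute before the first comma is not modeled")
    return head


html_text = ("Well-known Labels, Annotations and Taints, "
             "<code>topology.kubernetes.io/zone</code>")
epub_text = ("Well-known Labels, Annotations and Taints, "
             '<code class="literal">topology.kubernetes.io/zone</code>')

print(resolve_link_text(html_text))
print(resolve_link_text(epub_text))
```

Under these assumptions, the HTML-style text comes back unchanged, while the EPUB-style text is cut down to `Well-known Labels`, which matches the outputs reported above.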

mojavelinux commented 3 months ago

The issue here is that the EPUB 3 converter adds a CSS class to the code span. The double quotes around the CSS class name interfere with the double quotes around the link text. So this is just one of those edge cases in AsciiDoc that require a workaround.

The simplest workaround is to use a passthrough.

https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone[pass:n[Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`]]
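
For documents that generate AsciiDoc programmatically, the passthrough workaround could be applied with a small helper. The following is only a sketch: `link_with_passthrough` is a hypothetical name, not part of Asciidoctor or asciidoctor-epub3, and it assumes the link text contains no closing square bracket, since escaping is not handled here.

```python
def link_with_passthrough(url: str, text: str) -> str:
    """Build an AsciiDoc URL macro whose link text is wrapped in pass:n[...].

    Hypothetical helper for illustration; it only concatenates strings and
    does not call any Asciidoctor API.
    """
    # A ']' inside the text would end the macro early; this sketch refuses
    # such input instead of guessing at an escaping scheme.
    if "]" in text:
        raise ValueError("link text containing ']' is not handled by this sketch")
    return f"{url}[pass:n[{text}]]"


url = ("https://kubernetes.io/docs/reference/labels-annotations-taints/"
       "#topologykubernetesiozone")
text = "Well-known Labels, Annotations and Taints, `topology.kubernetes.io/zone`"
print(link_with_passthrough(url, text))
```

With the URL and link text from this issue, the helper prints the same macro as the workaround line above.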

mojavelinux commented 3 months ago

> I'm surprised that it works fine in Asciidoctor's generation of HTML / XHTML but not here in the EPUB3-generated XHTML.

As Marat pointed out, it's because the EPUB 3 converter doesn't produce the same HTML as the built-in HTML converter. And the extra markup it uses introduces a conflict.

niklucas commented 3 months ago

I see! I will use the passthrough method to work around this. Thank you both!