aim42 / htmlSanityCheck

Standalone (batch and command-line) and Gradle-plugin HTML sanity checker: detects missing images, dead links and cross-references, duplicate link targets (anchors) and the like.
Apache License 2.0

Wrongly emphasized AsciiDoc fragment leads to exception #342

Status: open. Opened by ascheman 4 days ago.

ascheman commented 4 days ago

Problem

An AsciiDoc fragment like the following currently leads to an exception (see below) when the HTML rendered from it is checked.

Executes the xref:concepts:workflow.adoc#_catalog_[Catalog] workflow step.

Exception

org.aim42.htmlsanitycheck.tools.Web$InvalidUriSyntaxException: java.net.URISyntaxException: Illegal character in fragment at index 26: ../concepts/workflow.html#<em>catalog</em>
        at org.aim42.htmlsanitycheck.tools.Web.isLocalResource(Web.java:224)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:178)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
        at org.aim42.htmlsanitycheck.check.MissingLocalResourcesChecker.check(MissingLocalResourcesChecker.java:54)
...

Expected behaviour

The problem should be reported properly as a finding instead of being thrown as an exception.
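A minimal sketch of that expected behaviour, written in Java like the code in the stack trace: the URISyntaxException is caught where the href is parsed and turned into a finding, so the remaining links are still checked. The class HrefClassifier, the Kind enum and the finding text are hypothetical and not htmlSanityCheck's actual API; treating every href without a scheme as a local resource is likewise an assumption of this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: classify hrefs without letting URISyntaxException escape.
 * Names (HrefClassifier, Kind, findings) are illustrative, not htmlSanityCheck API.
 */
public class HrefClassifier {

    enum Kind { LOCAL, REMOTE, INVALID }

    /** Human-readable findings collected instead of thrown exceptions. */
    final List<String> findings = new ArrayList<>();

    Kind classify(String href) {
        try {
            URI uri = new URI(href);
            // Assumption for this sketch: an href without a scheme is treated as local.
            return uri.getScheme() == null ? Kind.LOCAL : Kind.REMOTE;
        } catch (URISyntaxException e) {
            // Record the problem as a finding and let the check continue with other links.
            findings.add("Invalid link target '" + href + "': " + e.getReason()
                    + " at index " + e.getIndex());
            return Kind.INVALID;
        }
    }

    public static void main(String[] args) {
        HrefClassifier classifier = new HrefClassifier();
        List<String> hrefs = List.of(
                "../concepts/workflow.html#<em>catalog</em>",
                "../concepts/workflow.html#_catalog",
                "https://www.ietf.org/rfc/rfc2396.txt");
        for (String href : hrefs) {
            System.out.println(classifier.classify(href) + "  " + href);
        }
        // Findings are reported after all links have been processed.
        classifier.findings.forEach(System.out::println);
    }
}
```

Collecting findings in a list instead of propagating the exception keeps one malformed href from aborting the whole check, which is the behaviour the issue asks for.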

Background

Asciidoctor renders the fragment above as

<p>Executes the <a href="../concepts/workflow.html#<em>catalog</em>" class="xref page">Catalog</a> workflow step.</p>

The href in this case is not a valid URI reference according to RFC 2396 (https://www.ietf.org/rfc/rfc2396.txt), on which Java's java.net.URI class is based: its fragment identifier must not contain < or > (among other characters). Index 26 in the exception message is exactly the < that follows the #.
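The failure can be reproduced with java.net.URI alone, independent of htmlSanityCheck. The class name FragmentUriRepro and its helper failingIndex are illustrative only; the second href reuses the fragment exactly as written in the AsciiDoc source, where _ is a legal fragment character.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Reproduces the parse failure for the href Asciidoctor produced in this issue. */
public class FragmentUriRepro {

    /** Returns the error index reported by java.net.URI, or -1 if the href parses. */
    static int failingIndex(String href) {
        try {
            new URI(href); // single-argument constructor: strict parsing, no quoting
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        String rendered = "../concepts/workflow.html#<em>catalog</em>";
        String asWritten = "../concepts/workflow.html#_catalog_";
        try {
            new URI(rendered);
        } catch (URISyntaxException e) {
            // Same message as in the stack trace above.
            System.out.println(e.getMessage());
        }
        System.out.println(failingIndex(rendered));  // index of '<' right after '#'
        System.out.println(failingIndex(asWritten)); // -1: the un-emphasized fragment is valid
    }
}
```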

ascheman commented 4 days ago

Asked in the Antora Zulip chat whether this is an Asciidoctor bug.