ctargett / refguide-asciidoc-poc

Proof of concept of Solr Ref Guide converted to asciidoc format & using Asciidoctor for publishing

Some javadoc links aren't getting converted properly #21

Closed ctargett closed 7 years ago

ctargett commented 7 years ago

Kevin Risden pointed out that http://home.apache.org/~ctargett/RefGuidePOC/jekyll-full/using-solrj.html has some broken javadoc links ("SolrClient" and "HttpSolrClient"). Strangely, another javadoc link on the same page ("CloudSolrClient") works just fine.

ctargett commented 7 years ago

Since one paragraph of this page has both a functional and non-functional example, I thought I'd take a look at the raw XHTML of the page in Confluence:

<code>SolrClient</code> is abstract, so to connect to a remote Solr instance, you'll actually create an instance of either <code>
    <ac:link><ri:shortcut ri:key="SolrReleaseDocs" ri:parameter="solr-solrj/org/apache/solr/client/solrj/impl/HttpSolrClient.html"/><ac:plain-text-link-body><![CDATA[HttpSolrClient]]></ac:plain-text-link-body>
    </ac:link>
  </code>, or <ac:link>
    <ri:shortcut ri:key="SolrReleaseDocs" ri:parameter="solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html"/>
    <ac:link-body> <code>CloudSolrClient</code> </ac:link-body>
  </ac:link>. 

Note that in the first example, the <code> tags wrap the link: they open before the <ac:link> element and close after it.

In the second example, the <code> tags are inside the link (in the <ac:link-body> tag).

I think the conversion is having problems with code tags and URLs. There might be a lot of other places where this is happening.

hossman commented 7 years ago

Hmmm... i can easily make the html-cleanup conversion code look for <code> tags that wrap <a> tags and when it finds that situation invert them, but as you say: there may be other problems with code tags.
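A rough sketch of that inversion, in Python rather than the actual html-cleanup code (which isn't shown in this thread). It assumes the fragment is well-formed XHTML and that the Confluence <ac:link> elements have already been turned into plain <a> tags. The function name `invert_code_wrapping_link` and the sample markup are made up for illustration:

```python
import xml.etree.ElementTree as ET


def invert_code_wrapping_link(root):
    """Rewrite <code><a>text</a></code> as <a><code>text</code></a>, in place."""
    # Snapshot the matches first, because tags are renamed while we loop.
    for code in list(root.iter("code")):
        children = list(code)
        # Only handle the simple case: <code> holds exactly one <a>,
        # with nothing but whitespace around it and no markup inside it.
        if len(children) != 1 or children[0].tag != "a":
            continue
        link = children[0]
        if (code.text or "").strip() or (link.tail or "").strip() or len(link):
            continue
        # Swap roles: the outer element becomes the link, the inner one <code>.
        outer_attrs, inner_attrs = dict(code.attrib), dict(link.attrib)
        code.tag, link.tag = "a", "code"
        code.attrib.clear()
        code.attrib.update(inner_attrs)
        link.attrib.clear()
        link.attrib.update(outer_attrs)
        # Drop the whitespace that used to pad the link inside <code>.
        code.text = None
        link.tail = None
    return root


# Hypothetical fragment modelled on the paragraph quoted earlier in this issue.
sample = ET.fromstring(
    '<p><code> <a href="HttpSolrClient.html">HttpSolrClient</a> </code>, or '
    '<a href="CloudSolrClient.html"><code>CloudSolrClient</code></a>.</p>'
)
print(ET.tostring(invert_code_wrapping_link(sample), encoding="unicode"))
```

Anything more complicated than a single link inside <code> is deliberately left untouched here, since those cases probably need a human to look at them.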

Lemme think about this some more ... it's already doing special stuff with bold/italics tags inside code ... the simplest thing might just be to make the html-cleanup code (temporarily) fail any time it finds a tag i haven't explicitly whitelisted inside of a <code> tag, so i can review them and see how widespread the problems are.
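The "fail on anything not whitelisted" idea could look roughly like this. Again a Python sketch under assumptions, not the real html-cleanup code: the whitelist contents, the function names, and the choice of `ValueError` are all hypothetical, and the input is assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: tags this sketch treats as already handled inside <code>.
ALLOWED_IN_CODE = {"b", "strong", "i", "em"}


def unexpected_tags_in_code(root):
    """Return (code_element, sorted offending tag names) for each bad <code>."""
    problems = []
    for code in root.iter("code"):
        nested = {el.tag for el in code.iter() if el is not code}
        bad = sorted(nested - ALLOWED_IN_CODE)
        if bad:
            problems.append((code, bad))
    return problems


def check_page(root):
    """Fail loudly so every unexpected nesting gets reviewed by hand."""
    problems = unexpected_tags_in_code(root)
    if problems:
        details = "; ".join(",".join(tags) for _, tags in problems)
        raise ValueError(f"code tag w/nested tags: {details}")
```

Swapping the `raise` for a log statement would turn this from a hard stop into a report, which is handy once the number of offenders is known to be small.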

hossman commented 7 years ago

NOTE: as part of this fix, i made the html cleanup code log anytime there were still nested tags inside of code tags (which pandoc evidently ignores completely during conversion)

This is the only warning currently being logged by this new code...

 [java] outPage URI: file:/home/hossman/lucid/refguide-asciidoc-poc/confluence-export/cleaned-export/spell-checking.html
 ...
 [java] NOTE: code tag w/nested tags: <code><a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" class="external-link" rel="nofollow">http://localhost:8983/solr/techproducts</a>/<a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" class="external-link" rel="nofollow">spell</a><a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" class="external-link" rel="nofollow">?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr</a><a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" class="external-link" rel="nofollow">/techproducts</a><a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" class="external-link" rel="nofollow">,solr-shard2:8983/solr</a><a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" class="external-link" rel="nofollow">/techproducts</a><a href="http://localhost:8983/solr/techproducts/select?spellcheck=true&amp;spellcheck.build=true&amp;spellcheck.q=toyata&amp;qt=/spell&amp;shards.qt=/spell&amp;shards=solr-shard1:8983/solr,solr-shard2:8983/solr" 
class="external-link" rel="nofollow"></a></code>