cabo / kramdown-rfc

An XML2RFC (RFC799x) backend for Thomas Leitner's kramdown markdown parser
MIT License
195 stars 83 forks source link

Spurious irefs added after section when abbrev links to heading #203

Open CxRes opened 1 year ago

CxRes commented 1 year ago

This is a complicated one to explain. Please see the attached test case. When the *[Accept-Events]: abbrev links to #foo (L94), an id attached to a section heading, a spurious <iref> of a previously defined definition is introduced after the section heading Foo. Replace the #foo with an inline id, say #a-e (at the end of the previous para), the <iref> vanishes.

Further, instances of Accept-Events are not wrapped unlike other definitions. Again, this is remedied if the id is changed from #foo to #a-e.

Original: draft-gupta-httpbis-per-resource-events-test-2.md

Output: draft-gupta-httpbis-per-resource-events-test-2.xml.txt


Also, it seems the conversion wraps code elements (inside backticks) are wrapped with <spanx> and not <tt> even with -3 flag. Thus, the engine does not replace any Accept-Event instances that are wrapped in back ticks (even when linking to a non-heading id).

Actual output <spanx style="verb">Accept-Events</spanx>

Desired output <tt><iref item="Accept-Events"/><xref format="none" target="a-e">Accept-Events</xref></tt>

CxRes commented 1 year ago

@cabo If you are back at work, can you please have a look at this issue. Thanks!

cabo commented 1 year ago

Also, it seems the conversion wraps code elements (inside backticks) are wrapped with <spanx> and not <tt> even with -3 flag. Thus, the engine does not replace any Accept-Event instances that are wrapped in back ticks (even when linking to a non-heading id).

Actual output <spanx style="verb">Accept-Events</spanx>

Desired output <tt><iref item="Accept-Events"/><xref format="none" target="a-e">Accept-Events</xref></tt>

Unfortunately, code elements (backticks) are protected from the *[...] support that kramdown provides. I don't think this is easy to change, as backticks have several such protection semantics, and disabling the protection just for this processing may not be easy.

Looking at the other part of the report now; I'm not sure yet I understand what you expect to see.

cabo commented 1 year ago

You can work around the protection afforded by backticks by manually adding the <tt> and </tt> tags around the phrase, as in

<tt>Accept-Event</tt>
CxRes commented 1 year ago

Looking at the other part of the report now; I'm not sure yet I understand what you expect to see.

Corresponding to line 94 to 96 in the markdown: In line 105 of the XML, the following <iref item='application client' primary='true'/> should not exist.

You can work around the protection afforded by backticks by manually adding the and tags around the phrase

That is exactly what I am doing in my actual draft. Just makes the text unmarkdownish!

cabo commented 1 year ago

Please try 1.6.42 for the weird shift-around.

I'll continue to look for ways to get code spans into autolink/iref, but I'm not optimistic.

CxRes commented 1 year ago

Unfortunately, its not fixed.

New output on the same markdown draft-gupta-httpbis-per-resource-events-test-2.2.xml.txt

My understanding is than an <iref> on line 105 should not exist. Further, Accept-Events are no longer getting replaced at all. See line 107 and 109 in the new draft and diff that with the old.

cabo commented 1 year ago

So you don't want the functionality of autolink-iref-cleanup: true at all? Set it to false then.

CxRes commented 1 year ago

What does autolink-iref-cleanup do?

cabo commented 1 year ago

What does autolink-iref-cleanup do?

It essentially suppresses auto-irefs inside the section that is also auto-linked (the #foo section in your case). We considered it confusing for an occurrence inside a section to reference that very section in an auto-link. Instead, the iref is moved up to that section, and the links are removed.

CxRes commented 1 year ago

Interesting... you really ought to document that!

It did not occur to me to put Accept-Events in another section to test.

cabo commented 1 year ago

Interesting... you really ought to document that!

Absolutely, and I'll keep this ticket open until that is done.

CxRes commented 1 year ago

I'll continue to look for ways to get code spans into autolink/iref, but I'm not optimistic.

In that case, I would need your help finding some kind of hack. I am defining a field called "Events", which I am wrapping in <tt>, which I want indexed. However, I do not want every instance of the word Events to be indexed.

What I would like to do is this, but it does not work:

*[<tt>Events</tt>] #events-header (((Events)))

What I have tried is to use <tt>{{&zwsp}}Events</tt> and then define the index replacement as:

*[{{&zwsp}}Events] #events-header (((Events)))
*[&zwsp;Events] #events-header (((Events)))
*[<zero-space-typed-out-in-unicode>Events] #events-header (((Events)))

None of these have worked! Any ideas?

(I am happy to take this discussion elsewhere if you feel it is not appropriate here)

cabo commented 1 year ago

The abbrev feature of upstream kramdown does not mix well with entity references. If you actually type out the zwsp (or any other character), it works.

Of course, with many editors these native zwsps are hard to see in the manuscript, and they get in the way in the HTML form (copy/paste gets them, searches don't always work, etc.). So I'm thinking of providing a way to designate certain Unicode characters as to be suppressed in the output, so you can use them for such a purpose and remove them from the final XML.

So then you would

*[⁜Events]: #events-header (((Events)))

This sentence is about the ⁜Events that need to be cited in the index, and other Events that don't.

and declare:

character:
  ⁜: [delete]

in the YAML header so the ⁜ is not actually shown after processing. (This would go with a general Unicode character feature, which will allow to warn about unexpected characters (e.g., typographic quotes) etc. , which needs a way to declare non-ASCII-characters as "expected".)

Feedback welcome...

(I'm also still thinking about a way to have abbrevs reach into backticks, but my simple prototype for this doesn't work yet.)

CxRes commented 1 year ago

IMHO, the cleanest way to solve this would be to preserve Kramown's abbreviation feature. So then you could do:

*[=Events=]: #events-header (((Events))) Events

This sentence is about the <tt>=Events=</tt> that need to be cited in the index, and other Events that don't.

or even

*[=Events=]: <tt> #events-header (((Events))) Events </tt>

This sentence is about the =Events= that need to be cited in the index, and other Events that don't.

I don't know if this is feasible to implement. Also, this would be a breaking change, but it could be possibly be implemented under a flag! It also circumvents the backticks problem. What do you think?

cabo commented 1 year ago

Experimental feature: In 1.6.43, you can now use

*[=Events=]: #header-fields (((Events))) `Events`

Any text added to an abbrev definition beyond what is already allowed now becomes replacement text (in markdown form) for what was matched.

Known bugs: Combining the <nobr> flag with complex XML replacements can crash kramdown-rfc or generate invalid XML output at this point.

CxRes commented 1 year ago

Just tested it out. This is AWESOME!

Thank you so much for fixing these issues. Much Appreciated!!!!!!!

CxRes commented 1 year ago

I have added documentation for the flag in the Syntax2 wiki. Once you finish, I will add the experimental part as well.