commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)
BSD 2-Clause "Simplified" License

Support for footnotes #273

Open c0d1f1ed opened 1 year ago

c0d1f1ed commented 1 year ago

Footnotes are arguably an important markup feature but don't appear to be supported by CommonMark yet. GFM supports them, and we're seeing an increasing number of uses, e.g. in Chromium and Android, but these don't render correctly in Google Code Search or gitiles/Gerrit.

(Google bug b/255316523, contact hanwen@ for integration when fixed)

robinst commented 1 year ago

Hey! Hmm, interesting. I see footnotes explained in GitHub's docs but not in their spec. But yeah, would be good to support that.

The thing that will be a bit tricky is that they overlap with link references, so e.g. this:

Text[^1]

[^1]: https://example.com

Is currently valid Markdown and renders like this (try it out on dingus): Text^1

But e.g. this is not a valid link reference definition, and it results in the second line being rendered as plain text:

Text[^1]

[^1]: https://example.com test

Looks like GitHub also allows footnotes and link reference definitions to be mixed, as long as the links come first. In other words, this works:

Text[^1] [foo]

[foo]: https://example.com/foo
[^1]: https://example.com/1 test

But this turns the [foo]: https://example.com/foo into the second line of the footnote text:

Text[^1] [foo]

[^1]: https://example.com/1 test
[foo]: https://example.com/foo

Some other interesting cases:

[^1][]

[^1]: /footnote

This is a footnote, followed by [] as literal text. In CM, it's <a href="/footnote">^1</a>.

But this is parsed as a full reference link instead:

[^1][foo]

[^1]: /footnote

[foo]: /url

So I think what's happening is that [^1]: is parsed as a footnote definition, and thus not as a link reference definition. In the second case though, foo exists as a reference and so it takes precedence. (Without [foo]: /url, it's a footnote followed by literal [foo] text.)
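A rough sketch of that precedence, covering only the three cases observed above. It is not taken from GitHub's or commonmark-java's code: the class name, the `Kind` enum and the `classify` method are hypothetical, and label normalization (case folding, whitespace) is ignored.

```java
import java.util.Set;

// Hypothetical illustration of the observed precedence; not a real parser.
final class FootnotePrecedenceSketch {

    enum Kind { FULL_REFERENCE_LINK, FOOTNOTE, FOOTNOTE_THEN_LITERAL, OTHER }

    /**
     * @param first        content of the first bracket pair, e.g. "^1"
     * @param second       content of the second bracket pair, "" for "[]", null if absent
     * @param footnoteDefs labels that have a footnote definition, without the "^"
     * @param linkDefs     labels that have a link reference definition
     */
    static Kind classify(String first, String second,
                         Set<String> footnoteDefs, Set<String> linkDefs) {
        // If the second label exists as a link reference, the full reference link wins.
        if (second != null && linkDefs.contains(second)) {
            return Kind.FULL_REFERENCE_LINK;
        }
        // Otherwise a defined footnote label makes the first bracket pair a footnote,
        // and any second bracket pair is left as literal text.
        if (first.startsWith("^") && footnoteDefs.contains(first.substring(1))) {
            return second == null ? Kind.FOOTNOTE : Kind.FOOTNOTE_THEN_LITERAL;
        }
        // Everything else (plain links, unknown labels) is outside this sketch.
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        Set<String> footnotes = Set.of("1");
        System.out.println(classify("^1", "", footnotes, Set.of()));         // [^1][]
        System.out.println(classify("^1", "foo", footnotes, Set.of("foo"))); // [^1][foo], foo defined
        System.out.println(classify("^1", "foo", footnotes, Set.of()));      // [^1][foo], foo undefined
    }
}
```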

motopascyyy commented 1 year ago

I'm actually looking to try and implement this functionality but my attempts so far have failed. Do you have pointers as to where I can start? Is there an extension you could recommend that I follow to get things started? I was using commonmark-ext-ins as a basis but I think it's getting messed up at org/commonmark/internal/InlineParserImpl.java:156 in the switch statement. Basically my test case is failing.

Here's the new files I've added for the extension and let me know if I should commit as a feature branch for easier collaboration:

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   commonmark-ext-footnotes/pom.xml
    new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/Footnote.java
    new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/FootnoteExtension.java
    new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteDelimiterProcessor.java
    new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteHtmlNodeRenderer.java
    new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteNodeRenderer.java
    new file:   commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteTextContentNodeRenderer.java
    new file:   commonmark-ext-footnotes/src/test/java/org/commonmark/ext/footnote/FootnotesTest.java

MykolaGolubyev commented 9 months ago

I am looking into adding footnote support as well. If links referenced as [1] were implemented as an extension, I would borrow it and make it recognize ^ at the start and do something different for my project. Any help on how to implement footnotes would be super helpful. 🙇‍♂️

MykolaGolubyev commented 8 months ago

@robinst @motopascyyy did you have any luck figuring it out?

motopascyyy commented 8 months ago

No, I ended up using a different library for my project as this was distracting me too much. At some point in the future I’ll get back to it and issue a PR but right now it’s on hold.

MykolaGolubyev commented 8 months ago

That would be awesome. I am too deep into commonmark to change. Do you have any files you could attach to help me write a custom extension? I am on the verge of doing a regexp crime

zampino commented 3 months ago

I'd also be pretty interested in footnotes, and I'm trying to see what is missing from the current extensibility. I guess parsing the footnote body as custom blocks starting with [^label]: wouldn't be a problem, but parsing the inline footnote reference would need #263 and the possibility to validate the footnote reference against a registry of existing labels from the block-level pass (similar to LinkReferenceDefinitions). Maybe a generic way to attach metadata to the document root visible from the inline parsing context? (might be a similar issue to #285).

Would allowing custom delimiter processors to use reserved characters like brackets be a step forward?

robinst commented 3 months ago

Alright, you made me curious and I've started looking into this :).

Apart from reverse-engineering how GFM's footnotes work, we can also look at the source code of cmark-gfm. Here's some interesting bits:

Note that it looks like something like [^1] in the text is always parsed as a footnote, and only in process_footnotes there's a check whether it's in the definition map or not. If not, it is replaced by a text node then. I'm not sure why it's not done the same way as link reference definitions, where references are resolved during inline parsing. (Maybe @kivikakk knows :).)
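To make the "parse eagerly, fix up afterwards" idea concrete, here is a minimal sketch in plain Java. It is not the cmark-gfm code: the `Node` class, the regex and both method names are made up for illustration, and real inline parsing is far more involved than a single regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: every [^label] becomes a footnote reference node first,
// and undefined ones are turned back into text in a later pass.
final class EagerFootnoteSketch {

    static final class Node {
        final boolean footnoteRef;
        final String value; // label for references, literal characters for text

        Node(boolean footnoteRef, String value) {
            this.footnoteRef = footnoteRef;
            this.value = value;
        }

        @Override
        public String toString() {
            return (footnoteRef ? "Ref(" : "Text(") + value + ")";
        }
    }

    // Simplified label syntax (assumption): no whitespace and no "]" inside the label.
    private static final Pattern REF = Pattern.compile("\\[\\^([^\\]\\s]+)\\]");

    // Inline pass: always produce a reference node, without checking any definitions.
    static List<Node> parseInlines(String text) {
        List<Node> nodes = new ArrayList<>();
        Matcher m = REF.matcher(text);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                nodes.add(new Node(false, text.substring(last, m.start())));
            }
            nodes.add(new Node(true, m.group(1)));
            last = m.end();
        }
        if (last < text.length()) {
            nodes.add(new Node(false, text.substring(last)));
        }
        return nodes;
    }

    // Post-processing pass: references without a definition become literal text again.
    static List<Node> processFootnotes(List<Node> nodes, Set<String> definedLabels) {
        List<Node> result = new ArrayList<>();
        for (Node node : nodes) {
            if (node.footnoteRef && !definedLabels.contains(node.value)) {
                result.add(new Node(false, "[^" + node.value + "]"));
            } else {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = parseInlines("a[^1] b[^2]");
        System.out.println(nodes);
        System.out.println(processFootnotes(nodes, Set.of("1")));
    }
}
```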

kivikakk commented 3 months ago

Unfortunately my memory doesn't go back that far! 🤍

Footnotes are typically defined after their references, so we can't decide if it's a valid reference or not before we've finished reading the entire document. If we don't parse them eagerly, there's a chance some other part of the parser might decide (some part of) the reference should instead be parsed as (part of) something else, but that's likely never correct.

So I think it's a sound way to do things, generally, but I could well be wrong :) I don't even recognise that code as mine any more.

kivikakk commented 3 months ago

! I only just noticed the part of your comment about link reference definitions. I couldn't tell you why, alas.

robinst commented 3 months ago

Heh, thanks for chiming in :). Yeah for reference links, the definitions are all parsed as part of block parsing, which is the first pass of parsing (before any inline parsing is done). Then during inline parsing, we have all the definitions and can look them up directly.
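As a toy illustration of that two-pass structure (not the actual commonmark-java implementation), the sketch below collects definitions line by line first and only then resolves references. The class name, the regexes and the `render` method are assumptions made for the example; it handles only single-line definitions and shortcut references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-pass sketch: definitions are gathered before any inline work.
final class TwoPassSketch {

    // "[label]: destination" on a line of its own (heavily simplified).
    private static final Pattern DEFINITION =
            Pattern.compile("^\\[([^\\]]+)\\]:\\s*(\\S+)\\s*$");
    // "[label]" anywhere in paragraph text.
    private static final Pattern REFERENCE = Pattern.compile("\\[([^\\]]+)\\]");

    static String render(String markdown) {
        Map<String, String> definitions = new HashMap<>();
        List<String> paragraphs = new ArrayList<>();

        // Pass 1 ("block parsing"): separate definitions from paragraph lines.
        for (String line : markdown.split("\n")) {
            Matcher def = DEFINITION.matcher(line);
            if (def.matches()) {
                definitions.put(def.group(1), def.group(2));
            } else if (!line.trim().isEmpty()) {
                paragraphs.add(line);
            }
        }

        // Pass 2 ("inline parsing"): all definitions are known, so look them up directly.
        StringBuilder out = new StringBuilder();
        for (String paragraph : paragraphs) {
            Matcher ref = REFERENCE.matcher(paragraph);
            StringBuilder inline = new StringBuilder();
            while (ref.find()) {
                String destination = definitions.get(ref.group(1));
                String replacement = destination == null
                        ? ref.group() // unknown label stays literal text
                        : "<a href=\"" + destination + "\">" + ref.group(1) + "</a>";
                ref.appendReplacement(inline, Matcher.quoteReplacement(replacement));
            }
            ref.appendTail(inline);
            out.append("<p>").append(inline).append("</p>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The definition comes after the reference and still resolves.
        System.out.print(render("Text[^1]\n\n[^1]: https://example.com"));
    }
}
```

Because the lookup happens after the whole document has been scanned, a definition placed after its reference resolves fine, which is the same property footnotes need.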

robinst commented 3 months ago

Branch here, with block parsing of footnote definitions (that part is straightforward): https://github.com/commonmark/commonmark-java/compare/footnotes-extension?expand=1

zampino commented 3 months ago

@robinst thanks for looking into this!

> why it's not done the same way as link reference definitions, where references are resolved during inline parsing

I'd also expect a procedure similar to link/definitions parsing

robinst commented 1 week ago

PR is ready now:

I've also found some interesting edge cases that GitHub doesn't handle well, see https://github.com/commonmark/commonmark-java/pull/332#issuecomment-2212453622 :)