Open c0d1f1ed opened 1 year ago
Hey! Hmm, interesting. I see footnotes explained in GitHub's docs but not in their spec. But yeah, would be good to support that.
The thing that will be a bit tricky is that they overlap with link references, so e.g. this:
Text[^1]
[^1]: https://example.com
Is currently valid Markdown and renders like this (try it out on dingus): Text^1, where ^1 is a link to https://example.com
But e.g. this is not a valid link reference definition, and it results in the second line being rendered as plain text:
Text[^1]
[^1]: https://example.com test
Looks like GitHub also allows footnotes and link reference definitions to be mixed, as long as the links come first. In other words, this works:
Text[^1] [foo]
[foo]: https://example.com/foo
[^1]: https://example.com/1 test
But this turns [foo]: https://example.com/foo into the second line of the footnote text:
Text[^1] [foo]
[^1]: https://example.com/1 test
[foo]: https://example.com/foo
Some other interesting cases:
[^1][]
[^1]: /footnote
This is a footnote, followed by [] as literal text. In CM, it's <a href="/footnote">^1</a>.
But this is parsed as a full reference link instead:
[^1][foo]
[^1]: /footnote
[foo]: /url
So I think what's happening is that [^1]: is parsed as a footnote definition, and thus not as a link reference definition. In the second case though, foo exists as a reference and so it takes precedence. (Without [foo]: /url, it's a footnote followed by literal [foo] text.)
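To make the guessed precedence rule concrete, here is a minimal sketch in Java. It only models the two cases above as I understand them; it is not how cmark-gfm or commonmark-java actually implement this, and the class name BracketPairResolver, the resolve method and its string output format are all made up for illustration.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model of the precedence rule guessed at above.
// It does not reflect real parser code in cmark-gfm or commonmark-java.
class BracketPairResolver {

    /**
     * Decides how "[first][second]" is interpreted.
     *
     * @param linkLabels     labels that have a link reference definition
     * @param footnoteLabels labels (without the leading ^) that have a footnote definition
     */
    static String resolve(String first, String second,
                          Set<String> linkLabels, Set<String> footnoteLabels) {
        // A full reference link wins when the second label is a defined link reference.
        if (!second.isEmpty() && linkLabels.contains(second)) {
            return "link(text=" + first + ", ref=" + second + ")";
        }
        // Otherwise "[^x]" with a matching footnote definition becomes a footnote,
        // and the second bracket pair stays literal text.
        if (first.startsWith("^") && footnoteLabels.contains(first.substring(1))) {
            return "footnote(" + first.substring(1) + ") + text([" + second + "])";
        }
        // Nothing matched in this simplified model: everything stays literal.
        return "text([" + first + "][" + second + "])";
    }

    public static void main(String[] args) {
        Set<String> footnotes = Collections.singleton("1");
        // [^1][] with only a footnote definition
        System.out.println(resolve("^1", "", Collections.<String>emptySet(), footnotes));
        // [^1][foo] with [foo]: /url defined
        System.out.println(resolve("^1", "foo", Collections.singleton("foo"), footnotes));
        // [^1][foo] without [foo]: /url
        System.out.println(resolve("^1", "foo", Collections.<String>emptySet(), footnotes));
    }
}
```

The sketch deliberately ignores collapsed and shortcut link references for non-footnote labels, since those aren't part of the cases discussed here.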
I'm actually looking to try and implement this functionality but my attempts so far have failed. Do you have pointers as to where I can start? Is there an extension you could recommend that I follow to get things started? I was using commonmark-ext-ins as a basis but I think it's getting messed up at org/commonmark/internal/InlineParserImpl.java:156 in the switch statement. Basically my test case is failing.
Here are the new files I've added for the extension. Let me know if I should commit them as a feature branch for easier collaboration:
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: commonmark-ext-footnotes/pom.xml
new file: commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/Footnote.java
new file: commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/FootnoteExtension.java
new file: commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteDelimiterProcessor.java
new file: commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteHtmlNodeRenderer.java
new file: commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteNodeRenderer.java
new file: commonmark-ext-footnotes/src/main/java/org/commonmark/ext/footnote/internal/FootnoteTextContentNodeRenderer.java
new file: commonmark-ext-footnotes/src/test/java/org/commonmark/ext/footnote/FootnotesTest.java
I am looking into adding footnote support as well. If links referenced as [1] were implemented as an extension, I would borrow it and make it recognize ^ at the start and do something different for my project. Any help on how to implement footnotes would be super helpful. 🙇♂️
@robinst @motopascyyy did you have any luck figuring it out?
No, I ended up using a different library for my project as this was distracting me too much. At some point in the future I’ll get back to it and open a PR, but right now it’s on hold.
That would be awesome. I am too deep into commonmark to change. Do you have any files you could attach to help me write a custom extension? I am on the verge of doing a regexp crime.
I'd also be pretty interested in footnotes, and I'm trying to see what is missing from the current extensibility. I guess parsing the footnote body as custom blocks starting with [^label]: wouldn't be a problem, but parsing the inline footnote reference would need #263 and the possibility to validate the footnote reference against a registry of existing labels from the block-level pass (similar to LinkReferenceDefinitions). Maybe a generic way to attach metadata to the document root visible from the inline parsing context? (Might be a similar issue to #285.)
Would allowing custom delimiter processors to use reserved characters like brackets be a step forward?
Alright, you made me curious and I've started looking into this :).
Apart from reverse-engineering how GFM's footnotes work, we can also look at the source code of cmark-gfm. Here's some interesting bits:
The node types CMARK_NODE_FOOTNOTE_DEFINITION and CMARK_NODE_FOOTNOTE_REFERENCE
The scanner pattern for footnote definitions ('[^' ([^\] \r\n\x00\t]+) ']:' [ \t]*): https://github.com/github/cmark-gfm/blob/587a12bb54d95ac37241377e6ddc93ea0e45439b/src/scanners.re#L362
process_footnotes, which is done at the end of parsing: https://github.com/github/cmark-gfm/blob/c123e68e81725d59f30d5a9bee719125538a6c77/src/blocks.c#L465
Note that it looks like something like [^1] in the text is always parsed as a footnote, and only in process_footnotes there's a check whether it's in the definition map or not. If not, it is replaced by a text node then. I'm not sure why it's not done the same way as link reference definitions, where references are resolved during inline parsing. (Maybe @kivikakk knows :).)
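As a rough illustration of that "parse eagerly, resolve at the end" approach, here is a small self-contained Java sketch. The definition pattern is a transcription of the scanner rule quoted above; everything else (the class name FootnotePostPass, the reference pattern, the string-array "nodes" and the <sup> output) is my own assumption and far simpler than what cmark-gfm really does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: eager footnote reference parsing plus a resolving post-pass.
class FootnotePostPass {

    // Java transcription of the scanner pattern: '[^' ([^\] \r\n\x00\t]+) ']:' [ \t]*
    static final Pattern DEFINITION =
            Pattern.compile("^\\[\\^([^\\] \\r\\n\\x00\\t]+)\\]:[ \\t]*");
    // Assumed reference syntax: same label characters, no trailing colon.
    static final Pattern REFERENCE =
            Pattern.compile("\\[\\^([^\\] \\r\\n\\x00\\t]+)\\]");

    static String render(String markdown) {
        Set<String> definitions = new HashSet<>();
        List<String[]> nodes = new ArrayList<>(); // each node is {kind, value}

        for (String line : markdown.split("\n")) {
            Matcher def = DEFINITION.matcher(line);
            if (def.find()) {
                definitions.add(def.group(1)); // remember the label, skip the body
                continue;
            }
            // Eagerly turn every "[^label]" into a reference node, defined or not.
            Matcher ref = REFERENCE.matcher(line);
            int last = 0;
            while (ref.find()) {
                if (ref.start() > last) {
                    nodes.add(new String[] {"text", line.substring(last, ref.start())});
                }
                nodes.add(new String[] {"ref", ref.group(1)});
                last = ref.end();
            }
            if (last < line.length()) {
                nodes.add(new String[] {"text", line.substring(last)});
            }
        }

        // Post-pass: references without a definition fall back to literal text.
        StringBuilder out = new StringBuilder();
        for (String[] node : nodes) {
            if (node[0].equals("text")) {
                out.append(node[1]);
            } else if (definitions.contains(node[1])) {
                out.append("<sup>").append(node[1]).append("</sup>");
            } else {
                out.append("[^").append(node[1]).append("]");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Text[^1] and [^2]\n[^1]: https://example.com/1 test"));
    }
}
```

Because the lookup only happens after the whole input has been read, a definition that comes after its reference still resolves, while [^2] stays literal.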
Unfortunately my memory doesn't go back that far! 🤍
Footnotes are typically defined after their references, so we can't decide if it's a valid reference or not before we've finished reading the entire document. If we don't parse them eagerly, there's a chance some other part of the parser might decide (some part of) the reference should instead be parsed as (part of) something else, but that's likely never correct.
So I think it's a sound way to do things, generally, but I could well be wrong :) I don't even recognise that code as mine any more.
I only just noticed the part of your comment about link reference definitions. I couldn't tell you why, alas.
Heh, thanks for chiming in :). Yeah for reference links, the definitions are all parsed as part of block parsing, which is the first pass of parsing (before any inline parsing is done). Then during inline parsing, we have all the definitions and can look them up directly.
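For comparison, the two-pass flow for reference links can be sketched like this. It's a toy model in plain Java, not commonmark-java's parser: the class name TwoPassReferences and the heavily simplified definition syntax (a label and a destination on one line, no titles, no multi-line definitions) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "collect definitions first, resolve references second".
class TwoPassReferences {

    // Simplified definition syntax: "[label]: destination" alone on a line.
    static final Pattern DEFINITION = Pattern.compile("^\\[([^\\]]+)\\]:\\s*(\\S+)\\s*$");
    static final Pattern REFERENCE = Pattern.compile("\\[([^\\]]+)\\]");

    static String render(String markdown) {
        String[] lines = markdown.split("\n");

        // Pass 1 (block parsing): collect every definition in the document.
        Map<String, String> definitions = new HashMap<>();
        for (String line : lines) {
            Matcher m = DEFINITION.matcher(line);
            if (m.matches()) {
                definitions.put(m.group(1), m.group(2));
            }
        }

        // Pass 2 (inline parsing): all definitions are known, so look them up directly.
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            if (DEFINITION.matcher(line).matches()) {
                continue; // definitions produce no output themselves
            }
            Matcher m = REFERENCE.matcher(line);
            int last = 0;
            while (m.find()) {
                out.append(line, last, m.start());
                String url = definitions.get(m.group(1));
                if (url != null) {
                    out.append("<a href=\"").append(url).append("\">")
                       .append(m.group(1)).append("</a>");
                } else {
                    out.append(m.group()); // unknown label stays literal
                }
                last = m.end();
            }
            out.append(line, last, line.length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("See [foo] and [bar]\n[foo]: /url"));
        System.out.println(render("Text[^1]\n[^1]: https://example.com"));
    }
}
```

The second call in main reproduces the overlap from the start of the thread: without footnote support, [^1]: https://example.com is just a link reference definition with the label ^1.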
Branch here, with block parsing of footnote definitions (that part is straightforward): https://github.com/commonmark/commonmark-java/compare/footnotes-extension?expand=1
@robinst thanks for looking into this!
"why it's not done the same way as link reference definitions, where references are resolved during inline parsing"
I'd also expect a procedure similar to link/definitions parsing.
PR is ready now:
I've also found some interesting edge cases that GitHub doesn't handle well, see https://github.com/commonmark/commonmark-java/pull/332#issuecomment-2212453622 :)
Footnotes are arguably an important markup feature but don't appear to be supported by CommonMark yet. GFM supports them, and we're seeing an increasing number of uses, e.g. in Chromium and Android, but these don't render correctly in Google Code Search or gitiles/Gerrit.
(Google bug b/255316523, contact hanwen@ for integration when fixed)