Tunous / Dawn

Here lies the fork of greatest open-source Reddit app
Apache License 2.0
451 stars 36 forks source link

Markdown parsing tracking issue #383

Open AbsurdlySuspicious opened 3 years ago

AbsurdlySuspicious commented 3 years ago

Draft roadmap:

Alternatively, we should explore use of selftext_html and what preprocessing it would need for everything to render correctly (same list as above, these are most tricky elements)

Some proposals regarding markdownhints / editor:

(Incomplete) list of md related issues (so they all can be closed once it's done):

337 #129

AbsurdlySuspicious commented 3 years ago

-- Previous first post --

This f4884b8 update broke selftext and comments parsing. I found it through bisect and I don't exactly know how it's related, but reverting this and 238c4426 solves the issue.

Most useful info I picked up from stacktraces:

Caused by: java.lang.NoSuchMethodError: No interface method getLine()Ljava/lang/CharSequence; in class Lorg/commonmark/parser/block/ParserState; or its super classes (declaration of 'org.commonmark.parser.block.ParserState' appears in /data/app/me.thanel.dank.debug-qs_urHiMeOYk-xilasg2ZQ==/base.apk!classes3.dex)
  at ru.noties.markwon.tasklist.TaskListBlockParser.line(TaskListBlockParser.java:123)
  at ru.noties.markwon.tasklist.TaskListBlockParser.access$000(TaskListBlockParser.java:24)
  at ru.noties.markwon.tasklist.TaskListBlockParser$Factory.tryStart(TaskListBlockParser.java:101)
  at org.commonmark.internal.DocumentParser.findBlockStart(DocumentParser.java:438)
  at org.commonmark.internal.DocumentParser.parseLine(DocumentParser.java:230)
  at org.commonmark.internal.DocumentParser.parse(DocumentParser.java:109)
  at org.commonmark.parser.Parser.parse(Parser.java:80)
  at me.saket.dank.utils.markdown.markwon.MarkwonBasedMarkdownRenderer.parseMarkdown(MarkwonBasedMarkdownRenderer.java:94)
  at me.saket.dank.utils.markdown.markwon.MarkwonBasedMarkdownRenderer.lambda$getOrParse$0$MarkwonBasedMarkdownRenderer(MarkwonBasedMarkdownRenderer.java:100)
  at me.saket.dank.utils.markdown.markwon.-$$Lambda$MarkwonBasedMarkdownRenderer$tomMcns9vr7ZfpwaRnJBnPcZ1u0.call(Unknown Source:4)
  at me.saket.dank.utils.markdown.markwon.MarkwonBasedMarkdownRenderer.getOrParse(MarkwonBasedMarkdownRenderer.java:105)
  at me.saket.dank.utils.markdown.markwon.MarkwonBasedMarkdownRenderer.parseSelfText(MarkwonBasedMarkdownRenderer.java:138)

I believe it happens for all submissions (at the very least there's no comments), but this one crashes for sure: https://old.reddit.com/r/perl/comments/l6d8ws/perlcom_unfriendly_domain_take_over/

msfjarvis commented 3 years ago

This is crashing because Markwon is on a really old version of commonmark and the API has actually changed. As I said in my PR, I can't find Saket's fork of Markwon so I am not sure if submitting a PR upstream and switching to that will not regress something else.

AbsurdlySuspicious commented 3 years ago

As I said in my PR, I can't find Saket's fork of Markwon

Ah, I missed it, my bad. I've uploaded my copy here: https://github.com/AbsurdlySuspicious/Markwon/tree/saket.linkspan

Well, catching up with latest markwon should be done sooner or later. For now I think we can hold update of commonmark so it won't block 0.12.1 release

AbsurdlySuspicious commented 3 years ago

By the way it should be easier to fix (or at least try to) reddit specific markdown quirks with newer markwon and get rid of buggy regex-based patches (unless we decide to switch to selftext_html provided by reddit (which would still need some preprocessing))

msfjarvis commented 3 years ago

By the way it should be easier to fix (or at least try to) reddit specific markdown quirks with newer markwon and get rid of buggy regex-based patches (unless we decide to switch to selftext_html provided by reddit (which would still need some preprocessing))

Yeah I'm expecting the update to help with that. In the mean time, please file a PR reverting the dependency bumps. The change Saket made in his fork seems to have landed upstream so I'm working on switching us over to the upstream variant directly.

AbsurdlySuspicious commented 3 years ago

I took the liberty (lol) to pin and give it such a grand title because I think laggy markdown parsing is one of the most major current issues. Most importantly, current parser completely ignores new-style spoilers. Updating to latest markwon is the first step, but there's a lot of work ahead. I think we can come up with some roadmap and put it in first post later