Closed jsamr closed 2 years ago
Triggered auto assignment to @johncschuster (AutoAssignerTriage
), see https://stackoverflow.com/c/expensify/questions/4749 for more details.
To help me understand, can you confirm:
Are you sure wasm isn't supported? @robertjchen is planning on using it for encryption, I think.
Can you benchmark this against our current custom markdown>html parser?
I thought the idea was translate directly from markdown to DOM, skipping HTML entirely. But this seems like it is just benchmarking various open source markdown>html converters. Can you restate the goal?
Thanks!
Nice, markdown-it looks like it has a very active community, which we love to see! I'm also excited about the rule customization and plugin architecture it offers.
I'll also need some help to assess which features you want to enable for Expensify.cash.
Let me know what questions you have and how I can help! I am envisioning a phased scope-of-work that might look something like this:
Also, as @quinthar has suggested above, if you could benchmark markdown-it
, etc... against our current markdown -> HTML converter (which lives in our expensify-common repo) that would likely be a great first step before we dive into the phased implementation outlined above.
I thought the idea was translate directly from markdown to DOM, skipping HTML entirely.
This is my understanding as well.
@quinthar
Are you sure wasm isn't supported? @robertjchen is planning on using it for encryption, I think.
My understanding is that WASM is not officially supported at the moment and there is an open ticket for Hermes engine, but you can use react-native-wasm
to execute WASM in a WebView
. Hence I'd rather wait for official support before considering markdown-wasm.
I thought the idea was translate directly from markdown to DOM, skipping HTML entirely. But this seems like it is just benchmarking various open source markdown>html converters. Can you restate the goal?
Yes, the goal is still to translate markdown to DOM! The benchmark is a quick glimpse at each parser capability, assuming that the "whichever-kind-of-syntax-tree-structure to HTML part" performance would be sensibly the same for every engine. I could rewrite the benchmark for accuracy, but honestly I looked for the fastest-to-implement path. To develop the statement in the OP, the plan is to use markdown-it tokenizer and plug it in a htmlparser2
Tokenizer
derived class, and then pass this class reference to htmlparser2
Parser
which in turns emits events consumed by domhandler
to build up a DOM.
Can you benchmark this against our current custom markdown>html parser?
The thing is, all the tested markdown parsers claim compliance with CommonMark, and I doubt this ad-hoc ExpensiMark parser does. So to benchmark your parser, I would need to tailor tests scenarios and make sure the parsers are working in equal conditions. I'd be happy to do that in a preliminary "0" step!
@roryabraham
That sounds like a solid plan. As react-native-render-html
development model is through consulting and sponsoring, would you go along with the idea of financing the initial exploratory steps? And by the way, if ExpensiMark turns out to be faster because it's tailored for your needs, I could also consider a POC based on it (however, its reliance on regex to identify tokens makes me doubt it will).
Triggered auto assignment to @tgolen (Engineering
), see https://stackoverflow.com/c/expensify/questions/4319 for more details.
My understanding is that WASM is not officially supported at the moment and there is an open ticket for Hermes engine, but you can use react-native-wasm to execute WASM in a WebView
To clarify, this is somewhat true- we won't be able to use pure WASM
in React without the WebView workaround, but we can still compile markdown-wasm
to the highly optimized asm.js
JS subset (now confusingly under the umbrella of WebAssembly™️ ).
Theoretically this might net us performance results between markdown-it
and pure WASM markdown-wasm
. Just another option to consider 👍
@robertjchen Thanks for pointing that out! I must confess that I'm not proficient in WASM nor asm.js. After reading markdown-wasm
in surface, I realize that it's basically a wrapper around md4c, compiled in wasm. For our usecase, we would need to get the parse tree generated by md4c, and access the tree in memory with some JS bindings. I don't know if WASM offers this capability! Nonetheless, JSI (JavaScript Interface) offers this feature with C/C++ so that would probably be a better fit! However I am not a C/C++ developer (although I'd love to get my hands on), but I find the prospect very interesting. The Graal would be a transient render engine written in C++ with JSI bindings, but that is clearly a huge endeavor and obviously out of scope :rofl:
@roryabraham @quinthar
Can you benchmark this against our current custom markdown>html parser?
I changed my mind and thought that if it does perform worse, that will be flagrant whether or not the same set of syntax rules are covered. I managed to incorporate ExpensiMark in the benchmark (although in a hacky way because it uses ES modules , I just copied the required sources).
As I had anticipated, ExpensiMark turns out to be pretty slow in comparison to other parsers (I'm still dumbfounded by the bad performance of remark). This is probably because of its regex-based parsing, which requires vastly more input traversals than a automaton "finite state machine" based parser.
I would expect that over time we're going to want our own parser, to do nonstandard things -- we want to own our own grammar. But this is a huge testament to the power of WASM and we should absolutely be aiming for that. Can we just port ExpensiMark to C++ and compile to WASM? No need to use regexp.
@quinthar If you can get your hands on C/C++, I'm not sure I see any advantage of using WASM over JSI. AFAIK WASM is still interpreted, while C/C++ will be run in an architecture-specific binary. Plus, JSI is the future standard for performant low-level react-native modules... An important thing to consider if you want to follow that scheme (porting ExpensiMark to C++ based on md4c), is converting back HTML to markdown. But there are good and fast C++ HTML parsers and that part should be much easier.
No hurry, but please let me know if you're still interested in markdown being supported directly in RNRH, most likely with markdown-it
. mardown-it
allows you to create new syntax rules via plugins so you shouldn't feel limited in that regard.
Hopefully at some point the whole transient render engine will be ported to C++ where we could use md4c for Markdown. In any case we would make sure you can reasonably easily plug-in your own parser implementation to generate a TRT and render that in RNRH.
@jsamr, this Monthly task hasn't been acted upon in 6 weeks; closing.
If you disagree, feel encouraged to reopen it -- but pick your least important issue to close instead.
This is a follow-up on a request from @roryabraham. I have investigated the options to integrate Markdown parsing a little further, and here are my findings:
htmlparser2
Tokenizer class to emit a DOM.Chosing a Tokenizer
I've forked this benchmark to add micromark which I found very well structured and solid (via remark-html), and below are my findings (Intel i7-8809G, 32GB of RAM, Nodejs 14.16.0).
Average Ops per second
Minmax parse time
Average Throughput
Conclusion
Markdown-it is the clear winner, since there is no official web assembly support in React Native. Other plus:
Implementation Plan
Get inspiration from
MarkdownIt.Renderer
: consume a token tree fromMarkdownIt.parse
and invoke correspondinghtmlparser2
callbacks while walking the tree.I'll also need some help to assess which features you want to enable for Expensify.cash.
Package Design
I need to think of a new package design since I don't want
@native-html/core
to depend directly on markdown-it.Testing Strategy
The parser will be tested against the official commonmark-spec repository.