Open CosmicHorrorDev opened 11 months ago
So what things would we need to do? (Because this sounds like a major rewrite) And what would the HIR contain?
So for starters the AST would likely remain a VecDeque<crate::Element>
. Each element in the HIR would contain all of the crate::interpreter::html::Attr
s that we recognize along with the inner runs of text and other elements. For something like
<p>In a paragraph <a href="https://example.org">https://example.org</a></p>
would get represented as something like
use crate::interpreter::html::{attr::Align, Attr, TagName};
enum HirOrText {
Hir(Hir),
Text(String),
}
vec!Hir {
tag: TagName::Paragraph,
attrs: vec![],
contents: vec![
HirOrText::Text("In a paragraph ".into()),
HirOrText::Hir(Hir {
tag: TagName::Anchor,
attrs: vec![Attr::Href("https://example.org".into())],
contents: vec![HirOrText::Text("https'//example.org".into())]
}),
...
],
}
That would be the main change as the existing code for the AST can be re-written to run over the HIR without having to change much of the logic there (although it opens the door to more freely change things in the future)
You may be interested in the architecture of https://github.com/DioxusLabs/blitz which parses HTML into a dom-like structure before doing style then layout then rendering passes. It's a more ambitious project as we aim to actually render arbitrary HTML (including CSS but excluding JavaScript), but it might make sense for inlyne too.
(once Blitz is a bit more developed you could also straight-up build inlyne of it / help contribute if that's an approach you were interested in)
@kokoISnoTarget is currently working on things with #318, so I'll leave the immediate direction of where things go up to them, but yes I am very interested in sharing more of the nitty gritty HTML bits
(once stylo-dioxus is a bit more developed you could also straight-up build inlyne of it / help contribute if that's an approach you were interested in)
It isn't ready, so nothing I would decide on now. But sounds promising.
We've just landed inline / inline-block support in Blitz (https://github.com/DioxusLabs/blitz). It's still quite early / immature, and we don't have feature parity with inlyne (missing things like text selection, table layout, and hot reloading) but it's now in a state that inlyne devs could look into using / contributing to / building on it if you were interested. This is a screenshot of blitz rendering it's own README (the markdown
example in the repository):
We have a full HTML style/layout engine so this example is using comrak
to compile the README.md
file to HTML and applying a stylesheet from https://github.com/sindresorhus/github-markdown-css (theming is just CSS).
Thanks for the update! Things are looking very nice
I think my only big concern with migrating to using something like blitz is that I don't know if there will be a nice layer of abstraction to describe the parts that inlyne
specifically needs in terms of both 1) the low-level of control we need, and 2) the loss of control over a large chunk of our dependencies
Both of those are pretty crucial to trying to maintain being a lightweight markdown viewer (although that's still more of a future goal than the current state). Maybe it's better at this point for us to steal different things from each other as features land. It's also totally possible that blitz's development will far outpace ours and maybe in the future it'd be silly for us to not use it :shrug:
Having been working with the interpreter more it's clear that the current design is pretty painful to work with in almost every way. It currently works by doing a pass over the
html5ever
tokenstream that stores a pretty minimal amount of state while computing the elements for output on the fly. In theory this is very efficient because we can stream our output and we retain a pretty minimal amount of past state, but it also translates to a soup of conditionals that are incredibly difficult to refactor without regressing other behavior.This also expresses itself where most of the bugs for things lie in not properly keeping track of enough history. Take for instance this markdown snippet
This should render as follows:
First list item
Some nested text
Second list item
but the actual HTML for this is pretty complex
There's quite a lot involved in rendering this right: you've got to keep track of which list item you're in, the first element in the list item (
<li>
) gets the line with the list prefix while the rest get indented further on separate lines, etc. Unfortunately this shows as we do a pretty sloppy job of rendering most things that are nested even semi-complexlyI think the better way forward would be to first convert the raw HTML to an HIR (higher intermediate representation or HTML intermediate representation :grin:) which is just a nice full structured format describing the things we care about in the HTML doc (basically a more complete version of
Element
). Then from there we can more easily do the analysis we need to correctly render things above into our output AST which represents how thing are laid out ininlyne
's eyes