ZelphirKaltstahl closed 5 years ago
Yes, and sadly its state is “doesn’t even have a parser”. I’m very sorry that I grabbed this and never followed through, but life got in the way.
If there’s anyone who would like to see this exist, I’d be more than happy to talk to them!
I'd like to be able to parse and translate RST using Rust, ofc. However, I still don't completely understand how to use parser combinators (or Nom or similar in Rust) and I don't know whether left recursion or some other grammar capability is needed for RST. I think some thinking about this is necessary before beginning to implement RST and then suddenly noticing "Oh, I cannot do it with this parser library." Do you have more information on such requirements (for example grammar capabilities) for implementing RST?
I’m not sure about the complexity class but I’m pretty sure it’s exactly the same as Markdown’s (which is more popular and therefore more likely to have had someone think about parsing complexity). Two complicating factors I know both have: an indentation stack and document internal linking.
There’s an ANTLR grammar for rST here: https://github.com/antlr/grammars-v4/tree/master/restructuredtext
Document internal linking is exactly what I was worried about in comparison to Markdown. I thought maybe one needs to build a preprocessor and then go over the document again or something like that. I once wrote some scripts parsing RST to translate links into LaTeX hyperrefs, when Pandoc did not support all of the link types yet. I think that was about linking to headings. However, I did that with regexes, and it was already kind of messy and required my script to run before translating with Pandoc.
Indentation stack seems to make sense.
I've never used ANTLR. It seems to be another parser generator. So the fact that there is already a grammar there means one could translate that to another parser library available in Rust, if I am understanding this correctly (?).
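To make the "indentation stack" idea concrete, here is a minimal sketch in Rust. It is not taken from docutils or any existing crate: `IndentStack`, `IndentEvent` and `feed` are hypothetical names, and the sketch assumes the caller has already measured the leading whitespace of each non-blank line.

```rust
/// Events emitted when the indentation changes between two lines.
#[derive(Debug, PartialEq)]
enum IndentEvent {
    Indent,
    Dedent,
}

/// Hypothetical helper (an assumption, not part of any library):
/// remembers every indentation level that is currently open.
struct IndentStack {
    levels: Vec<usize>,
}

impl IndentStack {
    fn new() -> Self {
        // Column 0 is always open, so the stack is never empty.
        IndentStack { levels: vec![0] }
    }

    /// Feed the indentation width of the next non-blank line and get
    /// back the block boundaries that this line implies.
    fn feed(&mut self, width: usize) -> Vec<IndentEvent> {
        let mut events = Vec::new();
        // Close every open level that is deeper than the new line.
        while width < *self.levels.last().unwrap() {
            self.levels.pop();
            events.push(IndentEvent::Dedent);
        }
        // Open a new level if the line is deeper than the current one.
        if width > *self.levels.last().unwrap() {
            self.levels.push(width);
            events.push(IndentEvent::Indent);
        }
        events
    }
}
```

Feeding the widths 0, 4, 8, 0 yields no event, one `Indent`, one `Indent`, and then two `Dedent` events. This kind of state is what a purely declarative grammar has trouble expressing, which is why it tends to live in a lexer or a hand-written layer.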
Document internal linking
It doesn’t make a difference but I meant separating links and their targets, which exists in Markdown:
[foo][]
[foo]: https://abc.com
and rST:
foo_
.. _foo: https://abc.com
Those already need the same kind of processing that internal links need as well.
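The processing in question is essentially a two-pass job: first collect all targets, then resolve the references against them. A rough sketch in Rust, under loud assumptions: the `Inline` type and both functions are made up for illustration, only the simple `.. _name: url` target form is handled, and names are compared case-insensitively.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified inline node (an assumption).
#[derive(Debug, PartialEq)]
enum Inline {
    Text(String),
    /// A named reference such as `foo_` that is not resolved yet.
    Reference(String),
    /// A reference whose target URL has been looked up.
    Link { name: String, url: String },
}

/// Pass 1: collect `.. _name: url` targets from the raw source lines.
fn collect_targets(source: &str) -> HashMap<String, String> {
    let mut targets = HashMap::new();
    for line in source.lines() {
        if let Some(rest) = line.trim_start().strip_prefix(".. _") {
            if let Some((name, url)) = rest.split_once(": ") {
                targets.insert(name.to_lowercase(), url.trim().to_string());
            }
        }
    }
    targets
}

/// Pass 2: turn every reference with a known target into a link.
/// References without a target are left untouched.
fn resolve(nodes: Vec<Inline>, targets: &HashMap<String, String>) -> Vec<Inline> {
    nodes
        .into_iter()
        .map(|node| match node {
            Inline::Reference(name) => match targets.get(&name.to_lowercase()) {
                Some(url) => Inline::Link { name, url: url.clone() },
                None => Inline::Reference(name),
            },
            other => other,
        })
        .collect()
}
```

Because the target may appear after the reference, the lookup table has to be complete before pass 2 starts. The same table could also hold internal targets such as section titles.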
So the fact that there is already a grammar there means one could translate that to another parser library available in Rust
Sadly there’s no ANTLR target for Rust yet (dyweb/mos#20), which AFAIK means that we can’t directly convert that grammar to a Rust program with a single command yet. A pity, but it’s still a start!
I'd be pretty happy to have a reStructuredText parser in Rust. I personally use reStructuredText a lot, and use Sphinx to generate the HTML. Sphinx uses Docutils underneath, and it has a lot of extensions and themes, but I also feel it's kind of slow and uses a lot of resources. If we can have a reStructuredText parser in Rust and start building an ecosystem around it, that would be great, like Rust has mdBook.
I also found this a couple of months ago, but I'm not sure how complete it is: peg-rst - reStructuredText in C
After a bit more searching around to check what kind of language class RST is and what would be required to implement it, I came to the conclusion that no one seems to know exactly what the language class of RST is, and that it could be quite a mathematical exercise to prove that it cannot be expressed as a grammar for LL, LR, or LALR parsers (I don't claim expert knowledge about parsers, this is just my impression from recent searching). Even on the SourceForge forum of docutils someone doubted that one could fit RST into an appropriate grammar (no mathematical proof though). Add to that the facts that the ANTLR examples seem more like OOP code with internal state and mutation in the objects, and that the canonical docutils implementation seems to be handwritten rather than built with parser combinators or parser generators. However, I also read that clean HTML can be expressed using a CFG (context-free grammar) (https://stackoverflow.com/questions/5175840/is-html-a-context-free-language), so it is a CFL (context-free language). While I have little idea how one would go about writing such a grammar, I keep thinking: "But HTML has document internal linking too!" I am unsure about the relationship between LL, LR, LALR and CFG though.
(1) Initially I naively thought that one only needs some parser combinator library and would surely be able to implement it. This might not be the case. (2) A handwritten version could be a translation of the canonical docutils implementation. Or something completely different. (3) I could also imagine that it would be possible to use the docutils library from inside Rust with some FFI magic, but then one would always depend on that.
Did anyone find a clear statement on the language class of RST? It would be great to know that before starting to implement anything that will not work properly.
Hi again!
I’m pretty sure I have nailed it down. In pest-parser/pest#329 I discussed what is needed to parse rST, and I think a (not optimal, but quite OK) solution can be:
[ ]{,7}\t expands to [ ]{8}
)include
directives and references while creating an AST (or “document tree” as the rST spec calls it)
Hi, so progress is finally on the way: A subset of rST can be converted to HTML and the architecture is good enough to iterate on features now!
This is great news! Thanks!
Lately I have not been much in the Rust ecosystem, but having a working rST parser in Rust would be very useful for things like implementing blogs or wiki systems or writing documentation or scientific articles, which often require more than what Markdown can give (for example document internal linking to things other than headings).
So far the only really working implementation is the reference implementation, and that is a custom parser and not based on a declaratively defined grammar, as far as I know.
which require often more than what markdown can give
That’s exactly why I started this project: rST is simply better for technical documentation. Markdown is better for communication.
I think it would be great to have a note in the readme file, which tells people about how complete this implementation of RST is. Even if it is simply complete, as in "translates 100% of the constructs of RST correctly", that would be very worth mentioning : ) Afaics this is the only crate for RST available, so that makes it "the state of RST in Rust" basically.