SimonJF / skye-gtopdb

Implementation of the GtoPdb in Links
Apache License 2.0

Monadic parser is slow #1

Open SimonJF opened 5 years ago

SimonJF commented 5 years ago

It takes ~0.5-1s to parse a paragraph of text, which is too slow for web use. The cause may be poor grammar design, or the combinators may still rely too much on laziness (although I've already worked on the latter).
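The parser discussed here is written in Links, so the following is only an illustration in OCaml (the language Links itself is implemented in), not code from this repository. It sketches a strict, position-indexed combinator design: the input is a string read by index rather than a lazily consumed list of characters, choice (`<|>`) backtracks by retrying at the same index, and runs of characters are scanned with a plain loop. All names (`lit`, `many1_chars`, `parse_items`, the `item` type) are hypothetical, and whether this style would recover the lost time in Links is an assumption, not a measurement.

```ocaml
(* Sketch only: a strict, position-indexed parser combinator library.
   All names here are hypothetical illustrations, not the skye-gtopdb API. *)

(* A parser reads [s] starting at index [i] and returns a result plus the
   index just after what it consumed, or [None] on failure. *)
type 'a parser = string -> int -> ('a * int) option

let return (x : 'a) : 'a parser = fun _ i -> Some (x, i)

(* Monadic bind: run [p], then feed its result to [f] at the new index. *)
let ( >>= ) (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
 fun s i ->
  match p s i with
  | None -> None
  | Some (x, j) -> f x s j

(* Ordered choice: if [p] fails, retry [q] at the same index. Backtracking
   is cheap because the input string is never consumed destructively. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser =
 fun s i ->
  match p s i with
  | Some _ as r -> r
  | None -> q s i

(* Match one specific character. *)
let lit (c : char) : char parser =
 fun s i -> if i < String.length s && s.[i] = c then Some (c, i + 1) else None

(* Consume a maximal, non-empty run of characters satisfying [pred] with a
   plain loop, then slice it out of the input once. *)
let many1_chars (pred : char -> bool) : string parser =
 fun s i ->
  let n = String.length s in
  let j = ref i in
  while !j < n && pred s.[!j] do incr j done;
  if !j = i then None else Some (String.sub s i (!j - i), !j)

let is_name_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')

(* Items of marked-up text: opening tags, closing tags, and runs of text. *)
type item = Open of string | Close of string | Text of string

let open_tag : item parser =
  lit '<' >>= fun _ ->
  many1_chars is_name_char >>= fun name ->
  lit '>' >>= fun _ -> return (Open name)

let close_tag : item parser =
  lit '<' >>= fun _ ->
  lit '/' >>= fun _ ->
  many1_chars is_name_char >>= fun name ->
  lit '>' >>= fun _ -> return (Close name)

let text : item parser =
  many1_chars (fun c -> c <> '<') >>= fun t -> return (Text t)

let item_p : item parser = close_tag <|> open_tag <|> text

(* Parse the whole input into a list of items with a tail-recursive loop. *)
let parse_items (s : string) : item list option =
  let rec go i acc =
    if i >= String.length s then Some (List.rev acc)
    else
      match item_p s i with
      | None -> None
      | Some (x, j) -> go j (x :: acc)
  in
  go 0 []
```

For example, `parse_items "a <b>bold</b> c"` yields `Some [Text "a "; Open "b"; Text "bold"; Close "b"; Text " c"]`, while an unterminated tag such as `"<b"` yields `None`.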

jamescheney commented 5 years ago

I doubt this is easy to address without optimizing Links's evaluator/pattern matching substantially.

Back in the day, Sam implemented an XML parser (using ocamllex/ocamlyacc) for similar reasons; it backs the library function parseXml.

Nowadays there are several XML and HTML parsers we could use (in libraries we already depend on) instead of maintaining our own, or writing slow ones in Links, as pointed out in https://github.com/links-lang/links/issues/518.

So in the short term, would hooking up markup's parse_html function as a Links library function parseHtml help here? Or are the HTML text snippets in GtoPdb not well-formed even as HTML fragments?

SimonJF commented 5 years ago

That's probably a good idea. It's not really a blocker (page loads are noticeably a second or two slower than they should be), but I think hooking into parseXml might be worthwhile. Thankfully, I think the GtoPdb markup is valid (as long as you have self-closing tags).

jamescheney commented 5 years ago

Ah, if you mean things like <foo/> or <bar xyz='asdf'/>, then this should be fine.

What may not be fine is parsing blocks of text with tags interspersed (i.e. XML fragments that are not themselves XML documents), or tags whose attribute values are not fully quoted (allowed in HTML but not in XML). I thought I saw examples of both, so (re)using an HTML parser that supports HTML fragments (like markup) might make more sense. That shouldn't be hard, aside from converting between markup's data structures and Links ones.
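The conversion step mentioned here amounts to turning a flat event stream into a tree: markup.ml produces a stream of signals (element starts, element ends, text, and a few others) rather than a tree. The OCaml sketch below folds such events into a forest. The `signal` and `xml` types are simplified, hypothetical stand-ins: markup's real signals carry namespaced names and extra variants (comments, doctypes, and so on), and Links's own XML value type is not reproduced here.

```ocaml
(* Sketch only: turn a flat stream of parser events into a tree.
   [signal] loosely mirrors the shape of markup.ml's signals, simplified;
   [xml] is a hypothetical stand-in for Links's XML values. Neither is the
   real type from either library. *)
type signal =
  | Start_element of string * (string * string) list  (* name, attributes *)
  | End_element
  | Text of string

type xml =
  | Node of string * (string * string) list * xml list
  | Leaf of string

(* Build a forest (a list of top-level nodes) rather than a single root, so
   fragments such as "a <b>bold</b> c" are accepted as-is. The stack holds
   the currently open elements with their children so far, in reverse. *)
let to_forest (signals : signal list) : xml list =
  let rec go stack top sigs =
    match (sigs, stack) with
    | [], [] -> List.rev top
    | [], _ :: _ -> failwith "to_forest: unclosed element"
    | Text t :: rest, [] -> go [] (Leaf t :: top) rest
    | Text t :: rest, (n, a, kids) :: up ->
        go ((n, a, Leaf t :: kids) :: up) top rest
    | Start_element (n, a) :: rest, _ -> go ((n, a, []) :: stack) top rest
    | End_element :: _, [] -> failwith "to_forest: unmatched end tag"
    | End_element :: rest, (n, a, kids) :: up -> (
        (* Close the innermost element and attach it to its parent, or to
           the top-level forest if it has no parent. *)
        let node = Node (n, a, List.rev kids) in
        match up with
        | [] -> go [] (node :: top) rest
        | (pn, pa, pkids) :: up' -> go ((pn, pa, node :: pkids) :: up') top rest)
  in
  go [] [] signals
```

Returning a forest rather than a single root is what lets blocks of text with interspersed tags through; unquoted attribute values would be dealt with by the HTML parser before this step, not here.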