aantron / markup.ml

Error-recovering streaming HTML5 and XML parsers
http://aantron.github.io/markup.ml
MIT License
146 stars 16 forks source link

Make it possible to pass context and encoding arguments from external sources #76

Open dmbaturin opened 1 year ago

dmbaturin commented 1 year ago

The type of Markup.parse_html is ?report:(location -> Error.t -> unit) -> ?encoding:Encoding.t -> ?context:[< `Document | `Fragment of string ] -> (char, 's) stream -> 's parser. That is quite handy in that you can call it without context and encoding arguments.

If you are writing a utility of a library that calls Markup.parse_html, that can become a source of annoynance. With encoding it's easy: the default is Markup.Encoding.utf_8, so you can easily do something like:

Markup.parse_html ~encoding:(Option.value ~default:Markup.Encoding.utf_8 settings.encoding

The option wrap-unwrap cycle is rather inelegant but at least there's no code duplication needed.

However, with the context argument, there is no way to achieve the same effect in one function call! The value that parse_html takes can be either `Document or `Fragment "tag" — two-state. There is no polymorphic variant constructor that would mean "guess the context", like Html_parser.parse None ... does.

So, the only way to pass the context from an external source is to use two different calls, one for "context not given, guess", the other for explicit context.

match settings.context with
| None -> ... |> Markup.parse_html ~encoding:settings.encoding |> ...
| Some c -> ... |> Markup.parse_html ~context:c ~encoding:settings.encoding |> ...

I'm not sure what's the best way out. I see three options:

dmbaturin commented 1 year ago

Ok, I completely missed the fact that you can set an implicit optional argument explicitly with ?context:settings.context.

But I still believe there should be an explicit polymorphic variant for the default context.