KWARC / rust-libxml

Rust wrapper for libxml2
https://crates.io/crates/libxml
MIT License

Allow user-specified ParserOptions #85

Closed: jangernert closed this issue 3 years ago

jangernert commented 3 years ago

My use-case is parsing an HTML fragment, manipulating it and then serializing it again as a fragment. Currently there is no way around serializing it as a full HTML document and having to remove the "chrome" again afterwards:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
</body></html>

As far as I understand, the libxml2 parser option HTML_PARSE_NOIMPLIED = 8192 ("Do not add implied html/body... elements") would solve this for me. But the crate doesn't offer a way to pass options to the parser. There is, however, a comment in the code suggesting that the only reason this hasn't happened yet is time/manpower: https://github.com/KWARC/rust-libxml/blob/master/src/parser.rs#L201

If it is okay with you @dginev I would like to implement this in a similar fashion as #59.
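To make the request concrete, here is a minimal sketch of how user-specified options could map onto libxml2's HTML parser flags. It is only an illustration, not the crate's actual API: the field names, the to_flags method, the fragment_flags helper, and the chosen defaults are all assumptions. The numeric values follow libxml2's htmlParserOption enum.

```rust
// Hypothetical sketch, not the crate's actual API.
// Flag values mirror libxml2's htmlParserOption enum.
const HTML_PARSE_RECOVER: i32 = 1; // Relaxed parsing
const HTML_PARSE_NOERROR: i32 = 32; // Suppress error reports
const HTML_PARSE_NOWARNING: i32 = 64; // Suppress warning reports
const HTML_PARSE_NOIMPLIED: i32 = 8192; // Do not add implied html/body... elements

/// User-facing parser options (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ParserOptions {
    pub recover: bool,
    pub no_error: bool,
    pub no_warning: bool,
    pub no_implied: bool,
}

impl Default for ParserOptions {
    // Assumed defaults for this sketch: lenient parsing, quiet reporting,
    // implied elements still added.
    fn default() -> Self {
        ParserOptions {
            recover: true,
            no_error: true,
            no_warning: true,
            no_implied: false,
        }
    }
}

impl ParserOptions {
    /// Combine the selected options into the bitmask libxml2 expects.
    pub fn to_flags(self) -> i32 {
        let mut flags = 0;
        if self.recover {
            flags |= HTML_PARSE_RECOVER;
        }
        if self.no_error {
            flags |= HTML_PARSE_NOERROR;
        }
        if self.no_warning {
            flags |= HTML_PARSE_NOWARNING;
        }
        if self.no_implied {
            flags |= HTML_PARSE_NOIMPLIED;
        }
        flags
    }
}

/// Options for the fragment use case: keep the assumed defaults,
/// but do not add implied html/body elements.
pub fn fragment_flags() -> i32 {
    let opts = ParserOptions {
        no_implied: true,
        ..Default::default()
    };
    opts.to_flags()
}
```

A struct of booleans keeps the raw integer constants out of user code, and the struct update syntax (..Default::default()) lets a caller override a single option such as no_implied without restating the rest.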

edit: there is also the workaround of creating a new empty document and moving all the relevant nodes from the old to the new doc. But it's not super obvious and a bit cumbersome.

dginev commented 3 years ago

Definitely OK with me, and a welcome upgrade - it was indeed waiting for someone who needed the feature and had the time to vet that it's done right. If you add a small test that ensures your use case is correctly executed, that would be perfect!

So feel free to hack at it, happy to add it in!