Open JKatzwinkel opened 2 years ago
after taking a few superficial glances, just some open-ended threads:
- the `_collapse_whitespace` method implements the recommendations linked in `Document.collapse_whitespace`, iirc
- `remove_blank_text` is rather harsh in its doings. i guess i would have used that option if it was compatible with the aforementioned recommendations
- `la<hi rendition="u">la</hi>la`
- `collapse_whitespace` option, identical to the parsing of an unpretty serialization?

i've been thinking that whitespace would make a good major topic for the 0.5 version. and, instead of relying on libxml specifics, we can implement serialization natively. that has to be engineered for the Rust implementation anyway, and we can look at the API design (pretty-formatted string representations? always move all namespace declarations to the root node when serializing a document?).
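
To make the "rather harsh" point about dropping blank text concrete, here is a minimal sketch that uses only Python's standard library. Both helpers (`drop_blank_text`, `collapse_runs`) are hypothetical names made up for this illustration. They are naive stand-ins and do not claim to reproduce what lxml's `remove_blank_text` or delb's `_collapse_whitespace` actually do.

```python
import re
import xml.etree.ElementTree as ET

# Mixed content: the single space between the two <hi> elements is significant.
SAMPLE = '<p><hi rendition="u">la</hi> <hi rendition="u">la</hi>\n   la</p>'


def drop_blank_text(root):
    """Naive stand-in (assumption): delete every whitespace-only text or tail."""
    for node in root.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return root


def collapse_runs(root):
    """Naive stand-in (assumption): shrink each whitespace run to one space."""
    for node in root.iter():
        if node.text:
            node.text = re.sub(r"\s+", " ", node.text)
        if node.tail:
            node.tail = re.sub(r"\s+", " ", node.tail)
    return root


dropped = "".join(drop_blank_text(ET.fromstring(SAMPLE)).itertext())
collapsed = "".join(collapse_runs(ET.fromstring(SAMPLE)).itertext())

print(repr(dropped))    # the first two words are fused together
print(repr(collapsed))  # word boundaries survive
```

Dropping whitespace-only nodes changes the text content of mixed content, while collapsing only normalizes it, which is presumably why the two approaches are not interchangeable.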
the last cell in `docs/getting_started.ipynb` also gives a great example where pretty is broken, so that demo case could be taken as one test.
i started looking into this, which led me to realise that the serialization doesn't consider the `xml:space` attribute yet.
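
As a rough illustration of what considering `xml:space` could mean for a pretty-printer, the following sketch indents a tree with the standard library's `ElementTree` but leaves any subtree marked `xml:space="preserve"` untouched. The function name `indent_respecting_space` is hypothetical, and the sketch makes simplifying assumptions: it does not handle a nested `xml:space="default"` that would re-enable formatting, and it skips mixed content entirely. It is not how delb or lxml behave.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the fixed XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def indent_respecting_space(elem, level=0, space="  "):
    """Indent element-only content; skip xml:space="preserve" subtrees."""
    if elem.get(XML_SPACE) == "preserve" or len(elem) == 0:
        return
    # Only reformat when every text slot around the children is blank,
    # i.e. the element has no mixed content (a conservative assumption).
    slots = [elem.text] + [child.tail for child in elem]
    if any(s and s.strip() for s in slots):
        return
    inner = "\n" + space * (level + 1)
    elem.text = inner
    for child in elem:
        indent_respecting_space(child, level + 1, space)
        child.tail = inner
    # Dedent before the closing tag of the current element.
    elem[-1].tail = "\n" + space * level


SAMPLE = (
    "<doc><head><title>t</title></head>"
    '<pre xml:space="preserve"><b>a</b> <b>b</b></pre></doc>'
)
root = ET.fromstring(SAMPLE)
indent_respecting_space(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The `<head>` part gets indented, while the content of `<pre>` is serialized exactly as it was parsed.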
quick update: yesterday i was honest enough with meself to realize that i'm actually traumatized by the task of producing properly placed whitespace. but i still think the target is in sight. let's hope the XML Foundation covers rehab.
right now i'm in a manic phase (yes, digging into the problem got me to new experiences) and i imagine that the implementation will produce the most beautifullest XML that the world has ever seen and only the radiated überhumen on Mars will be able to deliver something better. anyway, after a few hints by @zed-g i have the idea to compile an appendix for the documentation that compares "pretty" XML serializations produced by different serializers for a small variety of samples.
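
A comparison like that could start from something as small as the sketch below. It only covers two serializers from Python's standard library (`xml.dom.minidom` and `xml.etree.ElementTree`), and the sample set and the names `SAMPLES`, `SERIALIZERS` and `report` are assumptions made for illustration, not a proposal for the actual appendix.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SAMPLES = {
    "mixed": '<p>la<hi rendition="u">la</hi>la</p>',
    "nested": "<a><b><c/></b></a>",
}


def pretty_minidom(xml):
    """Pretty-print with minidom, without the XML declaration."""
    return minidom.parseString(xml).documentElement.toprettyxml(indent="  ")


def pretty_etree(xml):
    """Pretty-print with ElementTree (ET.indent needs Python 3.9+)."""
    root = ET.fromstring(xml)
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


SERIALIZERS = {"minidom": pretty_minidom, "ElementTree": pretty_etree}

# One entry per (sample, serializer) pair, ready to be rendered as a table.
report = {
    (sample, name): func(xml)
    for sample, xml in SAMPLES.items()
    for name, func in SERIALIZERS.items()
}

for (sample, name), output in report.items():
    print(f"--- {sample} / {name}")
    print(output)
```

Even this tiny set shows a difference: `minidom` puts every text node of the mixed-content sample on its own line, which introduces whitespace between the adjacent "la"s, whereas `ET.indent` leaves that sample unchanged.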
`lxml.etree.tostring` with `pretty_print=True` has this caveat:

Now instantiating a `delb.Document` with the `collapse_whitespace` flag somewhat feels like it should do away with whitespace in a way that makes the parsed XML suitable for custom formatting, e.g. calling:

...or something like this. However, in order to be able to pretty-print delb content, it is still necessary to use a custom parser on instantiation, e.g.

...in which case the `collapse_whitespace` flag of the `Document` constructor isn't even relevant.

I feel like wanting to pretty-print delb objects is a somewhat justified use case (I needed it today in order to simplify a test), and I think that this behaviour is somewhat obscured right now and should at least be documented in some way. But maybe this could even be handled in a more user-friendly way. Is there a point in using `delb.Document` with `collapse_whitespace` without an `lxml` parser that also removes whitespace, or could the use of such a parser perhaps be implied by `collapse_whitespace` in general?

Should `TagNode` have a `tostring` method with an optional `pretty_print` flag as well?