frejanordsiek closed 6 years ago
I'm not sure I understand what you see as the problem here. Plump parses things in a preserving way. This may not be very important for some XML formats, but it most certainly is for HTML and XML+HTML, or for similar markup formats based on XML.
OK, so it is done this way to preserve whitespace, so that parse and serialize are exact inverses of each other, because that is needed for HTML. That makes sense. So it is by design and not a bug.
For what it's worth, you can strip whitespace text from the DOM with `plump:strip`.
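A minimal sketch of that suggestion, assuming Quicklisp is installed and the forms are evaluated in order at a REPL. The inline document in `*xml*` and the helper `root-element-child-count` are made up for illustration and are not part of Plump; only `plump:parse`, `plump:children`, `plump:strip`, and `plump:serialize` are library calls.

```lisp
;; Assumption: Quicklisp is available; evaluate these forms one by one.
(ql:quickload :plump)

;; Hypothetical stand-in for an indented XML file.
(defparameter *xml* (format nil "<root>~%  <a>x</a>~%</root>"))

(defun root-element-child-count (document)
  "Number of child nodes of the first top-level node in DOCUMENT."
  (length (plump:children (elt (plump:children document) 0))))

(let ((doc (plump:parse *xml*)))
  ;; At this point the whitespace between tags is kept as text nodes.
  (print (root-element-child-count doc))
  ;; PLUMP:STRIP modifies the DOM in place.
  (plump:strip doc)
  ;; Whitespace-only text nodes should now be gone.
  (print (root-element-child-count doc))
  ;; Passing NIL as the stream serializes to a string.
  (print (plump:serialize doc nil)))
```

As far as I understand it, `plump:strip` also trims leading and trailing whitespace from text nodes that have other content, so it is probably only appropriate for formats where that whitespace carries no meaning.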
Not sure if this is deliberate design or a bug.
Take an XML file that has newlines and indentation for readability, such as the following test file
and call it `test.xml`. If I then read it and look at the second child with
(elt (plump:children (plump:parse #p"test.xml")) 1)
I get a text node like
#<PLUMP-DOM:TEXT-NODE {1004CE6373}>
If I then look at its text with
(plump:text (elt (plump:children (plump:parse #p"test.xml")) 1))
I get
So there is a text node with the newline. Similarly, the first child of the first child node of root is also a text node whose text can be gotten with
(plump:text (elt (plump:children (elt (plump:children (plump:parse #p"test.xml")) 0)) 0))
and is
which has the newline and the indentation.
It is a bit easier to see all of it if `plump-sexp` is used to look at it with
(plump-sexp:serialize (plump:parse #p"test.xml"))
which gives