Closed james-lawrence closed 4 years ago
@zhengchun @james-lawrence I have a question about this PR:
With this change, xmlquery.parse()
no longer creates a node with type == TextNode
anymore. As far as I can tell, only place we create a node with type == TextNode is during the xpath navigation:
func getCurrentNode(it *xpath.NodeIterator) *Node {
n := it.Current().(*NodeNavigator)
if n.NodeType() == xpath.AttributeNode {
childNode := &Node{
Type: TextNode,
Data: n.Value(),
}
return &Node{
Parent: n.curr,
Type: AttributeNode,
Data: n.LocalName(),
FirstChild: childNode,
LastChild: childNode,
}
}
return n.curr
}
Want to confirm this is the right construct of using TextNode
. Basically I'm questioning why we needed to introduce CharDataNode
type. It is not necessary to have absolutely matching names for the types. Calling it TextNode
to represent a CDATA
section is fine in my opinion. This way, we can use TextNode
to represent anything related to raw text, including raw text in element nodes, or raw text for attributes. That said, I'm not too obsessed with this TextNode
name: I'm perfectly okay we switch to use CharDataNode
(given the use of CharData
in std lib xml
). If that's what we want, then we should get rid of TextNode. Basically I'm asking: we should have just one of them, either CharDataNode
or TextNode
, not both (personally like TextNode a bit more since it's more generic, plus it matches well with xpath.NodeType
very well). Agree, disagree?
CharDataNode
and TextNode
is the two node type for text, they are very different. CharDataNode
can include the specified characters like <
,>
,&
and more others, I can include HTML code in the XML via uses <![CDATA[ ]]>
https://en.wikipedia.org/wiki/CDATA#CDATA_sections_in_XML . It is useful when call OutputXML()
function to output XML. Sometime we need print all XML body even include <![CDATA[ ]]>
. If we missing CharDataNode
we will don't know this node text whether has include HTML character and output the XML will been broken.
<style type="text/css">
/*<![CDATA[*/
body { background-image: url("marble.png?width=300&height=300") }
/*]]>*/
</style>
I checked code again, I found there is not implement <![CDATA[ ]]>
on OutputXML()
function. This is a Bug.
@zhengchun then what is TextNode
is used for? Apparently in xmlquery
library, TextNode
isn't used at all during parsing, so basically we will never get a *Node
tree that has a TextNode
in it. The only place TextNode
is used is to construct an artificial node to hold attribute value during NodeNavigator
traverse. If that's the only use case, why can't we use CharDataNode
in this case?
My whole point is: seems like we don't need both TextNode or CharDataNode. I don't care about the naming, either one is fine with me. But having two of them lying around adds confusion and potentially bug (as you discovered).
@jf-tech , If package just for query, we don't need have two TextNode
and CharDataNode
. Sometime we need save the query result to XML file(although it may be infrequent). CharDataNode
can flag this node including some specified characters. TextNode
is flag all text that don't include <![CDATA[ ]]>
. For example:
s := `<?xml version="1.0"?>
<script>
<![CDATA[ They're saying "x < y" & that "z > y" so I guess that means that z > x ]]>
</script >
`
doc, _:= xmlquery.Parse(strings.NewReader(s))
fmt.Println(doc.OutputXML(false))
saveXML(doc.OutputXML(false))
If missing CDATA
, the output like below:
<?xml version="1.0"?><script>They're saying "x < y" & that "z > y" so I guess that means that z > x</script>
Apparently this XML has broken, if we save this output to the XML file and read it in the other tool.
@jf-tech , I add a bug report about <![CDATA[ ]]>
at https://github.com/antchfx/xmlquery/issues/36
Okay, I'm fine now, thx for the discussion.
While my primary objection was this: given xmlquery.parse()
never really creates a TextNode
, and a TextNode
is only created during xpath navigation getCurrentNode()
for attribute node's value, why did we ever need the type. But now I realize this: given we need this OutputXML
functionality, then we do want different ways to serialize out an element node's text data (in CDATA format), vs an element node's attribute value (in a direct quoted text format). If we all use CharDataNode
for attribute value node in getCurrentNode()
it would cause OutputXML
spit out the attribute value looking completely goofy.
I'm good now. Thx!
fixes #30.
updated travis.yml to not test against golang 1.7 and 1.8 due to group cache pulling in math/bits which were not backported. (imo should just remove groupcache and add an internal/lru package to supply the 100 lines or so of code for the cache. but out of scope for this PR)