antchfx / htmlquery

htmlquery is a Go XPath package for querying HTML.
https://github.com/antchfx/xpath
MIT License
727 stars, 73 forks

Strange result #1

Closed marcelloh closed 5 years ago

marcelloh commented 6 years ago

I get a strange result when checking some HTML, and I was able to reproduce it:

    memHtml := "<html><head></head><body><td>check</td></body></html>"

    docLoc, err := htmlquery.Parse(strings.NewReader(memHtml))
    if err != nil {
        panic(err)
    }

    test := htmlquery.OutputHTML(docLoc, false)

test is now:

   <html><head></head><body>check</body></html>

Why are the td-tags missing here?

zhengchun commented 6 years ago

It looks like the html package causes this problem.

The following passage is from https://godoc.org/golang.org/x/net/html#Render:

Calling Parse on arbitrary input typically results in a 'well-formed' parse tree. However, it is possible for Parse to yield a 'badly-formed' parse tree. For example, in a 'well-formed' parse tree, no <a> element is a child of another <a> element: parsing "<a><a>" results in two sibling elements. Similarly, in a 'well-formed' parse tree, no <a> element is a child of a <table> element: parsing "<p><table><a>" results in a <p> with two sibling children; the <a> is reparented to the <table>'s parent. However, calling Parse on "<a><table><a>" does not return an error, but the result has an <a> element with an <a> child, and is therefore not 'well-formed'....

I wrote some test code to output the whole HTML tree.

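    // Needs the imports "os", "strings" and "golang.org/x/net/html".
    func main() {
        s := `<html><head></head><body><td>check</td></body></html>`
        // The parse error is ignored here for brevity.
        node, _ := html.Parse(strings.NewReader(s))
        html.Render(os.Stdout, node)
    }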

output: <html><head></head><body>check</body></html>

marcelloh commented 5 years ago

Wow, this is really a disappointment. So it only renders correctly if the HTML is valid? Given that every line of HTML is written by a human, and humans make mistakes, this is a sad thing.

I also read this: "However, it is possible for Parse to yield a 'badly-formed' parse tree." How could I achieve this?
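
Because the parser repairs such input silently, one option is to flag suspicious markup before parsing it. The following is a rough sketch using only the standard library, assuming the input is close enough to well-formed for encoding/xml in non-strict mode to tokenize it. orphanCells is a hypothetical helper, not part of htmlquery or golang.org/x/net/html, and it is a heuristic rather than a real HTML validator.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// orphanCells returns the names of <tr>, <td> and <th> start tags that
// appear without an enclosing <table>. Hypothetical helper, heuristic only.
func orphanCells(src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	// Relax the decoder so that common HTML constructs are tolerated.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	tableDepth := 0
	var orphans []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return orphans, nil
		}
		if err != nil {
			return orphans, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			name := strings.ToLower(t.Name.Local)
			switch name {
			case "table":
				tableDepth++
			case "tr", "td", "th":
				if tableDepth == 0 {
					orphans = append(orphans, name)
				}
			}
		case xml.EndElement:
			if strings.ToLower(t.Name.Local) == "table" && tableDepth > 0 {
				tableDepth--
			}
		}
	}
}

func main() {
	found, err := orphanCells(`<html><head></head><body><td>check</td></body></html>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(found)
}
```

A check like this lets you report the mistake to whoever wrote the HTML, instead of discovering later that the rendered tree no longer matches the input.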

zhengchun commented 5 years ago

I guess you need to implement your own HTML parser. 😄

marcelloh commented 5 years ago

haha, yep... Perhaps I will ;-)

474420502 commented 4 years ago

Oh, shit. I hate html.Render. This strange result is likely to cause me many bugs. libxml2 is also likely to cause bugs, since it means calling into a C library. XPath in Go is very hard to use!