anaskhan96 / soup

Web Scraper in Go, similar to BeautifulSoup
MIT License
2.18k stars 168 forks source link

FindNextSibling bug #3

Closed ShevaXu closed 7 years ago

ShevaXu commented 7 years ago

From the source code I see FindNextSibling calls r.Pointer.NextSibling.NextSibling which wrongly assumes NextSibling should have another NextSibling, and crash when it does not.

e.g.

const html = `<html>

  <head>
      <title>DOM Tutorial</title>
  </head>

  <body>
      <a>DOM Lesson one</a><p>Hello world!</p>
  </body>

</html>`

func main() {
    doc := soup.HTMLParse(html)
    link := doc.Find("a")
    next := link.FindNextSibling()
    fmt.Println(next.Text())
}

// $ panic: runtime error: invalid memory address or nil pointer dereference

This also applies for FindPrevSibling.

BTW, I suggest there should be FindNextSibling and FindNextSiblingElement as the spec describes. (This might be another issue, I guess what you want to implement is FindNextSiblingElement.)

anaskhan96 commented 7 years ago

Thanks for reporting it. I'll try to fix it as soon as possible.

ShevaXu commented 7 years ago

I think the fix d325696 does not work because it is r.Pointer.NextSibling might be nil, and calling .NextSibling on it causing the nil pointer dereference crash/panic.

Maybe try this:

func (r Root) FindNextSibling() Node {
    nextSibling := r.Pointer.NextSibling
    if nextSibling == nil {
        log.Fatal("No next sibling found")
    }
    // try to find next element-node?
    if nextSibling.Type == html.ElementNode {
           return Root{nextSibling}
    } else {
           // keep looking
    }
}
anaskhan96 commented 7 years ago

Ah, I see it now. I'll make the fixes accordingly.

anaskhan96 commented 7 years ago

Fixed it in the above commit, and renamed them to FindNextElementSibling() and FindPrevElementSibling() I'll be implementing the actual FindNextSibling() and FindPrevSibling() in the upcoming commits.