anaskhan96 / soup

Web Scraper in Go, similar to BeautifulSoup
MIT License
2.18k stars, 168 forks

invalid memory address or nil pointer dereference when chaining methods #76

Open: alexballas opened this issue 1 year ago

alexballas commented 1 year ago
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"

    "github.com/anaskhan96/soup"
)

func main() {
    go func() {
        http.ListenAndServe(":12345", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprint(w, "OK")
        }))
    }()

    time.Sleep(time.Second)

    resp, err := soup.Get("http://127.0.0.1:12345/")
    if err != nil {
        log.Println("Error:", err.Error())
        return
    }

    doc := soup.HTMLParse(resp)
    // Neither tag exists in the page, so the first Find fails and the
    // chained second Find panics.
    r := doc.Find("Semething").Find("SomethingElse")
    fmt.Println(r.Error)
}

Hello, if I chain the Find and FindAll methods on tags that don't exist, as in the example above, I get a panic:

$ go run .
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x66ce1b]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x6b64c0?, {0xc00011fe50?, 0x1, 0x1}, 0x2?, 0x0)
        /home/alex/go/pkg/mod/github.com/anaskhan96/soup@v1.2.5/soup.go:502 +0xfb
github.com/anaskhan96/soup.Root.Find({0x0, {0x0, 0x0}, {0x766ee0, 0xc000238030}}, {0xc00011fe50?, 0x1, 0x1})
        /home/alex/go/pkg/mod/github.com/anaskhan96/soup@v1.2.5/soup.go:268 +0xa5
main.main()
        /home/alex/test/play3/main.go:24 +0x1ca
exit status 2
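The trace shows the second Root.Find being called on a receiver whose first field (the node pointer) is 0x0: the first Find failed and returned a Root with a nil pointer and a non-nil Error. Until the library guards against this, a caller can avoid the panic by checking Error after each step instead of chaining. The sketch below models this with hypothetical stand-in types (Node, Root) and a hypothetical helper findPath rather than the real soup package; its Find deliberately omits a nil check to mimic the reported behavior.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data     string
	Children []*Node
}

// Root is a simplified, hypothetical model of soup.Root.
type Root struct {
	Pointer   *Node
	NodeValue string
	Error     error
}

// first searches n's descendants depth-first for a node whose Data is tag.
// Like the reported code path, it does not check n for nil.
func first(n *Node, tag string) *Node {
	for _, c := range n.Children { // panics when n is nil
		if c.Data == tag {
			return c
		}
		if f := first(c, tag); f != nil {
			return f
		}
	}
	return nil
}

// Find models the unpatched behavior: on a miss it returns a Root with a nil
// Pointer, and calling Find again on that Root panics inside first.
func (r Root) Find(tag string) Root {
	if n := first(r.Pointer, tag); n != nil {
		return Root{Pointer: n, NodeValue: n.Data}
	}
	return Root{Error: fmt.Errorf("element `%s` not found", tag)}
}

// findPath is a hypothetical caller-side helper: it calls Find one tag at a
// time and stops at the first Root whose Error is set, so Find is never
// invoked on a nil Pointer.
func findPath(r Root, tags ...string) Root {
	for _, t := range tags {
		if r = r.Find(t); r.Error != nil {
			return r
		}
	}
	return r
}

func main() {
	doc := Root{Pointer: &Node{Data: "html", Children: []*Node{{Data: "body"}}}}
	r := findPath(doc, "Semething", "SomethingElse")
	fmt.Println(r.Error) // element `Semething` not found
}
```

With the real library, the equivalent should be checking r.Error after each doc.Find(...) call before calling Find on its result.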

I believe both findOnce and findAllofem should check whether n *html.Node is nil before processing it. Am I understanding this correctly?

Thanks, Alex
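The nil check proposed above can be sketched as follows. This is a simplified, hypothetical model, not soup's code: Node stands in for *html.Node, Root mirrors the Pointer, NodeValue and Error fields seen in the issue, and findOnce takes a single tag name instead of the library's argument list. With the guard in place, calling Find on a failed result just reports another not-found error.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node (tag name plus children).
type Node struct {
	Data     string
	Children []*Node
}

// Root is a simplified, hypothetical model of soup.Root.
type Root struct {
	Pointer   *Node
	NodeValue string
	Error     error
}

// findOnce searches n's descendants depth-first for the first node whose
// Data equals tag. The nil guard is the proposed fix: a Root returned by a
// failed Find has a nil Pointer, and reading n.Children on it would panic.
func findOnce(n *Node, tag string) (*Node, bool) {
	if n == nil {
		return nil, false
	}
	for _, c := range n.Children {
		if c.Data == tag {
			return c, true
		}
		if found, ok := findOnce(c, tag); ok {
			return found, true
		}
	}
	return nil, false
}

// Find returns the first matching descendant, or a Root whose Error is set.
func (r Root) Find(tag string) Root {
	n, ok := findOnce(r.Pointer, tag)
	if !ok {
		return Root{Error: fmt.Errorf("element `%s` not found", tag)}
	}
	return Root{Pointer: n, NodeValue: n.Data}
}

func main() {
	// Roughly the page from the issue: no element matches either tag.
	doc := Root{Pointer: &Node{Data: "html", Children: []*Node{{Data: "body"}}}}
	r := doc.Find("Semething").Find("SomethingElse")
	fmt.Println(r.Error) // element `SomethingElse` not found
}
```

With only the guard, the error that reaches the caller names the last missing tag (SomethingElse) rather than the first. Having Find return r unchanged when r.Error is already set would preserve the original message; that is a design choice, not something proposed in the thread.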

akmubi commented 1 year ago

Yep. I think having findOnce and findAllofem return a pointer to an empty html.Node would fix it. Just adding a nil pointer check isn't enough: the next panic would then happen in the Find function, because it accesses the Data field.
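The empty-node alternative can be sketched with the same kind of hypothetical stand-ins (Node, Root and findOnce here are illustrative, not soup's actual code). Because a failed lookup yields a non-nil node with no children, a chained Find searches nothing, and reading Data gives an empty string instead of panicking.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for *html.Node.
type Node struct {
	Data     string
	Children []*Node
}

// Root is a simplified, hypothetical model of soup.Root.
type Root struct {
	Pointer   *Node
	NodeValue string
	Error     error
}

// findOnce searches depth-first for the first descendant whose Data is tag.
// On a miss it returns a pointer to a new empty Node instead of nil. The
// starting node n must itself be non-nil (the document root always is here).
func findOnce(n *Node, tag string) (*Node, bool) {
	for _, c := range n.Children {
		if c.Data == tag {
			return c, true
		}
		if found, ok := findOnce(c, tag); ok {
			return found, true
		}
	}
	return &Node{}, false
}

// Find keeps the empty node in Pointer on a miss, so reading Pointer.Data or
// chaining another Find is safe; Error still records the miss.
func (r Root) Find(tag string) Root {
	n, ok := findOnce(r.Pointer, tag)
	if !ok {
		return Root{Pointer: n, NodeValue: n.Data, Error: fmt.Errorf("element `%s` not found", tag)}
	}
	return Root{Pointer: n, NodeValue: n.Data}
}

func main() {
	doc := Root{Pointer: &Node{Data: "html", Children: []*Node{{Data: "body"}}}}
	r := doc.Find("Semething").Find("SomethingElse")
	fmt.Printf("err=%v data=%q\n", r.Error, r.Pointer.Data)
}
```

A fresh empty node is allocated per miss rather than sharing one package-level sentinel, so no caller can mutate state that other lookups see. The trade-off is that Pointer is no longer nil on failure, so callers have to rely on Error to detect a miss.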

denislituev commented 1 year ago

https://github.com/anaskhan96/soup/pull/78