anaskhan96 / soup

Web Scraper in Go, similar to BeautifulSoup
MIT License
2.18k stars, 168 forks

Crashed with SIGSEGV #39

Closed: sunshine69 closed this issue 4 years ago

sunshine69 commented 5 years ago

I tried running the test program weather.go on my machine and got this:

Enter the name of the city : Brisbane
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x665715]

goroutine 1 [running]:
github.com/anaskhan96/soup.findOnce(0x0, 0xc0000bdea8, 0x3, 0x3, 0x0, 0x70207e, 0x13)
	/home/stevek/go/src/github.com/anaskhan96/soup/soup.go:345 +0x315
github.com/anaskhan96/soup.Root.Find(0x0, 0x0, 0x0, 0x75c820, 0xc000364040, 0xc0000bdea8, 0x3, 0x3, 0x0, 0x0, ...)
	/home/stevek/go/src/github.com/anaskhan96/soup/soup.go:121 +0x82
main.main()
	/home/stevek/tmp/go-lang/src/weather.go:24 +0x49d
exit status 2
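In the trace, `soup.Root.Find` is called with 0x0 as its first argument, which looks like a receiver whose node pointer is nil, and the panic then happens inside `findOnce`. One plausible way to get there is chaining another `Find` onto a lookup that found nothing. The stdlib-only sketch below reproduces that failure mode in isolation. `Node`, `Root`, `Find`, and `chainFind` are hypothetical stand-ins loosely shaped after a soup-style result (a node pointer plus an error); they are not the library's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, heavily simplified stand-in for an HTML node.
type Node struct {
	Tag      string
	Children []*Node
}

// Root is a hypothetical result type: a node pointer plus a lookup error.
type Root struct {
	Pointer *Node
	Error   error
}

// Find returns the first direct child of r.Pointer with the given tag.
// It deliberately does NOT check r.Pointer for nil, so calling it on a
// failed result dereferences a nil pointer and panics.
func (r Root) Find(tag string) Root {
	for _, c := range r.Pointer.Children { // panics when r.Pointer is nil
		if c.Tag == tag {
			return Root{Pointer: c}
		}
	}
	return Root{Error: errors.New("element " + tag + " not found")}
}

// chainFind chains two lookups without checking the first result's Error,
// and turns a panic into a returned error so the outcome can be inspected.
func chainFind(doc Root, first, second string) (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("recovered: %v", p)
		}
	}()
	return doc.Find(first).Find(second).Error
}

func main() {
	doc := Root{Pointer: &Node{Tag: "html", Children: []*Node{{Tag: "body"}}}}
	// "body" exists, so the second Find runs on a real node and reports "div" missing.
	fmt.Println(chainFind(doc, "body", "div")) // element div not found
	// "div" is missing, so the chained Find runs on a nil Pointer and panics.
	fmt.Println(chainFind(doc, "div", "span")) // recovered: runtime error: invalid memory address or nil pointer dereference
}
```

Checking `Error` after each lookup before chaining the next one avoids the panic in this sketch.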

sunshine69 commented 5 years ago

Running on Ubuntu 19.04:

go version
go version go1.12.10 linux/amd64
sunshine69 commented 5 years ago

It seems to be a bug in Go's net/html package, or something like that. I experimented with another library, https://github.com/pysrc/bs, and it cannot find the 'div' node with class="b_antiTopBleed b_antiSideBleed b_antiBottomBleed" either (it does not bomb out with a SIGSEGV, but it finds nothing).

However, trusty veteran Python finds it quickly and nicely. My own eyes can find it too, using Chrome's inspector.

from bs4 import BeautifulSoup
import requests

res = requests.get("https://www.bing.com/search?q=weather+hanoi")
data = res.content
s = BeautifulSoup(data, 'html.parser')
node = s.find('div', attrs={'class': 'b_antiTopBleed b_antiSideBleed b_antiBottomBleed'})
# The content is too long to paste here, but the node is found.
print(node is not None)
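When a lookup by a multi-class `class` value comes back empty, one thing worth ruling out is how the attribute is compared: an exact string comparison fails if the page lists the same classes in a different order or with different spacing. This thread does not show whether that is what happened here. The sketch below shows an order-insensitive comparison using only the Go standard library; `hasAllClasses` is a hypothetical helper, not part of soup or pysrc/bs.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAllClasses reports whether the class attribute value attr contains
// every class listed in want, ignoring order, extra classes, and spacing.
func hasAllClasses(attr, want string) bool {
	have := make(map[string]bool)
	for _, c := range strings.Fields(attr) {
		have[c] = true
	}
	for _, c := range strings.Fields(want) {
		if !have[c] {
			return false
		}
	}
	return true
}

func main() {
	want := "b_antiTopBleed b_antiSideBleed b_antiBottomBleed"
	// Same classes, different order and spacing, plus an extra class: matches.
	fmt.Println(hasAllClasses("b_antiSideBleed  b_antiTopBleed b_antiBottomBleed extra", want)) // true
	// b_antiBottomBleed is missing: no match.
	fmt.Println(hasAllClasses("b_antiTopBleed b_antiSideBleed", want)) // false
}
```

A matcher like this would be applied to each element's `class` attribute while walking the parsed tree.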

This is my experiment, having fun (or not) with Go. I am a complete newbie in Go, but IMHO it does not feel really stable.

One comment: the error reporting looks bad. When it cannot find the node, it should output something better than dereferencing a nil pointer and bombing a poor newbie like me with a SIGSEGV, which I think is a serious flaw. For God's sake, Python 3 code very rarely (if ever) gives me this.
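The better error reporting asked for here can be sketched as a `Find` that checks for a missing node and carries the failure forward as an error value, so a chained lookup reports which step failed instead of panicking. `Node`, `Root`, `Find`, and the error messages are hypothetical, written only for this illustration; this is not soup's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, simplified HTML node used only for this sketch.
type Node struct {
	Tag      string
	Children []*Node
}

// Root pairs a node with the error from the lookup that produced it.
type Root struct {
	Pointer *Node
	Error   error
}

// Find returns the first direct child with the given tag. If r already
// carries an error, or has no node, it returns an error instead of
// dereferencing a nil pointer.
func (r Root) Find(tag string) Root {
	if r.Error != nil {
		return Root{Error: fmt.Errorf("find %q: %v", tag, r.Error)}
	}
	if r.Pointer == nil {
		return Root{Error: errors.New("find " + tag + ": called on an empty result")}
	}
	for _, c := range r.Pointer.Children {
		if c.Tag == tag {
			return Root{Pointer: c}
		}
	}
	return Root{Error: fmt.Errorf("element <%s> not found", tag)}
}

func main() {
	doc := Root{Pointer: &Node{Tag: "html", Children: []*Node{{Tag: "body"}}}}
	// "div" is missing, so the chained lookup fails with a readable error.
	res := doc.Find("div").Find("span")
	if res.Error != nil {
		fmt.Println("lookup failed:", res.Error) // lookup failed: find "span": element <div> not found
		return
	}
	fmt.Println("found:", res.Pointer.Tag)
}
```

The caller still has to check `Error`, but a missed check now surfaces as a descriptive message rather than a SIGSEGV.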

sunshine69 commented 5 years ago

Here is the sample using the other Go soup library, just for fun:

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"

	"github.com/pysrc/bs"
)

func main() {
	resp, err := http.Get("https://www.bing.com/search?q=weather+hanoi")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Response status:", resp.Status)

	// Read the whole body, and check the error instead of ignoring it.
	htmlDoc, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// Parse the downloaded page.
	soup := bs.Init(htmlDoc)

	// Select every <div> whose class attribute is the three Bing classes.
	for i, j := range soup.Sel("div", &map[string]string{"class": "b_antiTopBleed b_antiSideBleed b_antiBottomBleed"}) {
		fmt.Println(i, "Tag", j.Tag)
	}

	// Here bs.Init is given a URL instead of HTML content.
	soup = bs.Init("https://github.com/")
	for _, j := range soup.Sel("title", nil) {
		fmt.Println("title:", j.Value)
	}
	/*Output:
	  title: The world’s leading software development platform · GitHub
	  title: 1clr-code-hosting
	*/
}

I should report a bug to that author as well :laughing:

anaskhan96 commented 4 years ago

Hi @sunshine69, I believe this comment would provide the resolution.