anaskhan96 / soup

Web Scraper in Go, similar to BeautifulSoup
MIT License
2.18k stars 168 forks source link

Check if element exists without triggering warnings in console? #7

Closed FM1337 closed 7 years ago

FM1337 commented 7 years ago

I'm curious if there's a way to check if an element exists and have a Boolean returned if it does or does not exist rather than having the console just output something like

2017/06/06 11:21:52 Error occurred in Find() : Element `div` with attributes `class title` not found
anaskhan96 commented 7 years ago

There is no way as of now, but a new function could be added, similar to Find(), just returning a boolean if found or not.

lrstanley commented 7 years ago

The current method of how errors are handled seems to be very un-idiomatic. Catching panics should really not be something done inside of a package. I do believe that the approach @FM1337 mentioned should be how these are rewritten. Especially when a library uses logging without interfacing with the user, this is not a very good approach and is generally going to be frowned upon.

Implementing additional functions is one method of getting around this, however imho even if this is a breaking change, it should be rewritten to follow standardized go idioms.

anaskhan96 commented 7 years ago

@lrstanley I plan on removing logging of the errors in an upcoming update. However, when it comes to error handling, returning a second argument as an error variable was the first thought that came to my mind, but it defeated the purpose of making the interface similar to BeautifulSoup (using Find().Find().Text(), conjunctively). So I'm still exploring ways on how to handle the error in a better manner.

lrstanley commented 7 years ago

Although BeautifulSoup is a good tool and package for Python, mirroring it's methods and signatures doesn't quite make sense -- Python invokes quite a bit of magic behind the scenes, intercepting errors, routing them differently, etc. People expect that if there is going to be an error, they will be able to handle it accordingly. With the current available signatures that the package provides, it is not clear without directly looking over your code, what's going on behind the scenes. Even if the current behavior plans to stay, it should be clearly noted that these functions will use internal logging/not return errors, so the package documentation can be transparent to the user.

Shown in the following Go blog post, you see the following mentioned:

The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values.

I understand the want to be able to group conjunctive and successive calls between methods, so less boilerplate has to occur, which I think is still easily possible. I may suggest that of what some other libraries do. They implement a field on the object (in this case, Root), for example named Error, which is set on a copy of the Root struct. This way, they can do something like:

links := doc.Find("div", "id", "userIds").Find("something").FindAll("a")

if links.Error != nil {
    // Handle accordingly.
}

An example of a library doing this is gorm (although not the best library in itself, it is able to implement error handling while still having a pass-the-object style syntax.

I also see that soup.Get() has a signature to return an error, however it's not even used currently, and panics still are: https://github.com/anaskhan96/soup/blob/master/soup.go#L34-L48

Thoughts/opinions?

anaskhan96 commented 7 years ago

Implementing a field on the Root object for handling errors sounds great. I had this thought yesterday though I didn't know it was in practical use. I would be implementing this in the upcoming update.