anaskhan96 / soup

Web Scraper in Go, similar to BeautifulSoup
MIT License
2.18k stars 168 forks source link

Find by single class #15

Closed ghandic closed 6 years ago

ghandic commented 6 years ago

Currently Find("a", "class", "message") would only work if it was <a class="message"></a> but would not work on <a class="message input-message"></a> even though they are both of class message.

Could this be added?

anaskhan96 commented 6 years ago

Good find. I'll look into this.

Salmondx commented 6 years ago

Now it is not working with multiple class items, as it was in the previous build. In source html tags looks like this:

<div class="rasp_place s-place"/>

And I've searched them with this code:

block.Find("div", "class", "rasp_place s-place")

In current build my old code doesn't work anymore, only this works:

block.Find("div", "class", "rasp_place")

Also I think that you should revert to the previous logic as now it became not obvious:

<li class="btn_rasp inactive">

In current build all items with class btn_rasp will be selected, but I don't need items with class inactive. And I have no option to not include them, only to filter via attributes map.

ghandic commented 6 years ago

Do a find on those with ‘inactive’?

Salmondx commented 6 years ago

The only way to find them in the current version is like this:

attrs := showtimeBlock.Attrs()["class"]
if strings.Contains(attrs, "inactive") {
    continue
}
anaskhan96 commented 6 years ago

@Salmondx You have a point. The best way right now is to revert the commit and look into the issue from another angle.

anaskhan96 commented 6 years ago

However, I feel like we should first discuss the significance of finding by a single class, and then proceed from there.

ghandic commented 6 years ago

This is meant to emulate BeautifulSoup so I guess thats the importance 😄

Salmondx commented 6 years ago

@ghandic Exactly, but they have a pretty strange api: https://stackoverflow.com/questions/34356912/is-there-any-strict-findall-function-in-beautifulsoup/34357876

ghandic commented 6 years ago

That may true but as I say it’s trying to emulate it in go so the API should be similar

Salmondx commented 6 years ago

@anaskhan96 do you have any ideas? I could make a PR if we discuss the API

anaskhan96 commented 6 years ago

I looked across other scrapers' implementation and concluded that the Find function should remain as it is. This package was started to emulate the interface of BeautifulSoup (similar function names, conjoined function calls) as mentioned in the README, not how its functions work internally, so I'll stick with that.

However, this issue shall remain open for further discussion and input from other users.

Salmondx commented 6 years ago

May be we should make a Select method with the desired functionality? Like in the link above.

anaskhan96 commented 6 years ago

It's a good idea, although we would have to discuss ways to parse the different kinds of string that Select would receive as an argument.

Salmondx commented 6 years ago

I think that we should use a string as before:

soup.Select("div", "class", "class-1 class-2 class-3")

With strict matching

anaskhan96 commented 6 years ago

I think we should go about adding FindStrict and FindAllStrict which do the job, while retaining the interface similar to Find and FindAll.

soup.FindStrict("div", "class", "class-1 class-2 class-3")

Let's reserve Select for a possible future implementation of BeautifulSoup's select.

Salmondx commented 6 years ago

Ok, sounds good!