ecoron / SerpScrap

SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet and the type from search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in search results or supplied by you. It's useful for SEO and business-related research tasks.
https://github.com/ecoron/SerpScrap
MIT License
257 stars 61 forks

Fixes behavior when SERP returns only one page #32

Closed bernardosrulzon closed 6 years ago

bernardosrulzon commented 6 years ago

@ecoron Thanks for all the hard work you've put into SerpScrap! It'll be hugely helpful to improve our SEO strategy and I'm excited about the future of this project.

I've bumped into an issue with search queries that return relatively few results on Google - they will have no pagination control and will fail your validation. You can find an example here.

This quick fix is working for me, but feel free to edit/improve it as you see fit. I'm not sure why you chose to check for pagination as a proxy for a correctly loaded page vs. using the wait_until_title_contains_keyword() method for everything.

I've also edited the page number check because it seems to try to go to the next page even if I only want to scrape a single page. Please see if this makes sense.
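To make the two changes above concrete, here is a minimal sketch of the idea, not the actual SerpScrap code. The function names, the `id="pnnext"` pagination marker, and the plain-string HTML input are all assumptions made for illustration; the real project drives a browser and has its own helpers.

```python
import re


def page_looks_loaded(html: str, keyword: str) -> bool:
    """Hypothetical load check: accept a page that has a pagination
    control OR whose <title> contains the search keyword."""
    # Assumed marker for a "next page" control; not taken from SerpScrap.
    has_pagination = 'id="pnnext"' in html
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1) if match else ""
    return has_pagination or keyword.lower() in title.lower()


def should_request_next_page(current_page: int, pages_wanted: int) -> bool:
    """Hypothetical page-number check: only advance while more pages
    were actually requested (page numbers start at 1)."""
    return current_page < pages_wanted


# A result page with few hits: no pagination, but the title matches.
single_page = "<html><head><title>rare query - Search</title></head></html>"
print(page_looks_loaded(single_page, "rare query"))
print(should_request_next_page(1, 1))
```

With a check like this, a short result page is no longer rejected just because it has no pagination, and a single-page request never tries to click through to page two.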

bernardosrulzon commented 6 years ago

I've also taken the opportunity to remove some IE user-agents that were resulting in these weird error messages:

{ "errorMessage":
    "undefined is not an object (evaluating '(y(a)?y(a).parentWindow||y(a).defaultView:window).getComputedStyle(a,null).MozTransform.match')"
}

ecoron commented 6 years ago

Hi, thanks for sharing, and sorry for the delay, but it's a busy time for me. The changes look fine; I will check them in detail in the next few days.