Nv7-GitHub / googlesearch

A Python library for scraping the Google search engine.
https://pypi.org/project/googlesearch-python/
MIT License
480 stars, 117 forks

Some requests break the search #66

Closed arditobryan closed 2 months ago

arditobryan commented 1 year ago

I found that some queries make the search function stall for over a minute and then return a 429, regardless of waiting time. For example, "Malaysia sugar tax, RM0.40 (US$0.086) per litre, more than 5 grams/100ml" takes a few seconds to retrieve the first 2 links, but at the 3rd it makes me wait about 1:30 minutes, then returns 429, and the IP becomes unusable. I tried the same query on Google Colab (which should not use my IP). To be sure, I also tried switching my internet connection to a phone hotspot and using an EC2 instance, and all led to the same result (breaking at the 3rd link of the same query). So some queries can break the algorithm.

Ideally, we should use the timeout parameter for the requests, but I tried it and it does not work in the case above. While adding delays or a user agent can help prevent 429s in general, I think this specific issue still needs to be addressed.
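For readers who want to bound how long a rate-limited request can stall, here is a minimal sketch of retrying with exponential backoff and giving up after a fixed number of 429 responses. It is not part of googlesearch-python: `fetch_with_backoff`, `RateLimited`, and the `fetch` callable (returning a status code and body) are hypothetical names used only for illustration, and as noted above this kind of delay does not address the stall itself.

```python
import time
from typing import Callable, Tuple


class RateLimited(Exception):
    """Raised when every attempt was answered with HTTP 429."""


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],  # hypothetical: returns (status, body)
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() and retry on 429, doubling the delay before each retry."""
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return body
        if attempt < max_retries:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RateLimited(f"still rate limited after {max_retries} retries")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; in normal use the default `time.sleep` applies.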

Nv7-GitHub commented 12 months ago

When I search for this on Google, there are no results. Do you think this could be causing the issue?

arkothiwala commented 11 months ago

I see the same behavior when running the script in Google Colab, on my local system, and on an AWS EC2 instance. My search queries are phone model names like "iPhone 14" or "Google pixel 7 pro".

slippyC commented 10 months ago

Yep, I'm getting a 429 error as well. I have limited the number of results returned, hoping that might resolve the issue. I have even searched for something as simple as "time".

Edit: It's throwing up a CAPTCHA. Is there a suggested timeout that people are having luck with?

Nv7-GitHub commented 2 months ago

This has been fixed. It was happening because there are no results, so the search would loop forever. Now, if it finds no results, it returns.
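The failure mode described here can be illustrated with a small sketch of a paginated search loop that stops as soon as a page comes back empty. This is not the library's actual code: `iter_results` and the `fetch_page` callable (taking a start offset and returning a list of URLs) are hypothetical names chosen for the example.

```python
from typing import Callable, Iterator, List


def iter_results(
    fetch_page: Callable[[int], List[str]],  # hypothetical: start offset -> URLs
    num_results: int,
) -> Iterator[str]:
    """Yield up to num_results URLs, stopping early when a page is empty."""
    start = 0
    yielded = 0
    while yielded < num_results:
        page = fetch_page(start)
        if not page:
            # Without this check, a query with fewer results than requested
            # would keep requesting pages forever and eventually hit a 429.
            return
        for url in page:
            yield url
            yielded += 1
            if yielded >= num_results:
                return
        start += len(page)
```

With a query that only has two results, the loop yields both and then returns on the first empty page instead of requesting a third link indefinitely.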