Nv7-GitHub / googlesearch

A Python library for scraping the Google search engine.
https://pypi.org/project/googlesearch-python/
MIT License
430 stars, 110 forks

Infinite loop fetching when using `search` function #74

Open palmer-cl opened 5 months ago

palmer-cl commented 5 months ago

The `search` function appears to be broken: iterating over its result never terminates, and the same request is sent to Google over and over.

You can reproduce this easily with a simple script like this one:

from googlesearch import search
import logging

logging.basicConfig(level=logging.DEBUG)

print("Starting search...")
res = search("nhl bowen byram")
print("Finished search.")
list_of_urls = [x for x in res]
print(list_of_urls)

I also tried converting the generator directly to a list, with the same outcome:

from googlesearch import search
import logging

logging.basicConfig(level=logging.DEBUG)

print("Starting search...")
res = search("nhl bowen byram")
print(list(res))
print("Finished search.")
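
Because the generator never finishes, either script hangs until it is interrupted. One way to bound the reproduction is to consume the generator in a daemon thread and give up after a timeout. This is only a sketch: `collect_with_timeout` is a hypothetical helper, not part of googlesearch, and the worker thread is not cancelled, it is simply abandoned and dies when the interpreter exits.

```python
import queue
import threading


def collect_with_timeout(make_iterable, timeout_s):
    """Return list(make_iterable()), or None if it does not finish in time."""
    box = queue.Queue(maxsize=1)

    def worker():
        # Consume the whole iterable; this is the call that may never return.
        try:
            box.put(("ok", list(make_iterable())))
        except Exception as exc:  # hand errors back to the caller
            box.put(("error", exc))

    # daemon=True so a stuck worker cannot keep the process alive at exit.
    threading.Thread(target=worker, daemon=True).start()

    try:
        status, payload = box.get(timeout=timeout_s)
    except queue.Empty:
        return None  # still running after timeout_s seconds
    if status == "error":
        raise payload
    return payload
```

With the package installed, `collect_with_timeout(lambda: search("nhl bowen byram"), timeout_s=15)` would return `None` instead of hanging if the search is still looping after 15 seconds.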

The output is the following:

Hello World
finished
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.google.com:443
DEBUG:urllib3.connectionpool:https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None

To debug this further, I put trace statements in the package. It looks like `start` never advances (and `num_results` stays the same), so the loop condition is always true:

    # Fetch
    start = 0
    while start < num_results:
        print(start, num_results)
        # Send request
        resp = _req(escaped_term, num_results - start,
                    lang, start, proxies, timeout)

Result:

01/30/2024 08:31:43 AM Results from Google: <generator object search at 0x122abcb30>
0 10
01/30/2024 08:31:43 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:43 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:44 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:44 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
0 10
01/30/2024 08:31:45 AM Starting new HTTPS connection (1): www.google.com:443
01/30/2024 08:31:45 AM https://www.google.com:443 "GET /search?q=nhl%2Bbowen%2Bbyram&num=12&hl=en&start=0 HTTP/1.1" 200 None
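
The trace is consistent with a loop whose counter only advances when a result is parsed from the response. The sketch below simulates that situation without any network access. `fetch_page`, `empty_page` and `trace_loop` are hypothetical names, and the assumption that `start` is incremented once per parsed result is a guess at the package's behaviour, not something taken from its source. The iteration cap exists only so that the demonstration terminates.

```python
from typing import Callable, List, Tuple


def trace_loop(
    fetch_page: Callable[[int, int], List[str]],
    num_results: int = 10,
    max_iterations: int = 5,
) -> List[Tuple[int, int]]:
    """Record (start, num_results) at the top of each loop iteration."""
    trace = []
    start = 0
    while start < num_results and len(trace) < max_iterations:
        trace.append((start, num_results))
        # Assumption: the counter only moves when a result is parsed.
        for _url in fetch_page(start, num_results - start):
            start += 1
    return trace


def empty_page(start: int, count: int) -> List[str]:
    """Stand-in for a 200 response that contains no parseable results."""
    return []


print(trace_loop(empty_page))
# [(0, 10), (0, 10), (0, 10), (0, 10), (0, 10)]
```

Without the cap, a page that never yields a result would keep `start` at 0 forever, which matches the repeated `0 10` lines above.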

mazuzic-conga commented 4 months ago

I'm facing the same problem. Did you find a workaround?

mumba17 commented 3 months ago

Having the same problem

alluding commented 1 month ago

Is this still broken?

Nv7-GitHub commented 1 month ago

Perhaps it could detect the `200 None` response and stop issuing further requests. I will try implementing this.
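
One way to read this proposal is to stop paginating as soon as a request yields no new results, so that a 200 response with an empty or unparseable body cannot keep the loop at `start = 0`. A minimal sketch under that assumption follows. `fetch_page` and `search_with_guard` are hypothetical stand-ins, not the package's API, and the real request and parsing code is omitted.

```python
from typing import Callable, Iterator, List


def search_with_guard(
    fetch_page: Callable[[int, int], List[str]],
    num_results: int = 10,
) -> Iterator[str]:
    """Yield up to num_results URLs, stopping when a page adds nothing."""
    start = 0
    while start < num_results:
        remaining = num_results - start
        page = fetch_page(start, remaining)
        new_results = 0
        for url in page[:remaining]:
            start += 1
            new_results += 1
            yield url
        if new_results == 0:
            # Nothing was parsed from this response: give up instead of
            # repeating the same request with the same start value.
            break
```

With a `fetch_page` that always returns an empty list, `list(search_with_guard(fetch_page))` returns `[]` after a single request instead of looping.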