arrrlo / Google-Images-Search

[PYTHON] Search for image using Google Custom Search API and resize & crop afterwards
MIT License

Problem while downloading more than 150 images #138

Closed losredoe132 closed 2 years ago

losredoe132 commented 2 years ago

Hi, thank you very much for your implementation. Everything (especially next_page) works fine for ~150 images, but then a <HttpError 400 when requesting https://customsearch.googleapis.com/customsearch/v1?cx=3fbe767c4e8c26f73&q=industrial+robots&searchType=image&num=1&start=212&safe=off&key=AIzaSyAM2Z62TrFvrhAs0I_y5vV0urzX6o78IR4&alt=json returned "Request contains an invalid argument.". Details: "Request contains an invalid argument."> occurs.

Has anybody observed similar behaviour? Below is my code:

import json
import logging
import os
import time
from google_images_search import GoogleImagesSearch
from tqdm import tqdm

def download_google_images(keyword: str,
                           n_images: int,
                           path_dir: str,
                           download: bool = True):
    with open('secrets/api_creds.json') as fh:
        api_creds = json.load(fh)
    gis = GoogleImagesSearch(api_creds['api_key'],
                             api_creds['search_engine_key'])

    path: str = os.path.join(path_dir, keyword)
    if not os.path.exists(path):
        os.makedirs(path)
        logging.info(f'folder {path} created')

    logging.info(f'images will be saved in {path}')

    _search_params = {
        'q': keyword,
        'num': 1
    }

    gis.search(search_params=_search_params)

    path_img_url_archive: str = 'logs/images_url_archive.txt'

    # for i in tqdm(range(int(n_images/_search_params['num']))):
    for i in tqdm(range(n_images)):
        logging.info(f'iteration {i}')
        try:
            gis.next_page()
        except Exception as e:
            logging.error(e)

        for image in gis.results():
            image_url = image.url
            logging.info(image_url)  # image direct url

            if download:
                # download image
                try:
                    image.download(path)
                except Exception as e:
                    logging.error(e)
                else:
                    image.resize(512, 512)  # resize downloaded image
                    with open(path_img_url_archive, 'a') as file:
                        file.write(f'{image_url} \n')

if __name__ == '__main__':
    download_google_images('industrial robots',
                           250,
                           'output_dir/diy-cobot-7')

That does not work either:


import json

from google_images_search import GoogleImagesSearch

_search_params = {
    'q': 'cobot',
    'num': 1000,
    # 'imgType': 'clipart|face|lineart|stock|photo|animated|imgTypeUndefined',
}
with open('secrets/api_creds.json') as fh:
    api_creds = json.load(fh)
gis = GoogleImagesSearch(api_creds['api_key'],
                         api_creds['search_engine_key'])

# this will search, download and resize:
gis.search(search_params=_search_params,
           path_to_dir='output_dir/diy-cobot-10', width=512, height=512)

losredoe132 commented 2 years ago

So I found out that there is an upper limit on paging with next_page. "This role is not present if the current results are the last page. Note: This API returns up to the first 100 results only." (https://developers.google.com/custom-search/v1/using_rest)

"Note: The JSON API will never return more than 100 results, even if more than 100 documents match the query, so setting the sum of start + num to a number greater than 100 will produce an error. Also note that the maximum value for num is 10." (https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list)

Does anybody know a way to get more than 100 images for a query?
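
To relate the quoted limit to the failing request, here is a minimal sketch that parses the paging parameters out of the request URL and checks them against the rule. It takes the documentation's wording literally (start + num must not exceed 100); the helper names `paging_params` and `exceeds_documented_limit` are made up for illustration and are not part of the library, and the `cx` and `key` parameters are left out of the URL.

```python
from urllib.parse import parse_qs, urlsplit

MAX_RESULTS = 100  # cap quoted from the Custom Search JSON API documentation


def paging_params(url: str) -> tuple[int, int]:
    """Return (start, num) parsed from a Custom Search request URL."""
    query = parse_qs(urlsplit(url).query)
    return int(query["start"][0]), int(query["num"][0])


def exceeds_documented_limit(start: int, num: int) -> bool:
    """Apply the quoted rule literally: start + num must not exceed 100."""
    return start + num > MAX_RESULTS


# Failing request from the error message, with cx and key left out.
failing_url = (
    "https://customsearch.googleapis.com/customsearch/v1"
    "?q=industrial+robots&searchType=image&num=1&start=212&safe=off&alt=json"
)

start, num = paging_params(failing_url)
print(start, num, exceeds_documented_limit(start, num))
```

For the request in the error message this reports start=212 and num=1, which is beyond the documented cap.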

arrrlo commented 2 years ago

Hi,

The lib knows about that and queries images 10 at a time, with an offset each time, to get to the desired number. You can see in your error message that the params in the API query include num=1&start=212, i.e. fetch one image with an offset of 212.
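
As a rough illustration of that kind of batching (this is not the library's actual code; the function name `page_offsets` and the 1-based `start` value are assumptions made for the sketch), the (start, num) pairs for a requested number of images could be derived like this:

```python
def page_offsets(n_images: int, page_size: int = 10) -> list[tuple[int, int]]:
    """Split n_images into (start, num) request pairs of at most page_size each.

    Assumes a 1-based start index; hypothetical helper, not the library's API.
    """
    requests = []
    start = 1
    remaining = n_images
    while remaining > 0:
        num = min(page_size, remaining)  # last batch may be smaller
        requests.append((start, num))
        start += num
        remaining -= num
    return requests


print(page_offsets(25))  # [(1, 10), (11, 10), (21, 5)]
```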

So I guess it's not about the number of images.

Do you recognise any other patterns within this issue?

arrrlo commented 2 years ago

Looks like this is the issue: https://github.com/arrrlo/Google-Images-Search/issues/141