ai-cfia / api-test

This repository contains all the tools for testing and stressing the APIs of the AI Lab projects.
MIT License

Incorporate Public Search Engine Comparison #9

Closed ibrahim-kabir closed 5 months ago

ibrahim-kabir commented 5 months ago

Bing Search API

Bing Search is an option because it is a well-established search engine, and we simply wanted to compare against another popular search engine.

Cons

However, after examining the pricing, I found that only 1,000 transactions per month are free, which works out to approximately 33 free requests per day. Two requests are needed for a single test (QnA) file.
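To make the quota concrete, here is a rough back-of-the-envelope calculation using the figures quoted above. The 30-day month is an assumption, and the constant names are only illustrative.

```python
# Figures taken from the pricing note above.
FREE_TRANSACTIONS_PER_MONTH = 1000
REQUESTS_PER_FILE = 2

# Assumption: an average month of 30 days.
DAYS_PER_MONTH = 30

# Whole free requests available per day.
free_requests_per_day = FREE_TRANSACTIONS_PER_MONTH // DAYS_PER_MONTH

# Whole test (QnA) files that fit in the daily and monthly free quota.
files_per_day = free_requests_per_day // REQUESTS_PER_FILE
files_per_month = FREE_TRANSACTIONS_PER_MONTH // REQUESTS_PER_FILE

print(free_requests_per_day, files_per_day, files_per_month)  # 33 16 500
```

Under these assumptions, the free tier covers about 16 test files per day, or 500 per month.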

Tasks

Closes

closes #6 closes #11 closes #10

Alternative considered

Google API

Since the Google Search API limits results to 10 per request and we have at least 20 files, each needing 100 results for testing, we will need at least 200 requests to obtain all the answers. Google only offers 100 free requests per day. After that, $5 is charged per 1,000 requests, which comes to around $1 per test.
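The arithmetic behind these numbers can be sketched as follows. The constants come from the paragraph above; the cost line assumes every request is billed (that is, it ignores the daily free quota), which is the reading that gives roughly $1 per test.

```python
import math

# Figures quoted in the paragraph above.
RESULTS_PER_REQUEST = 10
RESULTS_PER_FILE = 100
FILES = 20
FREE_REQUESTS_PER_DAY = 100
PRICE_PER_1000_REQUESTS = 5.00  # USD

# Requests needed to page through 100 results, 10 at a time.
requests_per_file = math.ceil(RESULTS_PER_FILE / RESULTS_PER_REQUEST)

# Requests needed for one full test run over all files.
total_requests = FILES * requests_per_file

# Assumption: every request is billed (free quota ignored).
cost_if_all_billed = total_requests * PRICE_PER_1000_REQUESTS / 1000

# Days needed to finish one run using only the free quota.
days_on_free_quota = math.ceil(total_requests / FREE_REQUESTS_PER_DAY)

print(requests_per_file, total_requests, cost_if_all_billed, days_on_free_quota)
# 10 200 1.0 2
```

So one test run either costs about $1 or has to be spread over two days of free quota.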

num integer: Number of search results to return. Valid values are integers between 1 and 10, inclusive.
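Because `num` is capped at 10, collecting 100 results means paging with the `start` parameter. Below is a minimal sketch of that loop with google-api-python-client, not the code used in this PR. The helper names (`page_starts`, `fetch_results`, `build_service`) and the `api_key` and `engine_id` arguments are hypothetical; the import is kept inside `build_service` so the paging logic can be exercised without the library installed.

```python
def page_starts(total_results=100, page_size=10):
    """Return the 1-based `start` offsets needed to cover total_results."""
    if not 1 <= page_size <= 10:
        raise ValueError("num must be between 1 and 10, inclusive")
    return list(range(1, total_results + 1, page_size))


def fetch_results(service, query, engine_id, total_results=100, page_size=10):
    """Page through Custom Search results, one request per page."""
    items = []
    for start in page_starts(total_results, page_size):
        response = (
            service.cse()
            .list(q=query, cx=engine_id, num=page_size, start=start)
            .execute()
        )
        page = response.get("items", [])
        items.extend(page)
        if len(page) < page_size:
            # Fewer results than requested: no further pages to fetch.
            break
    return items[:total_results]


def build_service(api_key):
    """Create the Custom Search client (needs google-api-python-client)."""
    from googleapiclient.discovery import build

    return build("customsearch", "v1", developerKey=api_key)
```

With this shape, a single file costs `len(page_starts())` requests, i.e. 10 requests for 100 results.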

Documentation

Library tested

google-api-python-client

Issues consulted

Why Does The Google Search API Disallow More Than 100 Results? How Can I Get More?

Google web scraping

Web scraping was attempted because it allows querying completely free of charge. However, Google has stringent security measures that limit the number of requests. Since Google displays only 10 results at a time and we have at least 20 files, each needing 100 results for testing, we need at least 200 Google requests to obtain all the answers. Even with time delays, the 200 requests never all succeed, and the machine's IP address ends up blocked for a while. We must then either wait longer than 30 minutes or use proxies or VPNs to work around the block. Today, web scraping is complex and only feasible on a small scale. Doing it on a large scale would require several VPNs and switching between them to avoid detection.
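For reference, the kind of time delay tried here can be sketched as a retry loop with exponential backoff on HTTP 429. This is only an illustration: `fetch_page` is a hypothetical callable returning `(status_code, body)`, the delay values are assumptions, and as noted above this approach did not get all 200 requests through.

```python
import time


def backoff_delays(base_delay=2.0, factor=2.0, max_retries=5):
    """Delays in seconds before each retry: base, base*factor, ..."""
    return [base_delay * factor ** attempt for attempt in range(max_retries)]


def fetch_with_backoff(fetch_page, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch_page() and retry while it reports 429 (Too Many Requests).

    fetch_page is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body), which may still be a 429.
    """
    status, body = fetch_page()
    for delay in backoff_delays(base_delay, 2.0, max_retries):
        if status != 429:
            break
        sleep(delay)  # wait longer after each consecutive 429
        status, body = fetch_page()
    return status, body
```

The `sleep` argument is injectable so the loop can be tried without actually waiting.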

Libraries tested

abenassi/Google-Search-API
Nv7-GitHub/googlesearch

Issues consulted

GitHub issue
How to fix python requests module 429 error for google search?
Error 429 with simple query on google with requests python

Problem encountered

image

SonOfLope commented 5 months ago

Failing pipeline will be fixed with https://github.com/ai-cfia/github-workflows/pull/110

ibrahim-kabir commented 5 months ago

@RussellJimmies, do you have any more change requests?

ibrahim-kabir commented 5 months ago

@RussellJimmies your change request is blocking my merge