farrael004 / Quest

This is a web app that integrates GPT-3 with Google searches.
Apache License 2.0

"Infinite searches" (Or at least very long ones) #1

Open farrael004 opened 1 year ago

farrael004 commented 1 year ago

When searching for the following question in Google search: "what is the rent price in orange county, CA?", it may take a significant amount of time scraping websites.

ghost commented 1 year ago

I get no results back. Have you changed the code? On which platform are you using it (Windows, Mac, Linux)?

farrael004 commented 1 year ago

I haven't changed anything for a while. I use Windows, but that's probably not the issue; the hosted version runs on Linux and works fine. Usually the problem is your internet connection, or that you are using a VPN. What exactly is the problem you are seeing? Is it that the internet-search step takes too long, or that it shows the "No results" message?

ghost commented 1 year ago

I get the "No results" message. I do not use a VPN.

ghost commented 1 year ago

I may try it on another machine. Google Cloud Console

farrael004 commented 1 year ago

> No result message. I do not use a vpn

This error happens in the internet_search.py script. It is triggered when the find_links_from_search() function does not find any links in the Google query page. You can try saving the 'soup' variable on line 37 as an HTML file, then open the file and see whether it looks like a normal Google search page. I suspect that your error occurs because this page is somehow not what it's supposed to be.
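A minimal sketch of that debugging step. It assumes `soup` is the parsed page object (e.g. a BeautifulSoup instance) built around line 37 of internet_search.py; the helper name `dump_page` is hypothetical, not part of Quest:

```python
def dump_page(soup, path="debug_google_page.html"):
    """Write the parsed search page back out as HTML for manual inspection.

    `soup` can be any object whose str() yields the page markup
    (a BeautifulSoup object serializes itself back to HTML via str()).
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(soup))
```

Open the resulting file in a browser: if it shows a cookie-consent or CAPTCHA page instead of search results, that explains why no links are found.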

ghost commented 1 year ago

I thought that when you open Google for the first time, you need to accept the policies. Maybe that's the problem. I have a searxng instance running and have tried it with that; same problem, "no results".

farrael004 commented 1 year ago

Yes, you would need to accept Google's policies. Alternatively, you can change the google_search() function so it returns results from a different source, like searxng. Make sure it returns a pandas DataFrame with 4 columns: 'text', 'link', 'text_length', and 'ada_search'.

The 'text' column should contain snippets of text from a website with at most 1000 characters. The 'link' column is the source of each text snippet.

The last two columns can be created like this:

search_results['text_length'] = search_results['text'].str.len()
search_results['ada_search'] = search_results['text'].apply(lambda x: create_embedding(x))
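Putting the whole thing together, a sketch of building the expected DataFrame from raw (text, link) pairs. The `create_embedding` stand-in below is a placeholder (Quest's real function calls an embedding API), and the sample row is made up:

```python
import pandas as pd

def create_embedding(text):
    # Placeholder for Quest's real create_embedding(), which returns
    # an embedding vector for the snippet. Here we just return a dummy.
    return [0.0]

# Raw results from whatever search backend you substitute in:
snippets = [
    ("Example snippet of page text, at most 1000 characters.",
     "https://example.com/page"),
]

search_results = pd.DataFrame(snippets, columns=["text", "link"])
search_results["text_length"] = search_results["text"].str.len()
search_results["ada_search"] = search_results["text"].apply(create_embedding)
```

The resulting frame has exactly the four columns ('text', 'link', 'text_length', 'ada_search') that the rest of the app expects from google_search().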

There's a DuckDuckGo API feature called Instant Answers that gives you text snippets from the internet with their sources, which you could use as a substitute here. If you plan on searching whole websites instead, you can extract their text with the extract_useful_text() function.
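A sketch of that substitution using only the standard library. It assumes DuckDuckGo's Instant Answer JSON endpoint (api.duckduckgo.com with format=json), whose payload exposes AbstractText/AbstractURL and RelatedTopics entries; the function names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def parse_instant_answer(data):
    """Turn an Instant Answer JSON payload into (text, link) rows,
    truncating each snippet to the 1000 characters Quest expects."""
    rows = []
    if data.get("AbstractText"):
        rows.append((data["AbstractText"][:1000], data.get("AbstractURL", "")))
    for topic in data.get("RelatedTopics", []):
        # Some entries are topic groups without Text/FirstURL; skip those.
        if "Text" in topic and "FirstURL" in topic:
            rows.append((topic["Text"][:1000], topic["FirstURL"]))
    return rows

def instant_answer_rows(query):
    """Fetch and parse an Instant Answer response for `query`."""
    url = "https://api.duckduckgo.com/?" + urlencode(
        {"q": query, "format": "json"}
    )
    with urlopen(url) as resp:
        return parse_instant_answer(json.load(resp))
```

The rows returned here can feed directly into the DataFrame construction above: make them the 'text' and 'link' columns, then derive 'text_length' and 'ada_search' as shown.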