facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

Question about the configuration of search_server when running SeeKeR #5007

Closed scpaliuhc closed 1 year ago

scpaliuhc commented 1 year ago

Recently, I have been reading the paper “SeeKeR: An Open source Search-Augmented Language Model” and trying to run the related models on my machine. However, I ran into difficulties when trying to use the Bing Search API to retrieve relevant documents.

I followed the example given on the official webpage:

parlai i -mf zoo:seeker/seeker_dialogue_3B/model -o gen/seeker_dialogue --search-server <search_server>

and replaced <search_server> with https://api.bing.microsoft.com/v7.0/search. However, it didn't work: server_response.status_code was 404.


Then I added my Bing Search subscription key to the request and changed POST to GET.
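For reference, a direct call to the Bing Web Search v7 endpoint is a GET request with the query in the URL's query string and the subscription key in the `Ocp-Apim-Subscription-Key` header. The sketch below only builds such a request with the Python standard library; `build_bing_request` is a hypothetical helper and `YOUR_KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode
from urllib.request import Request

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(query: str, api_key: str, count: int = 5) -> Request:
    """Build (but do not send) a Bing Web Search v7 GET request."""
    # Bing reads the query and result count from the URL, not from a POST body.
    params = urlencode({"q": query, "count": count})
    # The subscription key is sent as a request header.
    return Request(
        f"{BING_ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
        method="GET",
    )


req = build_bing_request("SeeKeR language model", api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)`, but as the replies below explain, the SeeKeR agent does not issue this kind of request itself.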

However, nothing was passed to the next step, even though the response contained search results.
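One plausible explanation: the agent looks for results under a specific top-level key, and a raw Bing payload does not have it. The sketch below is a simplified, hypothetical model of that parsing step, assuming (from our reading of ParlAI's search-engine retriever; verify against the source) that the agent reads a top-level `response` list whose items carry `url`, `title` and `content`. `extract_docs` is an illustrative name, not a ParlAI function.

```python
from typing import Any, Dict, List


def extract_docs(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Hypothetical model of the agent-side parsing of a search response.

    Assumes only a top-level "response" list is read; other keys are ignored.
    """
    items = payload.get("response") or []
    docs = []
    for item in items:
        content = item.get("content", "")
        docs.append({
            "url": item.get("url", ""),
            "title": item.get("title", ""),
            # Join the non-empty content lines into one whitespace-normalized string.
            "text": " ".join(line.strip() for line in content.split("\n") if line.strip()),
        })
    return docs


# A raw Bing payload has no "response" key, so no documents come out.
bing_payload = {
    "_type": "SearchResponse",
    "webPages": {"value": [{"name": "SeeKeR", "url": "https://example.org", "snippet": "..."}]},
}
print(len(extract_docs(bing_payload)))  # 0

# A payload in the assumed format yields one document.
expected_payload = {
    "response": [{"url": "https://example.org", "title": "SeeKeR", "content": "line one\nline two"}]
}
print(extract_docs(expected_payload))
```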

Would anyone be able to provide a successful example of running SeeKeR with Bing Search?

mojtaba-komeili commented 1 year ago

Our agent doesn't work directly with Bing. You need to set up your own server that complies with our API. See this for more information about what our agent expects from the API.

scpaliuhc commented 1 year ago

Thanks for your response. I will try to revise my code to match the API, which requires three fields.
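The three fields are not spelled out in the text of the thread. Assuming, from our reading of ParlAI's search-engine retriever (not confirmed here), that they are `url`, `title` and `content`, with each item nested in a top-level `response` list, a small checker like the following can validate a server's JSON before pointing `--search-server` at it. `validate_search_payload` and `REQUIRED_FIELDS` are hypothetical names.

```python
from typing import Any, List

# Assumed field names; adjust if the agent's API documentation says otherwise.
REQUIRED_FIELDS = ("url", "title", "content")


def validate_search_payload(payload: Any) -> List[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    items = payload.get("response")
    if not isinstance(items, list):
        return ['missing top-level "response" list']
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i} is not an object")
            continue
        for field in REQUIRED_FIELDS:
            if not isinstance(item.get(field), str):
                problems.append(f'item {i}: "{field}" missing or not a string')
    return problems


good = {"response": [{"url": "https://a.example", "title": "A", "content": "text"}]}
bad = {"response": [{"url": "https://a.example", "title": "A"}]}
print(validate_search_payload(good))
print(validate_search_payload(bad))
```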

By the way, which online/local search engines does SeeKeR work with directly? As mentioned in the paper "SeeKeR: An Open source Search-Augmented Language Model":

Following Komeili et al. (2021), in our experiments (unless stated otherwise) we employ the Bing Web Search API to retrieve documents, and then filter that set of documents by intersecting with Common Crawl (Wenzek et al., 2020), and keep the top 5.

However, the details are not clear. For example, which types of results are considered: news, web pages, or all of them? For the convenience of future researchers, and to broaden the reach of this work, would it be possible to integrate this functionality into your agent?
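Which Bing result types to keep is a choice the proxy makes, not the agent. As a sketch, the function below flattens selected answer types from a Bing Web Search v7 JSON response into `url`/`title`/`content` items (field names assumed, as above) and keeps the top `n`. It does not reproduce the paper's Common Crawl intersection, and it orders results by answer type rather than by Bing's `rankingResponse`; `bing_to_items` is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence


def bing_to_items(
    bing_json: Dict[str, Any],
    n: int = 5,
    answer_types: Sequence[str] = ("webPages",),
) -> List[Dict[str, str]]:
    """Flatten the chosen Bing answer types into url/title/content items, keeping the top n."""
    items: List[Dict[str, str]] = []
    for answer in answer_types:
        for result in bing_json.get(answer, {}).get("value", []):
            items.append({
                "url": result.get("url", ""),
                "title": result.get("name", ""),
                # Web pages carry a "snippet"; news articles carry a "description".
                "content": result.get("snippet") or result.get("description", ""),
            })
    return items[:n]


sample = {
    "webPages": {"value": [
        {"name": "Page A", "url": "https://a.example", "snippet": "about A"},
        {"name": "Page B", "url": "https://b.example", "snippet": "about B"},
    ]},
    "news": {"value": [
        {"name": "News C", "url": "https://c.example", "description": "about C"},
    ]},
}
print(len(bing_to_items(sample)))                                       # web pages only
print(len(bing_to_items(sample, answer_types=("webPages", "news"))))    # web pages and news
```

Bing snippets are short; a proxy aiming to match the paper more closely would need to fetch and clean full page text, which this sketch leaves out.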

mojtaba-komeili commented 1 year ago

You can set up a proxy server that sits between your model and any external search service. It forwards the query to the external service and reformats the returned content to match what our agent expects. The choice of content type is completely up to you.
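A minimal sketch of such a proxy using only the Python standard library. It rests on assumptions not confirmed in this thread: the agent POSTs form fields `q` (query) and `n` (number of results), and expects JSON of the form `{"response": [{"url": ..., "title": ..., "content": ...}]}`. Only Bing web-page results are forwarded, with the snippet as content. `bing_search`, `make_handler` and the port used below are illustrative choices, not part of ParlAI.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict, List
from urllib.parse import parse_qs, urlencode
from urllib.request import Request, urlopen

# A search function takes (query, n) and returns url/title/content items.
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def bing_search(api_key: str) -> SearchFn:
    """Return a search function that queries Bing and reformats web-page results."""
    def search(query: str, n: int) -> List[Dict[str, str]]:
        url = "https://api.bing.microsoft.com/v7.0/search?" + urlencode({"q": query, "count": n})
        req = Request(url, headers={"Ocp-Apim-Subscription-Key": api_key})
        with urlopen(req, timeout=10) as resp:
            data = json.load(resp)
        results = data.get("webPages", {}).get("value", [])
        return [
            {"url": r.get("url", ""), "title": r.get("name", ""), "content": r.get("snippet", "")}
            for r in results
        ][:n]
    return search


def make_handler(search: SearchFn):
    """Build a request handler that answers the agent's POST with the assumed JSON format."""
    class ProxyHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Parse the form-encoded body (q=...&n=...).
            length = int(self.headers.get("Content-Length", 0))
            form = parse_qs(self.rfile.read(length).decode("utf-8"))
            query = form.get("q", [""])[0]
            n = int(form.get("n", ["5"])[0])
            body = json.dumps({"response": search(query, n)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return ProxyHandler
```

To run it against Bing, one could call `ThreadingHTTPServer(("127.0.0.1", 8080), make_handler(bing_search(key))).serve_forever()` and then pass `--search-server http://127.0.0.1:8080` to the `parlai i` command above. Passing the search function in as a parameter keeps the HTTP layer independent of the backend, so Bing can be swapped for any other service.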

github-actions[bot] commented 1 year ago

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.