assafelovic / gpt-researcher

GPT based autonomous agent that does online comprehensive research on any given topic
https://gptr.dev
MIT License
13.06k stars, 1.61k forks

Poor selected sources #496

Closed: guidorietbroek closed this issue 1 month ago

guidorietbroek commented 1 month ago

Thanks for this nice idea! We did some testing for our application and found that, in our opinion, the sources being used are not always trustworthy. We even get sources back whose webpages we were unable to open.

I am curious how the underlying technique selects sources and how it could be improved. How do you determine whether a source is trustworthy? What makes a source valuable?

See these results, which in my opinion are not really useful:

CONTENT: With Vodafone Red Together you can call landline and mobile numbers in Turkey for 100 minutes free* each month. When you are in Turkey, you can use your bundle worry-free as if you were in the Netherlands. Calling, texting, and data work the same as with your bundle in the Netherlands. * excluding premium-rate numbers.
SOURCE: http://vodafoneturkije.nl/
CONTENT: Conclusion. Internet and telephony in Turkey are well developed and offer travelers various ways to stay connected with the world. With a variety of mobile providers, WiFi hotspots, and roaming options, you have several ways to stay online and make calls.
SOURCE: https://tuerkeilife.de/nl/internet-telefon-tuerkei/

assafelovic commented 1 month ago

Hey @guidorietbroek, the sources are selected by the search retriever. You can try out additional retrievers such as Bing, Google, etc. by modifying the config.py file. More on that here: https://docs.tavily.com/docs/gpt-researcher/config
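
For illustration only, here is a minimal sketch of how a retriever choice might be read from configuration. It is not the project's actual config.py: the `RETRIEVER` variable name, the `pick_retriever` helper, and the set of accepted names are all assumptions made for this example, so check the linked docs for the settings your version really supports.

```python
import os

# Hypothetical set of retriever names, assumed for this sketch.
# The names your installed version accepts may differ.
KNOWN_RETRIEVERS = {"tavily", "bing", "google", "duckduckgo"}


def pick_retriever(env=None, default="tavily"):
    """Return the retriever name found in `env`, or `default` if unset.

    `env` is any mapping of settings; it falls back to os.environ.
    The "RETRIEVER" key is an assumed name, not a confirmed setting.
    """
    env = os.environ if env is None else env
    name = env.get("RETRIEVER", default).strip().lower()
    if name not in KNOWN_RETRIEVERS:
        raise ValueError(f"unknown retriever: {name!r}")
    return name


if __name__ == "__main__":
    # Choose Bing explicitly instead of the assumed default.
    print(pick_retriever({"RETRIEVER": "Bing"}))
```

Switching the retriever changes which search backend proposes candidate sources. It does not by itself verify that a page is reachable or trustworthy, so results may still need checking.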