Technologicat closed this 8 months ago
I have a web crawler stashed for the next release that could access the links returned by web search. It's not exactly a "read that URL" kind of thing, because that could lead to targeted injection or privacy-breaching attacks.
Ok, that's interesting. Feeding in material from the internet would be useful.
However, that's a bit different from what I meant - I'd like to be able to ask e.g. where a particular piece of documentation is available on the web, and get the AI to give me a clickable URL that I can then open in a web browser.
That part with "get the AI to give me a clickable URL" is prone to hallucinations, especially with small models. It can give you links that are (1) non-working, (2) outdated, or (3) just wrong.
Yes, definitely, that's what happens when the LLM is tasked to generate URLs. Being essentially a fancy autocomplete machine, the model will just make up something that plausibly sounds like it came from its training distribution.
My intuition here was to avoid the "generate". Having the actual correct link injected into the prompt (from the web search) should make the model less likely to hallucinate, since this transforms the task into rephrasing information that is already available in the context.
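To make the idea concrete, here is a minimal sketch of what "injecting the actual link into the prompt" could look like. It is only an illustration, not how the websearch extension builds its prompt: the `SearchHit` class, its field names, and the wording of the instruction are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    """One web search result. Hypothetical structure, for illustration only."""
    title: str
    url: str
    snippet: str


def build_prompt(question: str, hits: list[SearchHit]) -> str:
    """Put the real URLs into the context so the model can copy them
    instead of generating them from scratch."""
    lines = ["Web search results:"]
    for i, hit in enumerate(hits, start=1):
        lines.append(f"[{i}] {hit.title}")
        lines.append(f"    URL: {hit.url}")
        lines.append(f"    {hit.snippet}")
    lines.append("")
    # Assumed instruction wording; any phrasing that asks for verbatim copying would do.
    lines.append("Answer using only the results above. "
                 "When you give a link, copy its URL exactly as listed.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

With the URL sitting in the context, the model's job becomes copying a string rather than recalling one, which is the whole point of avoiding the "generate" step.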
It's a fair point that LLMs are still rather unreliable, and I haven't tested the success rate for this approach. Come to think of it, I could rather easily run a bunch of tests by hand-crafting the raw prompt in ooba's notebook mode. Perhaps I should do that.
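For scoring such hand-run tests, one rough approach would be to check whether every URL in the model's reply appears verbatim in the prompt context. The sketch below assumes exactly that criterion; the function names and the deliberately simple URL regex are my own choices for illustration, and the regex will not handle every valid URL.

```python
import re

# Simple pattern: good enough for a quick test harness, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> set[str]:
    """Find URLs in text, trimming punctuation that usually ends a sentence."""
    return {match.rstrip(".,;:!?") for match in URL_RE.findall(text)}


def check_response(context: str, response: str) -> dict[str, set[str]]:
    """Split the URLs in the response into those present in the context
    ("grounded") and those that appear nowhere in it ("invented")."""
    known = extract_urls(context)
    given = extract_urls(response)
    return {"grounded": given & known, "invented": given - known}
```

Counting the replies with an empty "invented" set over a batch of prompts would give a crude success rate for a given model.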
I have to say that since Mistral became a thing, 7Bs have come a surprisingly long way during the last few months, but it may be that my expectations for them are nevertheless a tad optimistic. :)
I see you've added this too - the links=on option does what I intended, and also includes text from the linked page in the search results, which is probably a better solution than just a bare link.
Thanks! I'll experiment around with this.
Implemented, so closing the ticket.
Sorry for posting a lot in a short time, but there's one more idea that came up in my initial testing:
Currently, websearch only injects the text, and discards the URLs where the matches came from. Sometimes, it would be useful to have the URLs available in the prompt - for example, when querying for the URL of some particular piece of open source documentation. Sure, I could use a search engine the traditional way instead, but it would be nice for this shiny new technology to support that use case, too.
I quickly looked through the source code (SillyTavern-extras/modules/websearch/script.py, SillyTavern-extras/server.py, and SillyTavern/public/scripts/extensions/third-party/Extension-WebSearch/index.js), and I can understand why it's like that. It seems nontrivial to extract the links together with the relevant surrounding text, at least by the CSS filtering approach that is currently used.
Still, maybe something to consider later.
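As a rough illustration of keeping links together with their surrounding text, here is a sketch that walks the HTML and appends each link's URL right after its anchor text. It uses Python's standard `html.parser` rather than the CSS filtering the module currently relies on, so it is a different technique, not a patch; the class and function names are made up for this example, and real search result pages would need more filtering than this.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect visible text, adding each link's URL right after its anchor text."""

    def __init__(self):
        super().__init__()
        self.parts = []      # text fragments in document order
        self._href = None    # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href:
                self.parts.append(f"({self._href})")
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def text_with_links(html: str) -> str:
    """Return the page text with URLs kept inline next to the linked words."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For example, `text_with_links('<p>Read the <a href="https://example.com/docs">manual</a> first.</p>')` gives `Read the manual (https://example.com/docs) first.`, so the URL stays attached to the text it belongs to when injected into the prompt.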