ajbouh / substrate

bridge: command reflection #202

Open mgood opened 1 month ago

mgood commented 1 month ago

There are a few aspects of commands to continue to build on for bridge. These may make sense to split out into separate issues once they're more firmed up, but combining the discussion here for now.

Reflection from other pages

Bridge should maintain a set of "command sources" (URLs) relevant to a session. It should poll these with the REFLECT command to detect relevant commands, and include those for command completion. This would probably be exposed through the JS command API for adding & removing URLs in the list of sources.

Trigger words: the initial bridge commands check for a keyword (e.g. "assistant") before checking for a command completion. I guess we could test it with no keyword and see how the performance is, as well as the false positive rate.
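A minimal Go sketch of the per-session source list, assuming add and remove operations like the ones the JS command API would expose. `SourceSet` and its methods are hypothetical names for illustration, not existing bridge code, and the example URLs are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// SourceSet is a hypothetical per-session set of command source URLs.
// It is safe for concurrent use, since a UI and an assistant may both touch it.
type SourceSet struct {
	mu   sync.Mutex
	urls map[string]struct{}
}

// NewSourceSet returns an empty set.
func NewSourceSet() *SourceSet {
	return &SourceSet{urls: make(map[string]struct{})}
}

// Add registers a URL and reports whether it was newly added.
func (s *SourceSet) Add(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.urls[url]; ok {
		return false
	}
	s.urls[url] = struct{}{}
	return true
}

// Remove drops a URL and reports whether it was present.
func (s *SourceSet) Remove(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.urls[url]
	delete(s.urls, url)
	return ok
}

// URLs returns a sorted snapshot of the current sources.
func (s *SourceSet) URLs() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]string, 0, len(s.urls))
	for u := range s.urls {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := NewSourceSet()
	s.Add("http://example.invalid/page-a")
	s.Add("http://example.invalid/page-b")
	s.Remove("http://example.invalid/page-a")
	fmt.Println(s.URLs()) // only page-b remains
}
```

Whatever polls or fetches commands would iterate over `URLs()`; the snapshot is a copy, so callers can range over it without holding the lock.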

Command UI

We've talked about a general UI for inspecting and calling commands within the browser. This could be a JS library that would be included on the page and provide a UI based on the commands detected. On the JS side we currently have the substrate.r0 object for detection. Would we expect this to contain all commands relevant to the page? Should we expect substrate.r0 to also call REFLECT on this page to add in any server-side commands?

Bridge expose REFLECT

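Calling REFLECT on bridge should return relevant commands. Though if we're using this in conjunction with Chromestage, I think the flow would look something like:

  • bridge gets user input
  • bridge calls REFLECT on Chromestage to get commands relevant to the page loaded in Chromestage
  • bridge calls Chromestage which forwards the RUN request to the current page

If bridge receives a request relevant to itself (e.g. "add assistant") then I think it would just handle it directly. However, maybe this would be used by the command UI to detect the commands available within the bridge session.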

@ajbouh does this sound right? From what you mentioned about listing commands, were you thinking more of the UI side, or on the bridge backend?

ajbouh commented 1 month ago

Reflection from other pages

Bridge should maintain a set of "command sources" (URLs) relevant to a session. It should poll these with the REFLECT command to detect relevant commands, and include those for command completion. This would probably be exposed through the JS command API for adding & removing URLs in the list of sources.

Trigger words: the initial bridge commands check for a keyword (e.g. "assistant") before checking for a command completion. I guess we could test it with no keyword and see how the performance is, as well as the false positive rate.

My approach here has been to pull a command set only when it's needed. We can have bridge do the same. It can fetch the set of commands for the current command source URLs on demand. This should reduce some of our complexity. The UI can poll this and the assistant can fetch it right before it attempts to invoke a command. This reduces the chance that the assistant falls behind the right set of commands because the polling rate is too slow.
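A rough Go sketch of the pull-on-demand idea: nothing is cached, and every call re-reads the current sources. The `Command` shape, the `Reflector` function type (standing in for "call REFLECT on this URL"), and `ErrUnreachable` are all assumptions made for illustration; the real REFLECT response format is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Command is a placeholder for whatever a REFLECT response describes.
type Command struct {
	Description string
}

// Reflector stands in for "call REFLECT on this URL" (hypothetical signature).
type Reflector func(url string) (map[string]Command, error)

// ErrUnreachable is a hypothetical error for a source that cannot be reached.
var ErrUnreachable = errors.New("command source unreachable")

// FetchCommands queries every source at call time and merges the results.
// Later URLs win on name collisions. A failing source is recorded in the
// returned error slice and skipped, so one bad source does not hide the rest.
func FetchCommands(urls []string, fetch Reflector) (map[string]Command, []error) {
	merged := make(map[string]Command)
	var errs []error
	for _, u := range urls {
		cmds, err := fetch(u)
		if err != nil {
			errs = append(errs, fmt.Errorf("reflect %s: %w", u, err))
			continue
		}
		for name, c := range cmds {
			merged[name] = c
		}
	}
	return merged, errs
}

func main() {
	// Fake source whose command set changes between calls.
	calls := 0
	fake := func(url string) (map[string]Command, error) {
		if url == "http://example.invalid/down" {
			return nil, ErrUnreachable
		}
		calls++
		if calls == 1 {
			return map[string]Command{"open": {Description: "open a page"}}, nil
		}
		return map[string]Command{"close": {Description: "close a page"}}, nil
	}
	urls := []string{"http://example.invalid/page", "http://example.invalid/down"}

	first, errs := FetchCommands(urls, fake)
	fmt.Println(len(first), len(errs)) // 1 1

	// Fetching again right before invoking sees the new set, not the old one.
	second, _ := FetchCommands(urls, fake)
	_, stale := second["open"]
	fmt.Println(stale) // false
}
```

The UI could call `FetchCommands` on a timer while the assistant calls it immediately before invoking, so both share one code path and only the UI's view depends on a polling rate.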

Command UI

We've talked about a general UI for inspecting and calling commands within the browser. This could be a JS library that would be included on the page and provide a UI based on the commands detected. On the JS side we currently have the substrate.r0 object for detection. Would we expect this to contain all commands relevant to the page? Should we expect substrate.r0 to also call REFLECT on this page to add in any server-side commands?

I think we should write some logic for populating substrate.r0 based on a given set of URLs. This would parallel what bridge would do internally (though here in JS instead of golang). I don't know what we'll want the automatic behavior to be, but I believe we'll want to have this as a JS primitive.

Bridge expose REFLECT

Calling REFLECT on bridge should return relevant commands. Though if we're using this in conjunction with Chromestage, I think the flow would look something like:

  • bridge gets user input
  • bridge calls REFLECT on Chromestage to get commands relevant to the page loaded in Chromestage
  • bridge calls Chromestage which forwards the RUN request to the current page

If bridge receives a request relevant to itself (e.g. "add assistant") then I think it would just handle it directly. However, maybe this would be used by the command UI to detect the commands available within the bridge session.

I think bridge can implement its own notion of commands based on a set of URLs and golang functions. Any active chromestage(s) would implicitly be included in that set. I think that might be enough to get the whole thing working. I wrote some of this functionality for chromestage already, so we might be able to reuse those abstractions if they make sense here.
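One way to picture "a set of URLs and golang functions" is a dispatcher that checks local Go functions first and otherwise forwards to whichever remote source currently offers the command. This is only a sketch: `Dispatcher`, `Remote`, `LocalFunc`, and the command names are hypothetical, and the `Reflect` and `Run` fields stand in for the real REFLECT and RUN calls.

```go
package main

import "fmt"

// LocalFunc is a command implemented inside bridge as a Go function.
type LocalFunc func(args map[string]string) (string, error)

// Remote stands in for a REFLECT-capable URL such as a Chromestage instance.
type Remote struct {
	URL     string
	Reflect func() ([]string, error) // names of commands currently offered
	Run     func(name string, args map[string]string) (string, error)
}

// Dispatcher combines bridge's own commands with remote command sources.
type Dispatcher struct {
	Local   map[string]LocalFunc
	Remotes []Remote
}

// Run handles a command locally when bridge owns it. Otherwise it reflects
// on each remote at call time and forwards to the first one offering it.
func (d *Dispatcher) Run(name string, args map[string]string) (string, error) {
	if fn, ok := d.Local[name]; ok {
		return fn(args)
	}
	for _, r := range d.Remotes {
		names, err := r.Reflect()
		if err != nil {
			continue // skip a source that cannot be reflected right now
		}
		for _, n := range names {
			if n == name {
				return r.Run(name, args)
			}
		}
	}
	return "", fmt.Errorf("unknown command %q", name)
}

func main() {
	chromestage := Remote{
		URL:     "http://example.invalid/chromestage",
		Reflect: func() ([]string, error) { return []string{"scroll"}, nil },
		Run: func(name string, args map[string]string) (string, error) {
			return "forwarded " + name + " to current page", nil
		},
	}
	d := &Dispatcher{
		Local: map[string]LocalFunc{
			"add-assistant": func(args map[string]string) (string, error) {
				return "added assistant " + args["name"], nil
			},
		},
		Remotes: []Remote{chromestage},
	}
	for _, name := range []string{"add-assistant", "scroll", "missing"} {
		out, err := d.Run(name, map[string]string{"name": "demo"})
		fmt.Println(out, err)
	}
}
```

Treating every active chromestage as just another entry in `Remotes` keeps the "implicitly included" behavior in one place.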

mgood commented 1 month ago

Summarizing a few things from the call:

Keywords: we can have it still respond to the "bridge" keyword, though we need to make sure we're de-duping outputs if we get an assistant "completion" response and a "tool" response for the same input.
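A small Go sketch of both pieces, under stated assumptions: the keyword check is a case-insensitive whole-word match, and de-duping keeps the first output seen per input. Which response should win (completion or tool) is not decided in this thread, so "first seen" is only a placeholder policy. `HasTrigger`, `Output`, and `Dedupe` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// HasTrigger reports whether text contains keyword as a whole word,
// ignoring case and common surrounding punctuation.
func HasTrigger(text, keyword string) bool {
	want := strings.ToLower(keyword)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if strings.Trim(w, ".,!?;:\"'") == want {
			return true
		}
	}
	return false
}

// Output is a hypothetical record of one response to a user input.
type Output struct {
	InputID string // identifies the user input this responds to
	Kind    string // "completion" or "tool"
	Text    string
}

// Dedupe keeps only the first output seen for each input, preserving order.
func Dedupe(outs []Output) []Output {
	seen := make(map[string]bool)
	var kept []Output
	for _, o := range outs {
		if seen[o.InputID] {
			continue
		}
		seen[o.InputID] = true
		kept = append(kept, o)
	}
	return kept
}

func main() {
	fmt.Println(HasTrigger("Hey Bridge, add an assistant", "bridge")) // true
	fmt.Println(HasTrigger("the bridges are up", "bridge"))           // false

	outs := []Output{
		{InputID: "in-1", Kind: "completion", Text: "Sure, adding one."},
		{InputID: "in-1", Kind: "tool", Text: "add-assistant ok"},
		{InputID: "in-2", Kind: "tool", Text: "scroll ok"},
	}
	fmt.Println(len(Dedupe(outs))) // 2
}
```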

UI: just listing commands right now is helpful even if it can't call them yet. Maybe even just console logging the commands for debugging.

REFLECT: it may be helpful to extend this with a list of extra URLs to poll, instead of recursively polling in bridge, chromestage, or other interfaces that proxy on top of additional REFLECT-capable resources. E.g. "See-Also:"
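To illustrate the "See-Also" idea, here is a Go sketch in which the caller follows the extra URLs itself, so proxies such as bridge or chromestage never need to poll recursively. The `Reflection` shape, with a `SeeAlso` list alongside the commands, is an assumption about how such an extension might look; no such field is confirmed to exist today.

```go
package main

import "fmt"

// Reflection is a hypothetical REFLECT result that also lists further
// REFLECT-capable URLs worth asking.
type Reflection struct {
	Commands []string
	SeeAlso  []string
}

// Reflector stands in for "call REFLECT on this URL" (hypothetical signature).
type Reflector func(url string) (Reflection, error)

// Collect walks the See-Also links breadth-first from the start URLs.
// Each URL is fetched at most once, so cycles between sources terminate.
// It returns the commands per URL and the order in which URLs were visited.
func Collect(start []string, fetch Reflector) (map[string][]string, []string) {
	commands := make(map[string][]string)
	seen := make(map[string]bool)
	var visited []string
	queue := append([]string(nil), start...)
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		if seen[u] {
			continue
		}
		seen[u] = true
		visited = append(visited, u)
		r, err := fetch(u)
		if err != nil {
			continue // an unreachable source contributes nothing
		}
		commands[u] = r.Commands
		queue = append(queue, r.SeeAlso...)
	}
	return commands, visited
}

func main() {
	graph := map[string]Reflection{
		"bridge":      {Commands: []string{"add-assistant"}, SeeAlso: []string{"chromestage"}},
		"chromestage": {Commands: []string{"navigate"}, SeeAlso: []string{"page"}},
		"page":        {Commands: []string{"scroll"}, SeeAlso: []string{"bridge"}}, // cycle back
	}
	fetch := func(url string) (Reflection, error) {
		r, ok := graph[url]
		if !ok {
			return Reflection{}, fmt.Errorf("no such source %q", url)
		}
		return r, nil
	}
	commands, visited := Collect([]string{"bridge"}, fetch)
	fmt.Println(visited)          // [bridge chromestage page]
	fmt.Println(commands["page"]) // [scroll]
}
```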

mgood commented 2 weeks ago

@ajbouh I just re-read this and I don't think anything has changed substantially regarding the implementation. I guess the one thing that I might look at is how the tool-calling behaves if we pass in the REFLECT schema more directly. Right now the structures for function descriptions are pretty similar, but there are some minor differences because of the examples I was following for the tool-calling inputs. However, we may be able to simplify the translation between them if the model still gives decent results when we pass in the schema based on the REFLECT definitions instead.
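For a sense of how thin the translation could be, here is a Go sketch that maps one command definition onto a common function-calling layout (a name, a description, and a JSON Schema object for parameters). Both sides are assumptions: `CommandDef` and `FieldDef` are a guess at what a REFLECT entry might carry, and the tool layout bridge actually targets may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FieldDef is a hypothetical description of one command parameter.
type FieldDef struct {
	Type        string `json:"type"`
	Description string `json:"description,omitempty"`
}

// CommandDef is a hypothetical shape for one REFLECT entry.
type CommandDef struct {
	Description string
	Parameters  map[string]FieldDef
}

// ToolParameters is a JSON Schema object describing the arguments.
type ToolParameters struct {
	Type       string              `json:"type"`
	Properties map[string]FieldDef `json:"properties"`
}

// ToolFunction is the function description handed to the model.
type ToolFunction struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  ToolParameters `json:"parameters"`
}

// Tool wraps a function description in a tool entry.
type Tool struct {
	Type     string       `json:"type"`
	Function ToolFunction `json:"function"`
}

// ToTool reuses the command's parameter definitions as the schema
// properties, so no per-field translation is needed.
func ToTool(name string, def CommandDef) Tool {
	props := def.Parameters
	if props == nil {
		props = map[string]FieldDef{} // marshal as {} rather than null
	}
	return Tool{
		Type: "function",
		Function: ToolFunction{
			Name:        name,
			Description: def.Description,
			Parameters:  ToolParameters{Type: "object", Properties: props},
		},
	}
}

func main() {
	def := CommandDef{
		Description: "Scroll the current page",
		Parameters: map[string]FieldDef{
			"amount": {Type: "number", Description: "pixels to scroll"},
		},
	}
	out, err := json.MarshalIndent(ToTool("scroll", def), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

If the model gives decent results with definitions passed through this directly, the remaining translation is only the outer wrapper.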