Enable context providers to be automatically included in prompts

Patrick-Erichsen commented 1 month ago

Validations

[x] I believe this is a way to improve. I'll try to join the Continue Discord for questions
[x] I'm not able to find an open issue that requests the same enhancement

Problem

https://discord.com/channels/1108621136150929458/1108621136830398496/1265344617554116751

Is there any way to add some additional custom handling into the autocomplete completion provider? Something that could provide additional chunks of context data depending on what was being edited? I want to take some internal company data that would apply to building graphql operations and be able to include that in autocomplete, or even in chat if certain words were used, but without needing to use an annotation. Looking for any ideas.

Solution

No response

mark-bradshaw commented 1 month ago

Hi @Patrick-Erichsen, thanks for opening the issue.

To clarify our need...

PROBLEM: We want to provide custom context for our company developers' questions, but they frequently forget to manually invoke the context provider, don't know when to invoke it, or just feel like it's a hassle to keep invoking it. This leads to lower engagement with our custom context provider and lower-quality answers from the LLM.

PROPOSAL: I'd like an option that allows a custom context provider to opt in to looking at every user message, determine whether the message is something it cares about, and then pull context items related specifically to that message. The message could be matched against regexes for speed, or even quickly bounced off a small model for an evaluation if timing permits. Being able to opt in to see every message would make the context provider globally available without needing special invocation. This would remove the hurdle to getting context in front of every developer and raise the overall answer quality.

Ideally the configuration for enabling this would be done at the custom provider level, so that the user has minimal work to do to enable the provider without making any mistakes.
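
To make the proposal concrete, here is a minimal sketch of what an opt-in provider with a cheap regex pre-check might look like. Everything in it is hypothetical: the `AutoContextProvider` shape, the `seesEveryMessage` flag, the `triggers` field, and the `collectAutoContext` helper are illustration-only names and not Continue's actual API.

```typescript
// Hypothetical shapes for illustration only; not Continue's actual types.
interface AutoContextItem {
  name: string;
  description: string;
  content: string;
}

interface AutoContextProvider {
  title: string;
  // Hypothetical opt-in flag: the provider asks to see every user message.
  seesEveryMessage: boolean;
  // Cheap pre-check (regexes) run before any more expensive lookup.
  triggers: RegExp[];
  getContextItems(message: string): AutoContextItem[];
}

// Example provider that reacts to GraphQL-related messages.
const graphqlProvider: AutoContextProvider = {
  title: "graphql-internal",
  seesEveryMessage: true,
  triggers: [/\bgraphql\b/i, /\b(query|mutation)\s+\w+/i],
  getContextItems() {
    return [
      {
        name: "GraphQL conventions",
        description: "Internal guidance for building GraphQL operations",
        content: "(placeholder for internal company data)",
      },
    ];
  },
};

// Collect items from every opted-in provider whose triggers match the message.
function collectAutoContext(
  message: string,
  providers: AutoContextProvider[]
): AutoContextItem[] {
  const items: AutoContextItem[] = [];
  for (const provider of providers) {
    if (!provider.seesEveryMessage) continue;
    const matches = provider.triggers.some((re) => re.test(message));
    if (matches) {
      items.push(...provider.getContextItems(message));
    }
  }
  return items;
}
```

With this shape, a message that never mentions GraphQL costs only a few regex tests, and the provider stays silent unless one of its triggers fires.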

DANGERS: Some possible dangers we'd need to avoid:

Overeager providers might inflate context sizes
Not having access to the chat history might cause a provider to not get invoked
Providers that see every message without being invoked could potentially exfiltrate data

To avoid these dangers we might:

Cap the allowed context size that can be returned (might already be done)
Include a chat history object to context providers so that they can see if they have recently needed to be invoked and might still want to provide additional data (might already be done)
Add a warning when enabling a custom context provider that will see every message.
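
As a rough sketch of the first two mitigations, assuming a simple character budget and a plain array of past user messages (both are assumptions on my part, and `capContext` and `recentlyRelevant` are hypothetical helper names, not anything that exists in Continue today):

```typescript
// Hypothetical item shape for illustration only.
interface ProvidedItem {
  name: string;
  content: string;
}

// Keep items in order until the character budget is spent;
// the item that crosses the limit is truncated, later items are dropped.
function capContext(items: ProvidedItem[], maxChars: number): ProvidedItem[] {
  const kept: ProvidedItem[] = [];
  let remaining = maxChars;
  for (const item of items) {
    if (remaining <= 0) break;
    const content = item.content.slice(0, remaining);
    kept.push({ name: item.name, content });
    remaining -= content.length;
  }
  return kept;
}

// Check whether any of the last `lookback` messages matched a trigger,
// so a provider can keep contributing during a follow-up question.
function recentlyRelevant(
  history: string[],
  triggers: RegExp[],
  lookback: number
): boolean {
  // slice(-0) would return the whole array, so guard the zero case.
  const recent = lookback > 0 ? history.slice(-lookback) : [];
  return recent.some((msg) => triggers.some((re) => re.test(msg)));
}
```

A real cap would more likely be measured in tokens than characters; characters are used here only to keep the sketch self-contained.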

mark-bradshaw commented 1 month ago

Here's a similar issue: https://github.com/continuedev/continue/issues/1730

sestinj commented 1 month ago

@mark-bradshaw thanks for the full expansion here. All of this sounds reasonable to me.

When you say

Ideally the configuration for enabling this would be done at the custom provider level, so that the user has minimal work to do to enable the provider without making any mistakes.

this leads me to think of a few methods of configuration:

1) (this is already done, it just doesn't support arbitrary context providers quite yet) - a top-level property that takes an array of context provider names

  "experimental": {
    "defaultContext": ["activeFile"]
  }

2) a property that exists on every context provider determining whether it should be used by default

  "contextProviders": [
    {
      "name": "url"
    },
    {
      "name": "folder"
    },
    {
      "name": "docs"
    },
    {
      "name": "os",
      "includeByDefault": true
    }
  ],

3) we outsource this to a special .prompt file, which can override the template for every single message and, as part of doing so, call a context provider. With a special filename such as default.prompt, you could then just give it the contents {{{ myContextProviderName }}}

My gut is that (3) isn't easy enough for users to set up, and that (2) could be confusing because for something like "docs" you actually need to specify which docs you want to use. So my preference is (1), but I would welcome your thoughts.
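
For comparison, here is a small sketch of how options (1) and (2) could be resolved into a single list of default providers. The config field names come from the snippets above, but `includeByDefault` is only a proposal in this thread, and `resolveDefaultProviders` is a hypothetical helper rather than existing Continue code.

```typescript
// Config shapes mirror the JSON snippets above; includeByDefault is only proposed.
interface ProviderEntry {
  name: string;
  includeByDefault?: boolean;
}

interface ConfigSketch {
  contextProviders?: ProviderEntry[];
  experimental?: { defaultContext?: string[] };
}

// Merge option (1), the top-level array, with option (2), the per-provider flag,
// preserving order and skipping duplicates.
function resolveDefaultProviders(config: ConfigSketch): string[] {
  const names: string[] = [];
  const add = (name: string) => {
    if (names.indexOf(name) === -1) names.push(name);
  };
  const topLevel =
    (config.experimental && config.experimental.defaultContext) || [];
  for (const name of topLevel) add(name);
  for (const entry of config.contextProviders || []) {
    if (entry.includeByDefault) add(entry.name);
  }
  return names;
}
```

Either option ends up as a list of provider names, so the difference is mainly where the user has to write the setting.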

We already prune messages and context providers before they are sent, so this should be handled automatically.
This is a good idea. We already do this for slash commands, so a quick PR could accomplish it (implementation would go here). Let me know if you'd be interested in making a PR.
I wonder if the "Context Used" dropdown would be enough. I'd like to try this first, and if not then we can also make it so that the context provider explicitly shows up as an @ mention whenever a new input box is created. This would also let users delete the default context provider when they wanted

mark-bradshaw commented 1 month ago

Thanks for the feedback @sestinj . My assumption on configuration would've been option #2, but option #1 seems like it would be fine too. Extra points that it's already mostly implemented, so that's handy.

I'd be happy to contribute, but I'm unsure of my availability at this point. It might be a while.
