SillyTavern / SillyTavern

LLM Frontend for Power Users.
https://sillytavern.app
GNU Affero General Public License v3.0

[FEATURE_REQUEST] Support Claude Prompt Caching in Chat Completion Custom Source #3689

Closed tish1781 closed 3 weeks ago

tish1781 commented 1 month ago

Have you searched for similar requests?

Yes

Is your feature request related to a problem? If so, please describe.

Anthropic system prompt caching is supported in the Chat Completion Claude source, and caching at depth is supported in both the Claude and OpenRouter sources.

However, none of these features are supported in the Custom source, which creates a problem: users may use the Custom source for OpenRouter Claude so that they can apply the 'strict' post-processing mode, but then they cannot cache the prompt.

Describe the solution you'd like

Apply the implementations of both system prompt caching and caching at depth to the Chat Completion Custom source.
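
For reference, both features come down to attaching Anthropic's documented `cache_control` breakpoint to content blocks. A rough sketch of the message shapes this implies when routed through an OpenAI-style schema such as OpenRouter's (all names and values here are illustrative, not the existing implementation):

```js
// Illustrative messages array with two cache breakpoints: one on the
// system prompt, one "at depth" (here: 2 messages from the end), so the
// stable prefix of the chat can be served from Anthropic's prompt cache.
const messages = [
    {
        role: 'system',
        content: [{
            type: 'text',
            text: systemPrompt, // large, stable prefix: ideal cache target
            cache_control: { type: 'ephemeral' }, // system prompt caching
        }],
    },
    {
        role: 'user',
        content: [{
            type: 'text',
            text: 'Earlier message',
            cache_control: { type: 'ephemeral' }, // caching at depth 2
        }],
    },
    { role: 'assistant', content: 'Reply' },
    { role: 'user', content: 'Newest message' },
];
```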

Describe alternatives you've considered

Provide post-processing options for the Chat Completion OpenRouter source.

Additional context

No response

Priority

Medium (Would be very useful)

Are you willing to test this on staging/unstable branch if this is implemented?

Yes

Cohee1207 commented 1 month ago

The Custom source is meant to be "OpenAI-compatible". Adding this or any other payload modification would make it explicitly incompatible with OpenAI.

See #3511

Cohee1207 commented 1 month ago

> However, none of these features are supported in the Custom source, which creates a problem: users may use the Custom source for OpenRouter Claude so that they can apply the 'strict' post-processing mode, but then they cannot cache the prompt.

I brought this up with OpenRouter. They said they'll reconsider how they handle system messages in the middle of the prompt, but I can't provide any more information. This is entirely an OpenRouter processing problem, not a frontend problem. A frontend or its users would not expect messages to be hoisted from their positions in the prompt.
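
To illustrate the hoisting concern (an assumed example based on this comment, not OpenRouter's documented behavior):

```js
// What a frontend might deliberately send: a system note placed mid-chat.
const sent = [
    { role: 'system', content: 'Main instructions' },
    { role: 'user', content: 'Hello' },
    { role: 'assistant', content: 'Hi there.' },
    { role: 'system', content: '[Scene change: three days later]' }, // intentional position
    { role: 'user', content: 'Continue the story.' },
];
// A provider that hoists mid-prompt system messages would effectively
// move the scene-change note up to the top alongside the main
// instructions, discarding the position the frontend chose for it.
```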

cloak1505 commented 1 month ago

Attaching the Prompt Post-Processing drop-down directly to OpenRouter in the API tab would be very cool. @tish1781 FYI, Claude doesn't require the first message to be from the user, and OR inserts a placeholder user message (one of the purposes of Strict) when necessary. If a model errors because of the first message, they patch it quickly when reported.

For "simplicity", we only need None (existing; default I guess) and Semi-strict.

Cohee1207 commented 1 month ago

@cloak1505 Try this: https://github.com/SillyTavern/SillyTavern/pull/3721

cloak1505 commented 1 month ago

Ayaaa, prompt caching isn't being applied with a non-None mode. But yes, Semi-strict solves the whole system-role thing.

zshallow commented 1 month ago

The function that adds the caching markers does no non-trivial request rewriting liable to break anything else, so it's probably fine to just call it last.

EDIT: technically it does the `s => ({ type: 'text', text: s })` conversion, but that's guaranteed to erase no metadata.
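
In other words, the rewrite is limited to something like this (an illustrative sketch; the names are not SillyTavern's actual code):

```js
// Promote plain string content to a content-block array (the lossless
// s => ({ type: 'text', text: s }) step), then tag the last block with
// an Anthropic cache_control marker. Safe to run as the final step.
function addCacheMarker(message) {
    const blocks = typeof message.content === 'string'
        ? [{ type: 'text', text: message.content }]
        : message.content;
    if (!blocks.length) return message; // nothing to mark
    blocks[blocks.length - 1].cache_control = { type: 'ephemeral' };
    return { ...message, content: blocks };
}
```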

Cohee1207 commented 1 month ago

Try Semi-strict post-processing on Chat Completion OpenRouter in staging.

github-actions[bot] commented 3 weeks ago

🔄 An alternative solution has been provided for this issue. Did this solve your problem? If so, we'll go ahead and close it. If you still need help, drop a comment within the next 7 days to keep this open.