API key passthrough missing for custom OpenAI-compatible models

dannaf commented 1 day ago

Bug Description:

Theia 1.54.0 claims to support custom OpenAI-compatible models, including ones hosted "in the cloud", but in practice it only supports models that are not protected by an API key: it does not seem to pass through an API key (environment variable) for a custom model from the AI features settings. So there seems to be no straightforward, non-hacky way to connect a cloud-hosted OpenAI-compatible model that requires an API key (e.g. the Perplexity Pro API).

There is not even a straightforward hacky way of doing it without customizing Theia code (e.g. by reusing the official OpenAI API key environment variable for non-official, OpenAI-compatible APIs), because the current implementation outright replaces the API key with a hardcoded 'no-key' string when calling a custom model; see here: https://github.com/eclipse-theia/theia/blob/c7fb4f525a3e4ededf600fef1ca71cd9fdaaeca0/packages/ai-openai/src/node/openai-language-model.ts#L188

Steps to Reproduce:

Enable AI features in 1.54.0
Configure a custom OpenAI-compatible model in settings.json and try to add an API key for it under "ai-features.openAiCustom.customOpenAiModels", but there is no setting for it
Observe that the custom model will not work if it requires an API key

Additional Information

Operating System:
Theia Version: 1.54

JonasHelming commented 10 hours ago

@sdirix

sdirix commented 9 hours ago

The intention was to avoid accidentally leaking OpenAI keys to non-OpenAI parties.

Suggestion:

We allow the user to specify their own key for each custom model
If there is no custom key but the OPENAI_API_KEY environment variable is set, we use it instead of deliberately stripping it, because this is the default behavior of the official OpenAI API
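
The two rules above could be sketched roughly as follows. This is only an illustration of the proposed lookup order, not Theia code: the `CustomModelConfig` shape, its `apiKey` field, and the `resolveCustomModelKey` helper are all hypothetical names, and the environment is passed in as a plain object to keep the sketch self-contained.

```typescript
// Hypothetical shape of a custom model entry; `apiKey` is the proposed field.
interface CustomModelConfig {
    model: string;
    url: string;
    apiKey?: string;
}

// Environment variables as a plain lookup table (e.g. process.env in Node).
type Env = Record<string, string | undefined>;

// Returns the key to send to the custom endpoint, or undefined if none is set.
function resolveCustomModelKey(config: CustomModelConfig, env: Env): string | undefined {
    // Rule 1: a key configured for this custom model always wins.
    if (config.apiKey !== undefined && config.apiKey !== '') {
        return config.apiKey;
    }
    // Rule 2: otherwise fall back to OPENAI_API_KEY if it is set.
    const envKey = env['OPENAI_API_KEY'];
    return envKey !== undefined && envKey !== '' ? envKey : undefined;
}
```

In a Node backend the second argument would typically be `process.env`.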

@dannaf Does this work for you?

Workaround for adopters:

If you're working on a Theia-based application, you can override the current behavior by rebinding the OpenAiLanguageModelsManager to your own subclass. In it, you register your own subclass of OpenAiModel in which you override the initializeOpenAi function.
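
The core of that workaround, overriding initializeOpenAi in a subclass, might look roughly like the sketch below. To keep it self-contained it does not import anything from Theia or the OpenAI client: `ClientOptions`, `OpenAiModelSketch`, and `KeyedCustomModel` are stand-ins invented for illustration, and the base class only models the 'no-key' behavior described in this issue. The rebinding of the manager is omitted.

```typescript
// Stand-in for the options handed to the OpenAI client (not the real type).
interface ClientOptions {
    apiKey: string;
    baseURL?: string;
}

// Stand-in for OpenAiModel: a custom URL causes the key to become 'no-key'.
class OpenAiModelSketch {
    protected readonly url: string | undefined;
    protected readonly officialKey: string | undefined;

    constructor(url: string | undefined, officialKey: string | undefined) {
        this.url = url;
        this.officialKey = officialKey;
    }

    protected initializeOpenAi(): ClientOptions {
        if (this.url !== undefined) {
            // Custom endpoint: the key is deliberately not handed over.
            return { apiKey: 'no-key', baseURL: this.url };
        }
        return { apiKey: this.officialKey ?? '' };
    }

    clientOptions(): ClientOptions {
        return this.initializeOpenAi();
    }
}

// Adopter subclass: passes a key chosen for this specific custom endpoint.
class KeyedCustomModel extends OpenAiModelSketch {
    private readonly customKey: string;

    constructor(url: string, customKey: string) {
        super(url, undefined);
        this.customKey = customKey;
    }

    protected override initializeOpenAi(): ClientOptions {
        return { apiKey: this.customKey, baseURL: this.url };
    }
}
```

In a real application the subclass would extend the actual OpenAiModel and return an OpenAI client instance rather than a plain options object.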

dannaf commented 7 hours ago

The first point sounds great; I think it's exactly what should be done: for each custom model, the user could set an (optional) apiKey environment/settings variable. (It would be optional for situations where it is not needed, like a model running locally or whatever you had in mind for the initial implementation.)

Avoiding accidental leakage of the OpenAI API key to non-OpenAI models is also correct. I don't think it's an issue at all to require the user of a custom model that needs an API key to indicate it explicitly via an environment/settings variable. There is no need to allow the official OpenAI key to be used for custom models when there is an intentionally implemented way to specify a custom API key via an env/settings variable; I had only suggested that as a temporary workaround, which also was not available, just to further motivate implementing a solution.

I don't think it needs to be a feature of an intentional implementation, but if you think it could be a convenience (I am not sure it would be, as the API key for the custom model is presumably different), then maybe there should be something like an allowOpenAiKeyForCustomModelUse variable that the user would need to opt in to as part of the official OpenAI settings, and that would enable the user to conveniently set another useOfficialOpenAiKey=true variable in the custom model's settings. This way it would be secure against accidental leakage, but if a custom model actually uses the official OpenAI API key, it could quickly and easily be set via the useOfficialOpenAiKey variable instead of having to retype it (assuming the user has set the allowOpenAiKeyForCustomModelUse setting to true).

This may also improve the security of such a situation, as it helps the user avoid pasting the official OpenAI key again in plaintext into the settings.json of the custom model, which could defeat the security of having loaded the official key from an environment variable, if that was done. This way the environment variable would need to be set once and could easily be used multiple times via the settings.
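
To make the double opt-in concrete, here is a rough sketch of how the two flags could gate the fallback. The setting names allowOpenAiKeyForCustomModelUse and useOfficialOpenAiKey come from the comment above; the interfaces, the `keyForCustomModel` helper, and the choice that an explicit per-model apiKey takes precedence are assumptions made for illustration only.

```typescript
// Hypothetical global OpenAI settings carrying the opt-in flag.
interface OfficialOpenAiSettings {
    allowOpenAiKeyForCustomModelUse: boolean;
}

// Hypothetical per-model settings for a custom OpenAI-compatible model.
interface CustomModelSettings {
    apiKey?: string;
    useOfficialOpenAiKey?: boolean;
}

type Env = Record<string, string | undefined>;

function keyForCustomModel(
    official: OfficialOpenAiSettings,
    custom: CustomModelSettings,
    env: Env
): string | undefined {
    // Assumption: an explicit per-model key takes precedence over the flags.
    if (custom.apiKey) {
        return custom.apiKey;
    }
    // The official key is only forwarded when BOTH sides have opted in.
    if (custom.useOfficialOpenAiKey === true && official.allowOpenAiKeyForCustomModelUse) {
        return env['OPENAI_API_KEY'];
    }
    // Otherwise nothing is sent, so the official key cannot leak by accident.
    return undefined;
}
```

With this shape the official key never needs to be repeated in settings.json: it stays in the environment and is only referenced through the two flags.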

What exactly do you mean is the default OpenAI behavior that you mentioned? Can you link docs to this?

sdirix commented 5 hours ago

The OPENAI_API_KEY environment variable is a concept of the official client library, see here. So when someone uses the official client library and does not set a key, the environment variable is used automatically.

I like the idea of an additional useOpenAiKey setting for custom models to make this fallback explicitly available for users.
