Decide on env var naming convention for models
Env var naming convention for initialising models:

DOCQ_{VENDOR}_{USECASE}_{OBJECT}

in all uppercase, for instance DOCQ_OPENAI_API_KEY.

For 3rd-party model vendors, the {VENDOR} value is a single vendor string, such as OPENAI. For cloud vendors, {VENDOR} has two parts, {CLOUDVENDOR} and {MODELVENDOR}, separated by a _, such as AZURE_OPENAI.
So for Azure OpenAI it will be the following, does that make sense?
DOCQ_AZURE_OPENAI_API_KEY1
DOCQ_AZURE_OPENAI_API_KEY2
DOCQ_AZURE_OPENAI_API_BASE
DOCQ_AZURE_OPENAI_CHAT_DEPLOYMENT_NAME or DOCQ_AZURE_OPENAI_CHAT_DEPLOYMENTNAME?
DOCQ_AZURE_OPENAI_CHAT_MODEL_NAME
DOCQ_AZURE_OPENAI_TEXT_DEPLOYMENT_NAME
DOCQ_AZURE_OPENAI_TEXT_MODEL_NAME
Probably a good idea to have the name of the model deployed passed through explicitly.
LangChain also has the following as env vars, but I don't think they're needed because neither has an explicit implication on infra resources, so I will not pass them in.
DOCQ_AZURE_OPENAI_API_TYPE
DOCQ_AZURE_OPENAI_API_VERSION
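For illustration, a minimal sketch of how application code might read env vars following this convention (the helper name is hypothetical, not part of the convention itself):

```python
import os
from typing import Optional

def docq_env(vendor: str, *parts: str) -> Optional[str]:
    """Read an env var named per DOCQ_{VENDOR}_{USECASE}_{OBJECT} (hypothetical helper)."""
    name = "_".join(["DOCQ", vendor, *parts]).upper()
    return os.environ.get(name)

api_key = docq_env("OPENAI", "API", "KEY")                             # DOCQ_OPENAI_API_KEY
azure_base = docq_env("AZURE_OPENAI", "API", "BASE")                   # DOCQ_AZURE_OPENAI_API_BASE
chat_deployment = docq_env("AZURE_OPENAI", "CHAT", "DEPLOYMENT_NAME")  # DOCQ_AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
```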
DOCQ_AZURE_OPENAI_API_KEY1
DOCQ_AZURE_OPENAI_API_KEY2
DOCQ_AZURE_OPENAI_API_BASE
DOCQ_AZURE_OPENAI_API_VERSION
all good above
Re model name, there's no need to supply it - the same way that with OpenAI, we select an actual model on the fly within the application code. Re deployment name, it depends on the relationship between models and deployments - if it's 1:1 then there's no need, but we'd need a separate convention internally in the application code to assume the same (or similar) model/deployment name. If it's 1 deployment with N models, then yes we need the deployment name, however it should be
DOCQ_AZURE_OPENAI_DEPLOYMENT_NAME
Happy to chat to clarify
Re model name - a deployment is tied to one model. The deployment name (the Azure resource name) can be anything, but at the moment I'm setting the deployment name = model name. This will work fine unless we need to deploy two instances of the same model, like a dev and a prod, or need to partition for some other reason. I don't know what these use cases could be.
Re deployment name - if we go with DOCQ_AZURE_OPENAI_DEPLOYMENT_NAME, are we not going to support multiple models at the same time? I understood that we needed this.
So, have the gpt-35-turbo and text-davinci-003 models available in parallel so that app code can use both?
Also, why include DOCQ_AZURE_OPENAI_API_VERSION when there's no infra dependency? From what I can tell you can switch to an older OpenAI API version client side with no changes on the infra.
To be clear, this is separate from the model version; those are set as infra config.
> Also, why include DOCQ_AZURE_OPENAI_API_VERSION when there's no infra dependency?
If this is not attached to the OpenAI service or the deployment at creation time, then we can lose it.
If it's 1:1 between a deployment and an actual model, then my view is: don't set it, just to be consistent, and let application code define which model (which in this case implies the deployment name) to use.
@cwang I think you missed answering my second question.
Are we going to support two or more models at the same time from a single provider? I understood that we needed this.
Example: have the gpt-35-turbo and text-davinci-003 models available in parallel from Azure OpenAI, so app functionality can be built that uses both?
Yes, because it's 1:N between API key and deployments, I believe?
OK, then we have two options (at least for Azure):

1. A deployment name env var per model (each deployment is a single model, 1:1), something like DOCQ_AZURE_OPENAI_TEXT_DEPLOYMENT_NAME and DOCQ_AZURE_OPENAI_CHAT_DEPLOYMENT_NAME. The model name is implied, but we can use the model name to set the deployment name. This limits us to only having one deployment per model (probably fine).
2. A single env var holding deployment name/model name pairs, e.g. DOCQ_AZURE_OPENAI_DEPLOYMENT_NAMES = [{'dname':'textdavinchi-dep1', 'mname':'text-davinci-003'}, {'dname':'gpt35turbo-dep1', 'mname':'gpt-35-turbo'}]
   - dname* - deployment name
   - mname - model name

*Keeping names short to save on chars. PaaS's like Azure App Service pass in a load of env vars, and the char limit is shared.
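For option 2 the value would need parsing at startup. A minimal sketch, assuming the value is the Python-literal-style list shown above (ast.literal_eval copes with the single quotes; strict JSON would need double quotes):

```python
import ast
import os

# Turn DOCQ_AZURE_OPENAI_DEPLOYMENT_NAMES into a {model name: deployment name} lookup
raw = os.environ.get("DOCQ_AZURE_OPENAI_DEPLOYMENT_NAMES", "[]")
pairs = ast.literal_eval(raw)  # e.g. [{'dname': 'gpt35turbo-dep1', 'mname': 'gpt-35-turbo'}]
deployments = {p["mname"]: p["dname"] for p in pairs}

chat_deployment = deployments.get("gpt-35-turbo")
text_deployment = deployments.get("text-davinci-003")
```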
The answer to this is my previous question about the relationship between a deployment and the models belonging to it. Is it 1:1 or 1:N? Can we get a definitive answer? Also, I believe API keys are at the OpenAI service level, therefore it's 1:N between keys and deployments. Can we also confirm that?
I don't think we should worry about deployment names and models - as I suggested earlier, use naming conventions in application code to handle it, e.g. with the assumption that 1 deployment contains 1 model, both named identically. Think about how we provision OpenAI (the 3rd-party one); again, we don't specify actual models in env vars.
In short, env vars should carry just enough to get everything started in this case. This relaxed approach leaves the application (system settings) to dictate which model(s) to initialise.
deployment:model is 1:1.
~~Per PR #34, chose option number 1 from above.~~
Change to using the deployment name = model name convention. Therefore the deployment name will not be passed in as an env var. Only the following three are set for Azure:
DOCQ_AZURE_OPENAI_API_KEY1
DOCQ_AZURE_OPENAI_API_KEY2
DOCQ_AZURE_OPENAI_API_BASE
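A minimal sketch of wiring these up with the deployment name = model name convention, assuming the pre-1.0 openai Python SDK and an API version pinned in application code (since it's deliberately not an env var):

```python
import os
import openai

openai.api_type = "azure"
openai.api_key = os.environ["DOCQ_AZURE_OPENAI_API_KEY1"]  # KEY2 kept aside for rotation
openai.api_base = os.environ["DOCQ_AZURE_OPENAI_API_BASE"]
openai.api_version = "2023-05-15"  # assumption: pinned in code, not passed via env var

model_name = "gpt-35-turbo"  # chosen by application/system settings
response = openai.ChatCompletion.create(
    engine=model_name,  # convention: deployment name == model name
    messages=[{"role": "user", "content": "Hello"}],
)
```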
Client works across HF (free) Inference API or self-hosted Inference Endpoints.
class huggingface_hub.InferenceClient(model: typing.Optional[str] = None, token: typing.Optional[str] = None, timeout: typing.Optional[float] = None)
Parameters:
model (str, optional) — The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. bigcode/starcoder, or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task.
token (str, optional) — Hugging Face token. Will default to the locally saved token.
timeout (float, optional) — The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
Initialize a new Inference Client.
InferenceClient aims to provide a unified experience to perform inference. The client can be used seamlessly with either the (free) Inference API or self-hosted Inference Endpoints.
conversational(text: str, generated_responses: typing.Optional[typing.List[str]] = None, past_user_inputs: typing.Optional[typing.List[str]] = None, parameters: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, model: typing.Optional[str] = None) → Dict
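For reference, a minimal usage sketch of the client and method documented above (token handling and the example turns are illustrative):

```python
from huggingface_hub import InferenceClient

# Works with the free Inference API or a self-hosted Inference Endpoint URL
client = InferenceClient()  # token defaults to the locally saved one

# conversational() keeps context by replaying earlier turns
result = client.conversational(
    "What can I use it for?",
    past_user_inputs=["What is the Inference API?"],
    generated_responses=["It is a hosted service for running models."],
    # model=None lets the client pick a recommended conversational model;
    # pass a model id or an Inference Endpoint URL to target a specific one
)
print(result)
```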
Is your feature request related to a problem? Please describe.
At the system level, enable model selection/switching in the settings.
Needs to consider: should we allow user-level model selection? More specifically, what are the reasons for it?
Describe the solution you'd like
Save it to the same table in the database as the rest of the settings.
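A hypothetical sketch of what saving the selection alongside other settings could look like (the SQLite backend, table name, and columns are illustrative assumptions, not the actual Docq schema):

```python
import sqlite3

# Assumed schema: a generic key/value settings table shared with other settings
conn = sqlite3.connect("settings.db")
conn.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)")

def save_selected_model(model: str) -> None:
    conn.execute("INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)", ("selected_model", model))
    conn.commit()

save_selected_model("gpt-35-turbo")
```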
Describe alternatives you've considered
No alternative.
Additional context
N/A