awslabs / LISA

LLM inference solution for Amazon Dedicated Cloud (LISA).

not able to interact with deployed embeddings model #59

Closed david-saeger closed 2 months ago

david-saeger commented 2 months ago

After successfully deploying the two models in the example_config, mistralai/Mistral-7B-Instruct-v0.2 and intfloat/e5-large-v2, I am attempting to interact with the embedding model, initially just through the example Jupyter notebook here: https://github.com/awslabs/LISA/blob/develop/lisa-sdk/LISA_v2_demo.ipynb.

For some reason I cannot seem to interact with the embedding model. In cell 2 of the notebook, after listing models through the OpenAI client, only the Mistral model prints:

[Model(id='mistral7b', created=1677610602, object='model', owned_by='openai')]

I am able to chat with that model.
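For reference, the listing in cell 2 is roughly the sketch below; the base URL is taken from the curl example later in this thread, the token is a placeholder, and the notebook's exact code may differ.

    from openai import OpenAI

    # Point the OpenAI client at the LISA serve endpoint and list registered models.
    # base_url and api_key are placeholders, not real deployment values.
    client = OpenAI(
        base_url="https://your-loadbalancer.elb.amazonaws.com/v2/serve",
        api_key="<your token>",
    )

    print(list(client.models.list()))
    # prints only [Model(id='mistral7b', ...)]; e5v2 is missing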

When attempting to create embeddings with the embedding model I get:

CreateEmbeddingResponse(data=None, model=None, object=None, usage=None, error={'message': '{"error": "embeddings: Invalid model name passed in model=e5v2"}', 'type': 'None', 'param': 'None', 'code': '400'})
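The call producing that response is roughly the following, reusing the client from the sketch above; the input string is just a placeholder.

    # Placeholder input; the notebook's actual invocation may differ.
    response = client.embeddings.create(model="e5v2", input="hello world")
    print(response)  # returns the "Invalid model name passed in model=e5v2" error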

As far as I can tell the models deployed successfully. For instance, the registeredModels entry in Parameter Store prints as follows:

[{"provider":"ecs.textgen.tgi","modelId":"mistral7b","modelName":"mistralai/Mistral-7B-Instruct-v0.2","modelType":"textgen","endpointUrl":"http://internal-***-mistral7b-*****.us-gov-west-1.elb.amazonaws.com","streaming":true},{"provider":"ecs.embedding.tei","modelId":"e5v2","modelName":"intfloat/e5-large-v2","modelType":"embedding","endpointUrl":"http://internal-***-e5v2-*****.us-gov-west-1.elb.amazonaws.com"}]
petermuller commented 2 months ago

Hi there! We don't have any logic to differentiate the embedding and textgen models at the OpenAI API level, so both models should show up there. If only one appears, it seems like the embedding model was not registered with LiteLLM under the hood. The logic here goes into more detail: https://github.com/awslabs/LISA/blob/develop/lib/serve/rest-api/src/utils/generate_litellm_config.py#L37-L51

So based on your param details, things look like they should have worked.
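For illustration only, the registration step is conceptually along these lines: read the registeredModels parameter and turn each entry into a LiteLLM model_list item. The parameter name, field mapping, and output path in this sketch are placeholders, not LISA's actual code.

    import json

    import boto3
    import yaml

    # Conceptual sketch: build a LiteLLM proxy config from the registeredModels
    # Parameter Store entry. Names and paths here are placeholders.
    ssm = boto3.client("ssm", region_name="us-gov-west-1")
    registered = json.loads(
        ssm.get_parameter(Name="/lisa/registeredModels")["Parameter"]["Value"]
    )

    model_list = []
    for model in registered:
        model_list.append({
            "model_name": model["modelId"],  # e.g. "mistral7b" or "e5v2"
            "litellm_params": {
                # Treat each registered endpoint as an OpenAI-compatible backend.
                "model": f"openai/{model['modelName']}",
                "api_base": model["endpointUrl"],
            },
        })

    with open("litellm_config.yaml", "w") as f:
        yaml.safe_dump({"model_list": model_list}, f)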

I imagine there isn't going to be a difference between the list you received and a list from a direct API call, but do you see additional models if you query the /models resource directly? I wouldn't have expected the LiteLLM configuration to work at all if the embedding model deployment failed or if the config file failed to generate.

curl -k -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <your token>' \
    https://your-loadbalancer.elb.amazonaws.com/v2/serve/models
petermuller commented 2 months ago

I was able to replicate the issue. Did you happen to add the embedding model after an initial successful deployment? If yes, and the REST API service was left unchanged, then we didn't re-generate the LiteLLM config at all.

The LiteLLM config is generated at instance startup, so a workaround for now is to terminate the REST API EC2 instance and wait for autoscaling to bring up a new one. The new instance will have your config and should list the embedding model in addition to the text generation model.
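If you'd rather do that from code than the console, something like the boto3 sketch below works; the Name tag filter is a placeholder and depends on how your deployment tags the REST API instances.

    import boto3

    # Sketch: terminate the running REST API instance(s) so the autoscaling group
    # replaces them with instances that regenerate the LiteLLM config on startup.
    # The Name tag filter is a placeholder; adjust it to your deployment's tags.
    ec2 = boto3.client("ec2", region_name="us-gov-west-1")

    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": ["*REST-API*"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if instance_ids:
        ec2.terminate_instances(InstanceIds=instance_ids)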

I'll leave this issue open since this is still a bug, but hopefully this should get your model recognized in the API!

david-saeger commented 2 months ago

I did deploy the embedding model after the initial deployment. I thought I had deployed the API infra stack after deploying the new model, but at any rate you are absolutely correct that standing up a new server fixed the issue. Thanks!