NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio-language models, automatic speech recognition (ASR), and text-to-speech (TTS).
https://docs.nexa.ai/
Apache License 2.0

[FEATURE] run_type "Multimodal" missing in nexa_service.py #136

Closed: tigerrabbit closed this issue 1 month ago

tigerrabbit commented 1 month ago

Issue Description

When running the server for llava1.6-mistral, the model fails to load (the log below shows the same failure with llava1.6-vicuna):

```
$ nexa server llava1.6-vicuna
objc[12203]: Class GGMLMetalClass is implemented in both /Applications/Nexa.app/Contents/Frameworks/libggml_llama.dylib (0x12083c260) and /Applications/Nexa.app/Contents/Frameworks/nexa/gguf/lib/libstable-diffusion.dylib (0x1230a0260). One of the two will be used. Which one is undefined.
INFO:     Started server process [12203]
INFO:     Waiting for application startup.
2024-10-02 12:55:30,162 - INFO - Model Path: llava1.6-vicuna
Model llava-v1.6-vicuna-7b:model-q4_0 already exists at /Users/gg/.cache/nexa/hub/official/llava-v1.6-vicuna-7b/model-q4_0.gguf
ERROR:    Traceback (most recent call last):
  File "starlette/routing.py", line 693, in lifespan
  File "starlette/routing.py", line 569, in __aenter__
  File "starlette/routing.py", line 670, in startup
  File "nexa/gguf/server/nexa_service.py", line 343, in startup_event
  File "nexa/gguf/server/nexa_service.py", line 203, in load_model
ValueError: Model llava1.6-vicuna not found in Model Hub

ERROR:    Application startup failed. Exiting.
```

Steps to Reproduce

Run `nexa server llava1.6-mistral` and compare with `nexa run llava1.6-mistral`.

OS

macOS (latest)

Python Version

3.12.6

Nexa SDK Version

latest

GPU (if using one)

Mac (Metal)

Davidqian123 commented 1 month ago

For now, we don't support a server for multimodal models; you can try `nexa run llava1.6-mistral` instead. If you want to host a local server for multimodal models, we will improve our SDK to support that in the future.

JGalego commented 1 month ago

+1 for multimodal models server support

tigerrabbit commented 1 month ago

I am in love with the product. I was impressed with the coding when I dug into this using the editable install option. I am going to be keeping up with this and look forward to this option. I am sure there are some gotchas that I am not even aware of in implementing this feature.

Davidqian123 commented 1 month ago

We will include this in our roadmap.

zhiyuan8 commented 1 month ago

> +1 for multimodal models server support

Thank you, we are prioritizing this and expect to deliver it within the next two weeks.

zhycheng614 commented 1 month ago

@tigerrabbit @JGalego Hi,

I'm glad to announce that your feature request has just been fulfilled in #154. Now you can try the VLM API with the /chat/completions route. Note that a VLM model should be loaded (like llava1.6-vicuna) and the request body should be well formatted for multimodal input (a sketch follows below).
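For illustration, here is a minimal request sketch. It assumes the server listens on http://localhost:8000 and accepts an OpenAI-style message schema with base64 image content; the port, the /v1 prefix, and the field names are assumptions for illustration, not confirmed in this thread — check the CLI docs for the exact format.

```python
# Hypothetical sketch of a multimodal request to the new VLM route.
# Assumed (not confirmed here): host/port, /v1 prefix, and an
# OpenAI-style message schema with a base64 data URL for the image.
import base64

import requests

# Encode a local image as a base64 data URL.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "stream": False,
}

response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(response.json())
```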

For more details, you can refer to our CLI docs.

Keep an eye on our releases in the near future, when this update will be available through pip install.

Thank you,

Perry, Nexa AI

zhycheng614 commented 1 month ago

This is now available in release v0.0.8.7: https://github.com/NexaAI/nexa-sdk/releases/tag/v0.0.8.7