mariankh1 opened 1 month ago
Could you please specify what type of requests the API will process? Also, what do you mean by 'role' and 'token'? Could you explain these terms further?
Hello, at the moment the Android app calls the https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 model on Hugging Face to summarise the emails and create actions. Please check it out to see how it works.
I make a request there and pass the "token" (the Hugging Face API token), the "role" (user), and the "content" (the emails).
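For reference, the request looks roughly like this. This is a Python sketch of the payload only - the app makes the same call from Android, and the endpoint path and parameters here are my assumption based on Hugging Face's OpenAI-compatible chat API, not copied from the app:

```python
import requests

# Sketch of the request the app makes (payload shape only; not the app's code).
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/v1/chat/completions"
HF_TOKEN = "hf_..."  # the Hugging Face API token mentioned above

payload = {
    "model": "mistralai/Mistral-Nemo-Instruct-2407",
    "messages": [
        # "role" says who is speaking; "content" carries the emails to summarise
        {"role": "user", "content": "Summarise these emails and suggest actions: ..."}
    ],
    "max_tokens": 500,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},  # the token travels in this header
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])
```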
It would be great if, instead of sending the request there, we sent it to a service that runs the model, which we could then host on Gradio or on a server.
That way we would not be restricted in the number of requests we can make per hour.
Let me know what you think and if you need further clarifications.
What is Gradio, and what is it used for? And can you please tell me what kind of actions these are going to be?
Hello, I was thinking about https://www.gradio.app, but really all we need is a service that hosts the model plus an API. Let me try to explain. We currently use the Mistral Instruct model from Hugging Face through the Inference API. It is hosted on Hugging Face, and we access it through an access token. It would be ideal to host it on our own server and access it through a similar API. Best of all would be to have it in a Docker container so that we can set it up quickly on any server.
On their website, Mistral suggest vLLM as an ideal open-source inference and serving engine, and say it is particularly appropriate as a target platform for self-deploying Mistral models on-premise: https://docs.mistral.ai/deployment/self-deployment/vllm/.
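If we go that route, vLLM ships an OpenAI-compatible server (there is an official `vllm/vllm-openai` Docker image), so the app-side change would mostly be pointing at our base URL. A rough sketch of the client side, assuming the server is running on a machine of ours (the hostname is a placeholder):

```python
from openai import OpenAI

# Sketch, assuming vLLM's OpenAI-compatible server is running on our box, e.g.:
#   docker run --gpus all -p 8000:8000 vllm/vllm-openai \
#       --model mistralai/Mistral-Nemo-Instruct-2407
client = OpenAI(base_url="http://our-server:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="mistralai/Mistral-Nemo-Instruct-2407",
    messages=[{"role": "user", "content": "Summarise these emails: ..."}],
)
print(resp.choices[0].message.content)
```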
Can you have a look and let me know what you think?
I am trying to understand how things work; I'm about 50% there. I am just looking into how to host models on Gradio. Do you know how we can host the model, or of any tutorial? I am also adding LangChain - have you heard of it?
As far as I have gathered, Gradio itself makes API calls to Hugging Face. We make API calls to Gradio, and Gradio calls Hugging Face. Is that right?
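Not necessarily - Gradio itself is just a Python library that wraps a function with a web UI and an HTTP API; it only calls Hugging Face if the function we wrap does. We could have it run the model locally instead (the weights are downloaded from the Hub once, then inference happens on our machine). A rough sketch, assuming the `transformers` text-generation pipeline and the model named above:

```python
import gradio as gr
from transformers import pipeline

# Sketch: a Gradio app that runs the model itself rather than forwarding
# every call to Hugging Face. Needs a large GPU in practice.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
)

def summarise(emails: str) -> str:
    messages = [{"role": "user", "content": f"Summarise these emails: {emails}"}]
    out = generator(messages, max_new_tokens=500)
    # The pipeline returns the chat with the assistant's reply appended.
    return out[0]["generated_text"][-1]["content"]

demo = gr.Interface(fn=summarise, inputs="text", outputs="text")
demo.launch()  # Gradio also exposes this function as an HTTP API automatically
```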
Create an API service that can be called to process the requests from the app. We can then host this on a server.
The API shall accept the role and the token.
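A minimal sketch of what that service could look like - this is purely illustrative (FastAPI, with a hypothetical `/summarise` endpoint and a placeholder auth check; the real route names, fields, and auth scheme are still to be decided):

```python
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Illustrative request shape, matching what the app already sends:
# a role plus the content, with the token carried in a header.
class ChatRequest(BaseModel):
    role: str     # e.g. "user"
    content: str  # the emails to summarise

EXPECTED_TOKEN = "change-me"  # placeholder; real auth scheme TBD

@app.post("/summarise")
def summarise(req: ChatRequest, authorization: str = Header(...)):
    if authorization != f"Bearer {EXPECTED_TOKEN}":
        raise HTTPException(status_code=401, detail="bad token")
    # Here we would call the model backend (vLLM, Gradio, or transformers)
    # with [{"role": req.role, "content": req.content}] and return its reply.
    return {"summary": "...model output goes here..."}
```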
Instructions for deploying the serverless version are provided here: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407