Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠Amazon SageMaker.
Deploy this TheBloke/vicuna-13B-v1.5-GGUF model on AWS #4603
Open
ahsan3219 opened 3 months ago
I want to use this model as an endpoint in my web application, in this format: ![image](https://github.com/aws/amazon-sagemaker-examples/assets/76880965/cc082c6e-6e03-4993-9377-c4ede41972df)
Chatbot Requirements
Scope: chatbot (encoder/decoder for text inference or conversational use)
Input via API (JSON): ChatGPT style; the template can be seen below.
The JSON will contain 25 user messages, and the response should be the system response. Please use these guidelines to understand API consumption: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html
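The InvokeEndpoint call referenced above can be sketched with boto3. The endpoint name and the `inputs`/`parameters` payload shape are assumptions here (they depend on the serving container actually deployed), not part of the requirements.

```python
import json


def build_payload(question: str) -> str:
    """Serialize one question into a JSON body.

    The "inputs"/"parameters" shape is an assumption modeled on the
    Hugging Face text-generation containers; adjust it to match the
    container you deploy.
    """
    return json.dumps({
        "inputs": question,
        "parameters": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9},
    })


def ask(endpoint_name: str, question: str) -> dict:
    """Call the SageMaker runtime InvokeEndpoint API (needs AWS credentials)."""
    import boto3  # imported lazily so build_payload works without the AWS SDK

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,       # placeholder name, not from the issue
        ContentType="application/json",
        Body=build_payload(question),
    )
    return json.loads(response["Body"].read())
```

Usage would be `ask("vicuna-13b-endpoint", "What is photosynthesis?")` once an endpoint with that (hypothetical) name exists.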
Prompt template for the system: `template = '''You are going to be my education assistant. System: {System} Question: {question}'''`
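Filling that template is plain string formatting; the sample `System` and `question` values below are illustrative, not from the requirements.

```python
# The {System} and {question} placeholders come from the template above.
template = """You are going to be my education assistant.
System: {System}
Question: {question}"""

# Example values (hypothetical) showing how the prompt is assembled.
prompt = template.format(
    System="Answer concisely and at a high-school level.",
    question="What is photosynthesis?",
)
print(prompt.splitlines()[0])  # → You are going to be my education assistant.
```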
LLM Model Parameters: max_new_tokens=512, temperature=0.7, top_p=0.9
If possible, use AutoModelForCausalLM; otherwise, train an LLM.
It will be deployed on Amazon SageMaker using S3 buckets.
The GGUF file should be stored in an S3 bucket.
The chat buffer should store 25 conversation turns and create a session ID (no need to send this to the endpoint).
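The buffer requirement can be sketched client-side with a bounded deque; the class name and field layout are assumptions for illustration. The session ID stays local and is never sent to the endpoint, as the requirement states.

```python
import uuid
from collections import deque


class ChatBuffer:
    """Keeps the most recent exchanges for one session.

    Hypothetical sketch: holds up to 25 user/assistant turns and a
    locally generated session ID that is never included in endpoint calls.
    """

    def __init__(self, max_turns: int = 25):
        self.session_id = str(uuid.uuid4())   # client-side only
        self._turns = deque(maxlen=max_turns)  # oldest turns drop automatically

    def add(self, user_msg: str, assistant_msg: str) -> None:
        self._turns.append({"user": user_msg, "assistant": assistant_msg})

    def history(self) -> list:
        return list(self._turns)
```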
The quantized model is available here: https://huggingface.co/TheBloke/vicuna-13B-v1.5-GGUF/blob/main/vicuna-13b-v1.5.Q4_K_M.gguf
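Staging that file in S3 could look like the sketch below, assuming the `huggingface_hub` and `boto3` packages; the bucket name and key prefix are placeholders, and the actual download (~8 GB) requires network access and AWS credentials.

```python
from pathlib import PurePosixPath

# Repo and file names taken from the Hugging Face link above.
MODEL_REPO = "TheBloke/vicuna-13B-v1.5-GGUF"
MODEL_FILE = "vicuna-13b-v1.5.Q4_K_M.gguf"


def s3_key_for(filename: str, prefix: str = "models") -> str:
    """Build the S3 object key for a model file ("models" prefix is a placeholder)."""
    return str(PurePosixPath(prefix) / filename)


def stage_model_in_s3(bucket: str) -> str:
    """Download the GGUF from the Hub and upload it to S3.

    Needs network access and AWS credentials; imports are lazy so the
    key-building helper stays usable without them.
    """
    from huggingface_hub import hf_hub_download
    import boto3

    local_path = hf_hub_download(repo_id=MODEL_REPO, filename=MODEL_FILE)
    key = s3_key_for(MODEL_FILE)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```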
Use Hugging Face/LangChain when possible.
Deliverables: Jupyter notebook/code; two hours should be allotted to set up the model on AWS with the customer.
Please provide complete source code that I can use in my Jupyter notebook on AWS to create an endpoint. I need it as soon as possible.