Extending Hugging Face transformers APIs for Transformer-based models and improving the productivity of inference deployment. With extremely compressed models, the toolkit can greatly improve inference efficiency on Intel platforms.
Unprocessable Entity using Neural Chat via OpenAI interface with meta-llama/Llama-2-7b-chat-hf #11
Is there a specific version of the openai package that is aligned with the OpenAI-compatible interface offered by neural_chat? I am currently testing with the current 1.12.0 but encountering a 422 Unprocessable Entity error.
I saw that meta-llama/Llama-2-7b-chat-hf is a supported model and appears to be small enough to fit into my Intel Data Center Flex 170 XPU.
I can successfully run this model locally with the code outlined in deploy_chatbot_on_xpu.
However, when I attempt to use the OpenAI interface per the instructions at https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/neural_chat, the server logs 422 Unprocessable Entity and the client reports an error about a missing value. I assume this stems from a mismatch between the fields the OpenAI client sends and the fields the neural_chat server requires. I have also included the text extracted from a tcpdump capture below.
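In case it helps triage, here is a minimal stdlib-only reproduction I can use to surface the 422 response body directly (the openai client swallows it into a generic missing-value error). The field names follow the OpenAI chat completions schema; whether neural_chat expects additional or different fields is exactly what I am unsure of:

```python
import json
import urllib.request
import urllib.error


def post_chat_completion(base_url, payload):
    """POST an OpenAI-style chat completion request and return (status, body).

    On HTTP errors (e.g. 422) the response body usually names the missing
    field, which is more useful than the client's generic exception.
    """
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


# Field names per the OpenAI chat completions schema; the model string
# matches textbot.yaml. Whether the server wants more fields is the open
# question.
payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [
        {"role": "user", "content": "Tell me about Intel Xeon Scalable Processors."}
    ],
}

if __name__ == "__main__":
    status, body = post_chat_completion("http://localhost:8000", payload)
    print(status, body)
```

Running this against the server prints the validation detail that accompanies the 422, which should identify the missing field.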
Following along from the notebook examples, I have prepared textbot.yaml and server.py as below.
Starting the server
$ grep -v "^#" textbot.yaml | grep -v "^$"
host: 0.0.0.0
port: 8000
model_name_or_path: "meta-llama/Llama-2-7b-chat-hf"
device: "xpu"
tasks_list: ['textchat']
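As a quick sanity check that the client targets the same endpoint the server binds, a throwaway parser for this flat key: value file can recover host and port (a convenience sketch for the simple file above, not a general YAML parser):

```python
def read_flat_yaml(path):
    """Parse a flat 'key: value' YAML file such as textbot.yaml.

    Handles comments and quoted scalars only; textbot.yaml has no
    nesting, so this deliberately avoids a PyYAML dependency.
    """
    config = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.split("#", 1)[0].strip()
            if not line or ":" not in line:
                continue
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip().strip("'\"")
    return config


# Usage: confirm the client should target http://<host>:<port>
# cfg = read_flat_yaml("textbot.yaml")
# print(cfg["host"], cfg["port"])
```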
$ cat server.py
#!/usr/bin/env python
import os
import time
import multiprocessing
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
import nest_asyncio
nest_asyncio.apply()
def start_service():
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="textbot.yaml", log_file="neuralchat.log")

multiprocessing.Process(target=start_service).start()
$ ./server.py
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''. If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading config settings from the environment...
2024-02-19 14:11:22.837584: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-19 14:11:22.841047: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.887207: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-19 14:11:22.887246: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-19 14:11:22.888669: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-19 14:11:22.896900: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-19 14:11:22.897194: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-19 14:11:23.782914: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-19 14:11:27,327 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
2024-02-19 14:11:27,328 - datasets - INFO - TensorFlow version 2.15.0.post1 available.
Loading model meta-llama/Llama-2-7b-chat-hf
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.25it/s]
2024-02-19 14:11:31,912 - root - INFO - Model loaded.
INFO: Started server process [2913373]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Additional logs after starting the TextChatClientExecutor client - successful inference
[2024-02-19 14:32:57,683] [ INFO] - Checking parameters of completion request...
[2024-02-19 14:32:57,683] [ INFO] - Predicting chat completion using prompt 'Tell me about Intel Xeon Scalable Processors.'
[2024-02-19 14:33:07,119] [ INFO] - Chat completion finished.
INFO: 127.0.0.1:60734 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Additional logs after connecting via OpenAI - failing access
OpenAI client contents
Aside from the shebang and the modified model string, this should be identical to the content on the webpage.
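For reference, the pattern my client follows is the openai 1.x interface shown in the README, roughly as below. The base_url path and the placeholder api_key are my assumptions from the usual OpenAI-compatible-server convention; only the model string differs from the webpage example:

```python
# Sketch of the client pattern (openai 1.12.0). The api_key is a
# placeholder; I assume neural_chat ignores it. The import guard is only
# so the sketch loads in environments without the openai package.
try:
    from openai import OpenAI  # openai >= 1.0 interface
    HAVE_OPENAI = True
except ImportError:
    HAVE_OPENAI = False


def ask(prompt, base_url="http://localhost:8000/v1"):
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    completion = client.chat.completions.create(
        model="meta-llama/Llama-2-7b-chat-hf",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


if __name__ == "__main__" and HAVE_OPENAI:
    print(ask("Tell me about Intel Xeon Scalable Processors."))
```

This is the call that produces the 422 on my setup.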
Text from packet capture of exchange
Thank you!