VincyZhang / intel-extension-for-transformers

Extends Hugging Face transformers APIs for Transformer-based models and improves the productivity of inference deployment. With extremely compressed models, the toolkit can greatly improve the inference efficiency on Intel platforms.
Apache License 2.0

Device does not exist / is not supported error with neuralchat deploy_chatbot_on_xpu notebook #7

Closed VincyZhang closed 9 months ago

VincyZhang commented 9 months ago

Problem Summary and status of similar tests

I am having trouble getting neuralchat to work with my Intel Data Center Flex 170 GPU. Below is my procedure with the build_chatbot_on_xpu Jupyter notebook with a clean environment. I have tried this procedure multiple times and also attempted to follow different instructions from different sources but have the same outcome each time. When I get to the point of running the inference, I get either “Device does not exist” when I stick with the default device reference xpu or “Device is not supported” if I use xpu:0. I have tried this with several different Python versions, but use 3.9 below.

I have BigDL operational on this XPU and system (in a separate environment and not running during these tests below). I have also successfully used the deploy_chatbot_on_icx notebook (again in a separate environment and not running at the same time) using similar tweaks as outlined below to address missing dependencies in requirements.txt in my environment.

I also tried to get deploy_chatbot_on_xpu working (below I focus on build_chatbot_on_xpu). As long as I bring over the code from deploy_chatbot_on_cpu (to address the error relating to asyncio), I can successfully run the server, but I again get "Device does not exist" with `device='xpu'` and "Device is not supported" with `device='xpu:0'`.

I am hoping to get feedback on what I am doing wrong so that I can operate neural chat and successfully employ the OpenAI APIs.

Installation Procedure

Install Data Center GPU Drivers – per https://dgpu-docs.intel.com/driver/installation.html#ubuntu-install-steps

Prepare clean environment

conda create -n jupyter2 python=3.9
conda activate jupyter2

Install OneAPI for PyTorch 2.1 with apt installer – per https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html


wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install -y intel-basekit
> Add the following to ~/.bashrc and source ~/.bashrc

Required step for APT or offline installed oneAPI. Configure oneAPI environment variables. Skip this step for pip-installed oneAPI since LD_LIBRARY_PATH has already been configured.

source /opt/intel/oneapi/setvars.sh

Recommended Environment Variables

export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

> Install Intel Extension for PyTorch – https://intel.github.io/intel-extension-for-pytorch/index.html#installation
> Choose GPU, v2.1.10+xpu, Linux, pip

sudo apt install -y intel-oneapi-dpcpp-cpp-2024.0 intel-oneapi-mkl-devel=2024.0.0-49656
# nothing is updated since the newest version is already installed from above
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

> Preparation for Sanity Check

source {DPCPPROOT}/env/vars.sh
source {MKLROOT}/env/vars.sh

> Since these folders were not explicitly described in the documentation, I assumed it should be the following two commands

source /opt/intel/oneapi/dpcpp-ct/2024.0/env/vars.sh
source /opt/intel/oneapi/mkl/2024.0/env/vars.sh
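Because the two paths above are a guess, it may help to list which `env/vars.sh` scripts actually exist before sourcing any of them. The following is a minimal sketch, assuming the default `/opt/intel/oneapi` prefix and a `<component>/<version>/env/vars.sh` layout; the helper name `find_env_scripts` is hypothetical and not part of oneAPI.

```python
import glob
import os


def find_env_scripts(root="/opt/intel/oneapi"):
    """Return every <component>/<version>/env/vars.sh found under root.

    The directory layout is an assumption based on the paths used above.
    """
    pattern = os.path.join(root, "*", "*", "env", "vars.sh")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Prints nothing if the prefix does not exist or has a different layout.
    for path in find_env_scripts():
        print(path)
```

If the component directory that holds the DPC++ compiler shows up under a different name than the one guessed above, that would be the one to use for `{DPCPPROOT}`.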

> Run Sanity Check

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"

> Sanity Check Response

2.1.0a0+cxx11.abi
2.1.10+xpu
[0]: _DeviceProperties(name='Intel(R) Data Center GPU Flex 170', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=13535MB, max_compute_units=512, gpu_eu_count=512)
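The same sanity check can be written as a small script that can be run unchanged from both the notebook kernel and the shell, so the two results can be compared. This is a sketch only: the function name `xpu_report` is hypothetical, and it uses just the calls already present in the one-liner above (`torch.__version__`, `ipex.__version__`, `torch.xpu.device_count()`, `torch.xpu.get_device_properties()`).

```python
import importlib


def xpu_report():
    """Collect the facts printed by the one-line sanity check, without raising."""
    report = {"torch": None, "ipex": None, "devices": [], "error": None}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = torch.__version__
        # Importing the extension is what registers torch.xpu for this torch build.
        ipex = importlib.import_module("intel_extension_for_pytorch")
        report["ipex"] = ipex.__version__
        for i in range(torch.xpu.device_count()):
            report["devices"].append(str(torch.xpu.get_device_properties(i)))
    except Exception as exc:  # missing package, missing driver, no torch.xpu, ...
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in xpu_report().items():
        print(f"{key}: {value}")
```

An empty `devices` list or a non-empty `error` in one environment but not the other would point at an environment difference rather than at neuralchat itself.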

> Download relevant Notebooks

wget https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/docs/notebooks/build_chatbot_on_xpu.ipynb
wget https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/docs/notebooks/deploy_chatbot_on_xpu.ipynb

> Install and run Jupyter

pip install jupyter
jupyter notebook --ip 0.0.0.0

> Connect to jupyter URL in browser

> Try out build_chatbot on xpu notebook
> Add `conda env list` to confirm proper env is in use and `pip install pickleshare` (since a later step gives a warning about this but probably is not required)
> Skip step on oneapi since it was installed in advance
> setvars.sh shows already run as expected
> Skip step on Install Intel Extension for PyTorch, etc. from source as it should have been done above
> Add `!pip install pydub pymysql deepface exifread` before Inference: Text Chat since these dependencies are missing
> Inference: Text Chat Response

2024-02-14 09:59:28 [ERROR] neuralchat error: Device does not exist
Loading model Intel/neural-chat-7b-v3-1

AttributeError                            Traceback (most recent call last)
Cell In[19], line 5
      3 config = PipelineConfig(device='xpu')
      4 chatbot = build_chatbot(config)
----> 5 response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
      6 print(response)

AttributeError: 'NoneType' object has no attribute 'predict'

> Inference: Text Chat Response – after changing `device='xpu'` to `device='xpu:0'`

2024-02-14 10:01:53 [ERROR] neuralchat error: Device is not supported
Loading model Intel/neural-chat-7b-v3-1

AttributeError                            Traceback (most recent call last)
Cell In[23], line 5
      3 config = PipelineConfig(device='xpu:0')
      4 chatbot = build_chatbot(config)
----> 5 response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
      6 print(response)

AttributeError: 'NoneType' object has no attribute 'predict'
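In both notebook runs the `AttributeError` is a secondary symptom: `build_chatbot` logged a neuralchat error and the `chatbot` variable ended up as `None`, so the failure only surfaces at `predict`. A small guard makes that explicit. This is a sketch; `require_chatbot` is a hypothetical helper and not part of neuralchat, and the usage comments reuse only the names visible in the traceback.

```python
def require_chatbot(chatbot, device):
    """Fail early, naming the device, if no chatbot object was built."""
    if chatbot is None:
        raise RuntimeError(
            f"build_chatbot returned None for device={device!r}; "
            "see the neuralchat error logged just before this point"
        )
    return chatbot


# Usage in the notebook cell (names taken from the traceback above):
#   config = PipelineConfig(device="xpu")
#   chatbot = require_chatbot(build_chatbot(config), "xpu")
#   print(chatbot.predict("Tell me about Intel Xeon Scalable Processors."))
```

With the guard in place the cell stops at the real failure (device selection) instead of at the later attribute lookup.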


> Copying the same Text Chat script into xputest.py and running it from the command line gives a different error (here with **device='xpu'**). Why is this a different response than from within Jupyter? I have confirmed setvars.sh has been sourced and that I am in the same jupyter2 environment.

/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading config settings from the environment...
2024-02-14 10:18:37.549692: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-14 10:18:37.553245: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-14 10:18:37.599354: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-14 10:18:37.599393: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-14 10:18:37.600837: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-14 10:18:37.609277: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-14 10:18:37.609563: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-14 10:18:38.533993: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-14 10:18:42,841 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
2024-02-14 10:18:42,841 - datasets - INFO - TensorFlow version 2.15.0.post1 available.
Loading model Intel/neural-chat-7b-v3-1
Loading checkpoint shards: 100%| 2/2 [00:01<00:00, 1.23it/s]
2024-02-14 10:19:17,805 - root - ERROR - Exception: Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
2024-02-14 10:19:17 [ERROR] neuralchat error: Generic error
Traceback (most recent call last):
  File "/home/REDACTED/jupyter/./cputest.py", line 7, in <module>
    response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
AttributeError: 'NoneType' object has no attribute 'predict'
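One possible reading of `PI_ERROR_OUT_OF_RESOURCES`, which the log does not confirm, is that the model weights do not fit in device memory at the precision being loaded. The rough arithmetic below is a sketch under two stated assumptions: the parameter count is taken literally from the "7b" in the model name, and the `total_memory=13535MB` value from the sanity check is read as MiB. It counts weights only and ignores activations and the KV cache.

```python
MIB = 2**20


def weight_footprint_mib(n_params, bytes_per_param):
    """Memory needed for the weights alone, in MiB."""
    return n_params * bytes_per_param / MIB


N_PARAMS = 7_000_000_000  # assumption: "7b" read literally from the model name
DEVICE_MIB = 13535        # total_memory from the sanity check, read as MiB (assumption)

for label, width in (("fp32", 4), ("bf16/fp16", 2), ("int8", 1)):
    need = weight_footprint_mib(N_PARAMS, width)
    share = need / DEVICE_MIB
    print(f"{label:>9}: {need:8.0f} MiB of {DEVICE_MIB} MiB ({share:.0%})")
```

Under these assumptions 32-bit weights alone are roughly twice the reported device memory, and 16-bit weights consume nearly all of it, so the precision used when loading the model is worth checking.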


> Running the same command with **device='xpu:0'** shows the following:

/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
/home/REDACTED/miniconda3/envs/jupyter2/lib/python3.9/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Loading config settings from the environment...
2024-02-14 10:20:00.620315: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-14 10:20:00.623828: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-14 10:20:00.671369: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-14 10:20:00.671411: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-14 10:20:00.672846: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-14 10:20:00.681503: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-14 10:20:00.681998: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-14 10:20:01.604245: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-14 10:20:05,906 - datasets - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
2024-02-14 10:20:05,906 - datasets - INFO - TensorFlow version 2.15.0.post1 available.
Loading model Intel/neural-chat-7b-v3-1
2024-02-14 10:20:06 [ERROR] neuralchat error: Device is not supported
Traceback (most recent call last):
  File "/home/REDACTED/jupyter/./cputest.py", line 7, in <module>
    response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
AttributeError: 'NoneType' object has no attribute 'predict'
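Since the notebook and the command-line script fail differently with the same `device='xpu'` setting, one thing worth ruling out is that the Jupyter kernel and the shell see different environments (for example, a kernel started before `setvars.sh` was sourced). The sketch below records the interpreter path and a few variables so the two can be compared. The helper names and the list of watched variables are assumptions chosen for illustration, not a definitive list.

```python
import json
import os
import sys

# Assumption: these are the variables most likely to differ; adjust as needed.
WATCHED = (
    "CONDA_DEFAULT_ENV",
    "ONEAPI_ROOT",
    "LD_LIBRARY_PATH",
    "USE_XETLA",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS",
)


def env_snapshot(environ=None):
    """Return the interpreter path plus the watched environment variables."""
    environ = os.environ if environ is None else environ
    snapshot = {"python": sys.executable}
    for name in WATCHED:
        snapshot[name] = environ.get(name)
    return snapshot


def diff_snapshots(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print(json.dumps(env_snapshot(), indent=2))
```

Running this once in a notebook cell and once from the shell, then passing the two dictionaries to `diff_snapshots`, shows whether both are really using the same interpreter and oneAPI settings.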


> Some additional system debug showing proper operation of the Flex 170:
$ sudo xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0000-d9d5-e18be95b77d2                                       |
|           | PCI BDF Address: 0000:b3:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
$ sudo xpu-smi stats -d 0
+-----------------------------+--------------------------------------------------------------------+
| Device ID                   | 0                                                                  |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%)         | 0                                                                  |
| EU Array Active (%)         | N/A                                                                |
| EU Array Stall (%)          | N/A                                                                |
| EU Array Idle (%)           | N/A                                                                |
| Compute Engine Util (%)     | 0; Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0              |
| Render Engine Util (%)      | 0; Engine 0: 0                                                     |
| Media Engine Util (%)       | 0                                                                  |
| Decoder Engine Util (%)     | Engine 0: 0, Engine 1: 0                                           |
| Encoder Engine Util (%)     | Engine 0: 0, Engine 1: 0                                           |
| Copy Engine Util (%)        | 0; Engine 0: 0                                                     |
| Media EM Engine Util (%)    | Engine 0: 0, Engine 1: 0                                           |
| 3D Engine Util (%)          | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
| Reset                       | N/A                                                                |
| Programming Errors          | N/A                                                                |
| Driver Errors               | N/A                                                                |
| Cache Errors Correctable    | N/A                                                                |
| Cache Errors Uncorrectable  | N/A                                                                |
| Mem Errors Correctable      | N/A                                                                |
| Mem Errors Uncorrectable    | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W)               | 42                                                                 |
| GPU Frequency (MHz)         | 2050                                                               |
| Media Engine Freq (MHz)     | 1025                                                               |
| GPU Core Temperature (C)    | 58                                                                 |
| GPU Memory Temperature (C)  | N/A                                                                |
| GPU Memory Read (kB/s)      | 1452                                                               |
| GPU Memory Write (kB/s)     | 400                                                                |
| GPU Memory Bandwidth (%)    | 0                                                                  |
| GPU Memory Used (MiB)       | 31                                                                 |
| GPU Memory Util (%)         | 0                                                                  |
| Xe Link Throughput (kB/s)   | N/A                                                                |
+-----------------------------+--------------------------------------------------------------------+

$ sudo xpu-smi health -d 0
+----------------------------+---------------------------------------------------------------------+
| Device ID                  | 0                                                                   |
+----------------------------+---------------------------------------------------------------------+
| 1. GPU Core Temperature    | Status: OK                                                          |
|                            | Description: All temperature sensors are healthy.                   |
|                            | Throttle Threshold: 100 Celsius Degree                              |
|                            | Shutdown Threshold: 125 Celsius Degree                              |
+----------------------------+---------------------------------------------------------------------+
| 3. GPU Power               | Status: OK                                                          |
|                            | Description: All power domains are healthy.                         |
|                            | Throttle Threshold: 150 watts                                       |
+----------------------------+---------------------------------------------------------------------+
| 6. GPU Frequency           | Status: OK                                                          |
|                            | Description: The device frequency not throttled                     |
+----------------------------+---------------------------------------------------------------------+
$ sudo xpu-smi diag --precheck
Journal file /var/log/journal/90338a962e854ed39e4e7ece1f53d71e/user-1666601109@000610bd6beba70f-62677c8b509c641c.journal~ is truncated, ignoring file.
Journal file /var/log/journal/90338a962e854ed39e4e7ece1f53d71e/user-1666601109@000610bd6beba70f-62677c8b509c641c.journal~ is truncated, ignoring file.
+------------------+-------------------------------------------------------------------------------+
| Component        | Details                                                                       |
+------------------+-------------------------------------------------------------------------------+
| Driver           | Status: Pass                                                                  |
+------------------+-------------------------------------------------------------------------------+
| CPU              | CPU ID: 0                                                                     |
|                  | Status: Pass                                                                  |
+------------------+-------------------------------------------------------------------------------+
| CPU              | CPU ID: 1                                                                     |
|                  | Status: Pass                                                                  |
+------------------+-------------------------------------------------------------------------------+
| GPU              | BDF: 0000:b3:00.0                                                             |
|                  | Status: Pass                                                                  |
+------------------+-------------------------------------------------------------------------------+
$ sudo xpu-smi diag -d 0 -l 3
+-------------------------------+------------------------------------------------------------------+
| Device ID                     | 0                                                                |
+-------------------------------+------------------------------------------------------------------+
| Level                         | 3                                                                |
| Result                        | Pass                                                             |
| Items                         | 12                                                               |
+-------------------------------+------------------------------------------------------------------+
| Software Env Variables        | Result: Pass                                                     |
|                               | Message: Pass to check environment variables.                    |
+-------------------------------+------------------------------------------------------------------+
| Software Library              | Result: Pass                                                     |
|                               | Message: Pass to check libraries.                                |
+-------------------------------+------------------------------------------------------------------+
| Software Permission           | Result: Pass                                                     |
|                               | Message: Pass to check permission.                               |
+-------------------------------+------------------------------------------------------------------+
| Software Exclusive            | Result: Pass                                                     |
|                               | Message: Pass to check the software exclusive.                   |
+-------------------------------+------------------------------------------------------------------+
| Computation Check             | Result: Pass                                                     |
|                               | Message: Pass to check computation.                              |
+-------------------------------+------------------------------------------------------------------+
| Integration PCIe              | Result: Pass                                                     |
|                               | Message: Pass to check PCIe bandwidth. Its bandwidth is 17.908   |
|                               | GBPS.                                                            |
+-------------------------------+------------------------------------------------------------------+
| Media Codec                   | Result: Pass                                                     |
|                               | Message: Pass to check Media transcode performance.              |
|                               | 1080p H.265 : 305 FPS                                            |
|                               | 1080p H.264 : 306 FPS                                            |
|                               | 4K H.265 : 85 FPS                                                |
|                               | 4K H.264 : 84 FPS                                                |
+-------------------------------+------------------------------------------------------------------+
| Performance Computation       | Result: Pass                                                     |
|                               | Message: Pass to check computation performance. Its              |
|                               | single-precision GFLOPS is 11120.119.                            |
+-------------------------------+------------------------------------------------------------------+
| Performance Power             | Result: Pass                                                     |
|                               | Message: Pass to check stress power. Its stress power is 119 W.  |
+-------------------------------+------------------------------------------------------------------+
| Performance Memory Bandwidth  | Result: Pass                                                     |
|                               | Message: Pass to check memory bandwidth. Its memory bandwidth    |
|                               | is 361.042 GBPS.                                                 |
+-------------------------------+------------------------------------------------------------------+
| Performance Memory Allocation | Result: Pass                                                     |
|                               | Message: Pass to check memory allocation.                        |
+-------------------------------+------------------------------------------------------------------+
| Memory Error                  | Result: Pass                                                     |
|                               | Message: Pass to check memory error.                             |
+-------------------------------+------------------------------------------------------------------+