VincyZhang / intel-extension-for-transformers

Extends the Hugging Face transformers APIs for Transformer-based models and improves the productivity of inference deployment. With extremely compressed models, the toolkit can greatly improve inference efficiency on Intel platforms.
Apache License 2.0

failed to create the serving #17

Open VincyZhang opened 4 months ago

VincyZhang commented 4 months ago

I tried to create the serving on my system, but it failed with the error below:

    (emon_analyzer) [root@SPR-1 emon_data_analyzer]# neuralchat_server start --config_file ./config/neuralchat.yaml
    2024-03-19 11:38:57,005 - numexpr.utils - INFO - Note: detected 224 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
    2024-03-19 11:38:57,005 - numexpr.utils - INFO - Note: NumExpr detected 224 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
    2024-03-19 11:38:57,005 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
    2024-03-19 11:38:57,348 - datasets - INFO - PyTorch version 2.2.0+cpu available.
    [2024-03-19 11:38:57,430] [ ERROR] - Failed to start server.
    [2024-03-19 11:38:57,430] [ ERROR] - partially initialized module 'intel_extension_for_pytorch' has no attribute '_C' (most likely due to a circular import)

yaml config file:

    host: 0.0.0.0
    port: 8000

    #model_name_or_path: "Intel/neural-chat-7b-v3-1"
    model_name_or_path: "/home/zluo2/TableLlama-model"
    tokenizer_name_or_path: ""
    peft_model_path: "./models/emon_llama"
    device: "cpu"

    asr:
        enable: false
        args:
            # support cpu, hpu, xpu, cuda
            device: "cpu"
            # support openai/whisper series
            model_name_or_path: "openai/whisper-small"
            # only can be set to true when the device is set to "cpu"
            bf16: false

    tts:
        enable: false
        args:
            device: "cpu"
            voice: "default"
            stream_mode: false
            output_audio_path: "./output_audio.wav"

    asr_chinese:
        enable: false

    tts_chinese:
        enable: false
        args:
            device: "cpu"
            spk_id: 0
            stream_mode: false
            output_audio_path: "./output_audio.wav"

    retrieval:
        enable: true
        args:
            input_path: "./rag_data/emon-sample"
            vector_database: "Qdrant"
            #retrieval_type: "bm25"

    safety_checker:
        enable: false

    ner:
        enable: false
        args:
            spacy_model: "en_core_web_lg"

    tasks_list: ['textchat', 'retrieval']
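As a quick sanity check of a config like the one above, something along these lines could flag missing top-level keys after the YAML is parsed into a dict (the required-key list here is a hypothetical subset for illustration, not the authoritative NeuralChat schema):

```python
# Hypothetical subset of keys the server config needs; consult the
# NeuralChat documentation for the authoritative schema.
REQUIRED_KEYS = {"host", "port", "model_name_or_path", "device", "tasks_list"}

def missing_keys(config: dict) -> set:
    """Return the required keys absent from the parsed YAML config."""
    return REQUIRED_KEYS - config.keys()

# Example: a config dict missing 'tasks_list'
config = {
    "host": "0.0.0.0",
    "port": 8000,
    "model_name_or_path": "/home/zluo2/TableLlama-model",
    "device": "cpu",
}
print(missing_keys(config))  # {'tasks_list'}
```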

VincyZhang commented 4 months ago

Please check your IPEX version. The traceback shows that the IPEX version mismatches your PyTorch version.

PyTorch version 2.2.0
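One way to confirm such a mismatch is to compare the major.minor components of the two installed version strings (a minimal sketch; the matching-major.minor rule is an assumption based on how IPEX releases typically track PyTorch, so check Intel's compatibility table for the exact pairing):

```python
def versions_compatible(torch_version: str, ipex_version: str) -> bool:
    """Return True when the major.minor parts of the two versions match."""
    def major_minor(version: str) -> tuple:
        # Strip local build tags like "+cpu" before splitting on dots.
        return tuple(version.split("+")[0].split(".")[:2])
    return major_minor(torch_version) == major_minor(ipex_version)

# PyTorch 2.2.0+cpu would need an IPEX 2.2.x build:
print(versions_compatible("2.2.0+cpu", "2.2.0"))    # True
print(versions_compatible("2.2.0+cpu", "2.1.100"))  # False
```

In practice the two version strings would come from `torch.__version__` and `intel_extension_for_pytorch.__version__`.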

VincyZhang commented 4 months ago

The issue was solved by installing oneCCL.

VincyZhang commented 4 months ago

Hi, thanks for bringing this up and for the suggestion! We'll look into adding the packages you listed, as well as any others that are missing.

VincyZhang commented 4 months ago

@NeuralChatBot

NeuralChatBot commented 4 months ago

It seems like you're facing an issue with setting up a neural chat server. To resolve this, you need to ensure that you have the required packages installed and properly configured. Here are some steps to follow:

  1. Check your IPEX version: Make sure you have the correct IPEX version compatible with your PyTorch version.

  2. Install the missing packages: Install the packages mentioned in the traceback, such as oneccl, and any other missing packages.

  3. Configure the server: Make sure your YAML file contains the correct configuration. You can refer to the config file posted above for guidance.

  4. Start the server: Run the command "neuralchat_server start --config_file ./config/neuralchat.yaml" to start the server.

  5. Troubleshooting: If the issue persists, you may need to check your system's environment variables, such as "NUMEXPR_MAX_THREADS" and "CUDA_VISIBLE_DEVICES," and ensure they are set correctly.

  6. Seek help: If you're still unable to resolve the issue, you can reach out to the developers of the neural chat server for further assistance.
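The environment-variable check from step 5 can be sketched as follows (the value 64 mirrors the maximum NumExpr reported in the log; note that the variable must be in place before NumExpr is first imported, directly or via a library such as pandas):

```python
import os

# NumExpr reads NUMEXPR_MAX_THREADS at import time, so set it first.
# 64 matches the cap NumExpr reported in the server log above.
os.environ["NUMEXPR_MAX_THREADS"] = "64"

print(os.environ["NUMEXPR_MAX_THREADS"])  # → 64
```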