Describe the issue
When I use pip install intel-extension-for-transformers in a fresh conda environment, there are some packages missing that I have to manually install before I can run a model.
Please add these to setup.py as deps!
accelerate
neural-speed
gguf
There may be others; I installed some other packages before I started writing these down.
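For reference, a minimal sketch of how the packages above could be declared in setup.py. This is an illustration only, not the project's actual setup.py; the version pins are omitted because I don't know which versions ITREX targets:

```python
from setuptools import setup, find_packages

setup(
    name="intel-extension-for-transformers",
    packages=find_packages(),
    install_requires=[
        # Packages I had to install manually before the
        # 4-bit loading example below would run:
        "accelerate",
        "neural-speed",
        "gguf",
    ],
)
```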
To reproduce, simply:
conda create -n itrex python=3.9
conda activate itrex
pip install intel-extension-for-transformers
Then run the following 3-line Python script:
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
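To confirm which of the reported dependencies are absent in a given environment before running the script, a quick check like this can help (note: I'm assuming the pip package neural-speed imports as neural_speed):

```python
import importlib.util

# Modules reported missing after a bare
# `pip install intel-extension-for-transformers`.
required = ["accelerate", "neural_speed", "gguf"]

# find_spec returns None when a top-level module cannot be located.
missing = [m for m in required if importlib.util.find_spec(m) is None]
print("missing:", missing)
```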