gigony opened this issue 2 months ago
Hi @gigony,
Thanks for your interest! We recommend setting up a new environment to run AWQ+TinyChat, since the VILA environment will override PyTorch in the AWQ+TinyChat environment and cause the problems you mentioned. We apologize for the missing file; @ys-2020 is going to upload it to GitHub soon.
Best, Haotian
Hi @gigony, we have uploaded vlm_demo_new.py. Thank you for pointing this out and sorry for the inconvenience.
Thanks for the great work. I ran into the same issue.
Can you please confirm the ideal environment: is Ubuntu 22.04 with CUDA 11.x libraries sufficient to support both AWQ+TinyChat and VILA?
Hi @rahulthakur319, I think either CUDA 11.x or 12.x will work. The only thing you should be careful about is your current PyTorch version. For example, if you compile awq_inference_engine via python setup.py install with torch 2.3 and then install VILA, which may automatically change the torch version, you may hit an "undefined symbol" error in awq_inference_engine.
If that is the case, you need to re-install awq_inference_engine with python setup.py install (remember to clean the pre-built files first). Or you can set up a new environment, as suggested by @kentang-mit.
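For concreteness, a minimal rebuild sketch (assuming the standard llm-awq layout, with the kernels under awq/kernels):
# Check which torch version the extension will be compiled against
python3 -c "import torch; print(torch.__version__)"
# Remove stale build artifacts so the kernels are recompiled from scratch, then reinstall
cd llm-awq/awq/kernels
rm -rf build dist *.egg-info
python3 setup.py install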
@rahulthakur319, you can install VILA first and then llm-awq, making sure the PyTorch version stays pinned at 2.0.1. The following script successfully runs VILA1.5-3b/13b/40b-AWQ from the Docker image nvidia/cuda:11.8.0-devel-ubuntu22.04 (a container-setup sketch is shown first, then the script).
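A rough sketch of the container setup before running the script (an assumption on my side: starting from a bare nvidia/cuda:11.8.0-devel-ubuntu22.04 image, where Python and git/wget still need to be installed):
docker run --gpus all -it --rm nvidia/cuda:11.8.0-devel-ubuntu22.04 bash
# Inside the container: install Python 3.10 (the Ubuntu 22.04 default, matching the cp310 wheel below), pip, git, and wget
apt-get update && apt-get install -y python3 python3-pip git wget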
# Install VILA first
pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/Efficient-Large-Model/VILA.git
cd VILA
pip install setuptools_scm --index-url=https://pypi.org/simple
pip install -e .
pip install -e ".[train]"
pip install git+https://github.com/huggingface/transformers@v4.36.2
site_pkg_path=$(python3 -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/transformers_replace/* $site_pkg_path/transformers/
# Then install llm-awq
cd .. && git clone https://github.com/mit-han-lab/llm-awq && cd llm-awq && pip install -e .
cd awq/kernels
python3 setup.py install
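As a quick sanity check after the build (just a sketch; it verifies the pinned torch version and that the compiled extension imports):
python3 -c "import torch; print(torch.__version__)"              # should print 2.0.1
python3 -c "import awq_inference_engine; print('awq kernels OK')"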
@ys-2020 BTW, does VILA support torch versions higher than 2.0.1?
Thank you for releasing the new version of VILA (1.5)!
I followed the installation instructions at https://github.com/mit-han-lab/llm-awq/tree/main?tab=readme-ov-file#install and ran python vlm_demo_new.py as detailed here: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#support-visual-language-models-vila-15-vila-llava
On Ubuntu 22.04 with CUDA 12.x, I installed the CUDA 12 libraries in step 2. However, in step 4, since VILA pins a specific torch version (2.0.1), as specified here: https://github.com/Efficient-Large-Model/VILA/blob/main/pyproject.toml#L16, it also pulls in CUDA 11 libraries, leading to library conflicts between the packages installed by VILA and those installed by llm-awq.
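A generic way to see which CUDA build of torch ended up installed (not specific to my setup):
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
pip list | grep -E "^(torch|nvidia-)"   # shows whether cu11 or cu12 NVIDIA wheels were pulled in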
The error encountered was:
To resolve this issue, I executed the following commands:
Additionally, as mentioned in https://github.com/mit-han-lab/llm-awq/pull/180, the file model_worker_new.py is missing (@kentang-mit). Please address this issue so that other users can follow the instructions and enjoy the Gradio app with VILA v1.5! Thanks!