Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Instructions for VILA 1.5 with tinychat (llm-awq) don't work well due to fixed torch version (==2.0.1) #36

Open gigony opened 2 months ago

gigony commented 2 months ago

Thank you for releasing the new version of VILA (1.5)!

I followed the installation instructions at https://github.com/mit-han-lab/llm-awq/tree/main?tab=readme-ov-file#install and ran the command python vlm_demo_new.py as detailed here: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#support-visual-language-models-vila-15-vila-llava

On Ubuntu 22.04 with CUDA 12.x, I installed the CUDA 12 libraries during step 2. However, in step 4, since VILA pins a specific torch version (2.0.1), as specified here https://github.com/Efficient-Large-Model/VILA/blob/main/pyproject.toml#L16, installing it also pulls in the CUDA 11 libraries, leading to library conflicts between the packages used by VILA and those used by llm-awq.
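For context, one way to see which CUDA variant actually ended up in the environment (these diagnostic commands are my own suggestion, not part of the official instructions):

# Show the active torch build and the CUDA runtime wheels pip pulled in
python -c "import torch; print(torch.__version__, torch.version.cuda)"
pip list | grep -E 'nvidia-.*-cu1[12]'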

The error encountered was:

File "/backup/repo/VILA/llm-awq/awq/quantize/qmodule.py", line 4, in <module>
   import awq_inference_engine  # with CUDA kernels
ImportError: /home/gbae/.pyenv/versions/vila/lib/python3.10/site-packages/awq_inference_engine-0.0.0-py3.10-linux-x86_64.egg/awq_inference_engine.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
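For what it's worth, the missing symbol demangles to a c10 copy-on-write helper that, as far as I can tell, only exists in newer torch releases, so the AWQ kernels were apparently built against a newer torch than the 2.0.1 that VILA installed. A quick way to confirm (illustrative only):

# Demangle the missing symbol and compare with the torch version now active
echo _ZN3c104impl3cow11cow_deleterEPv | c++filt   # c10::impl::cow::cow_deleter(void*)
python -c "import torch; print(torch.__version__)"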

To resolve this issue, I executed the following commands:

pip uninstall nvidia-cublas-cu11 nvidia-cuda-cupti-cu11 nvidia-cuda-nvrtc-cu11 nvidia-cuda-runtime-cu11 nvidia-cudnn-cu11 nvidia-cufft-cu11 nvidia-curand-cu11 nvidia-cusolver-cu11 nvidia-cusparse-cu11 nvidia-nccl-cu11 nvidia-nvtx-cu11

# Need to reinstall CUDA 12 libraries as directories are shared with CUDA 11 libraries and will be deleted.
pip uninstall nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12 nvidia-cufft-cu12 nvidia-curand-cu12 nvidia-cusolver-cu12 nvidia-cusparse-cu12 nvidia-nccl-cu12 nvidia-nvjitlink-cu12 nvidia-nvtx-cu12

pip install nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.1.105

# Install flash_attn package for CUDA 12.x
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
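After these steps, a quick sanity check (my own addition, not from the original instructions) is to confirm that torch and the AWQ kernels now import cleanly:

# Verify the CUDA 12 torch build and the compiled AWQ extension load together
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import awq_inference_engine; print('awq_inference_engine OK')"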

Additionally, as mentioned in https://github.com/mit-han-lab/llm-awq/pull/180, the file model_worker_new.py is missing (@kentang-mit).

Please address this issue so that other users can follow the instructions and enjoy the Gradio app with VILA v1.5! Thanks!

kentang-mit commented 1 month ago

Hi @gigony,

Thanks for your interest! We recommend setting up a new environment to run AWQ+TinyChat, since the VILA installation will override PyTorch in the AWQ+TinyChat environment and cause the problems you mentioned. We apologize for the missing file; @ys-2020 is going to upload it to GitHub soon.
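A minimal sketch of what that separation could look like, assuming conda is available (the env names are arbitrary and the install steps follow the respective READMEs):

# Keep the two torch pins apart by giving each project its own environment
conda create -n vila python=3.10 -y
conda activate vila
# ... follow the VILA installation instructions here ...

conda create -n awq-tinychat python=3.10 -y
conda activate awq-tinychat
# ... follow the llm-awq / TinyChat installation instructions here ...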

Best, Haotian

ys-2020 commented 1 month ago

Hi @gigony, we have uploaded vlm_demo_new.py. Thank you for pointing this out and sorry for the inconvenience.

rahulthakur319 commented 1 month ago

Thanks for the great work. I ran into the same issue.

Can you please confirm what the ideal environment should be? Would Ubuntu 22.04 with CUDA 11.x libraries support both AWQ+TinyChat and VILA?

ys-2020 commented 1 month ago

Hi @rahulthakur319, I think either CUDA 11.x or 12.x will work. The only thing you should be careful about is your current PyTorch version. For example, if you compile awq_inference_engine via python setup.py install with torch 2.3 and then install VILA, which may automatically change the torch version, you may hit the undefined symbol error in awq_inference_engine.

If that is the case, you may need to re-install awq_inference_engine with python setup.py install (remember to clean the pre-built files first). Or you can set up a new environment, as suggested by @kentang-mit.
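For reference, the rebuild might look roughly like this (paths assume the default llm-awq checkout):

# Clean the pre-built artifacts and recompile against the currently installed torch
cd llm-awq/awq/kernels
rm -rf build dist *.egg-info
python setup.py install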

hkunzhe commented 1 month ago

@rahulthakur319, you can install VILA first and then llm-awq, making sure the PyTorch version stays fixed at 2.0.1. The following script successfully runs VILA1.5-3b/13b/40b-AWQ inside the Docker image nvidia/cuda:11.8.0-devel-ubuntu22.04.

# Install VILA first
pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/Efficient-Large-Model/VILA.git
cd VILA  # enter the repo so the editable installs below target VILA
pip install setuptools_scm --index-url=https://pypi.org/simple
pip install -e .
pip install -e ".[train]"

pip install git+https://github.com/huggingface/transformers@v4.36.2
site_pkg_path=$(python3 -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/transformers_replace/* $site_pkg_path/transformers/

# Then install llm-awq
git clone https://github.com/mit-han-lab/llm-awq && cd llm-awq && pip install -e .
cd awq/kernels
python3 setup.py install
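As a quick check (not part of the original script), you can confirm that the torch pin survived the llm-awq install and that the compiled kernels load against it:

# torch should still report 2.0.1 with CUDA 11.8, and the AWQ extension should import
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
python3 -c "import awq_inference_engine"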

@ys-2020 BTW, does VILA support a torch version higher than 2.0.1?