Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Instructions for VILA 1.5 with tinychat (llm-awq) don't work well due to fixed torch version (==2.0.1) #36

Open gigony opened 2 months ago

gigony commented 2 months ago

Thank you for releasing the new version of VILA (1.5)!

I followed the installation instructions at https://github.com/mit-han-lab/llm-awq/tree/main?tab=readme-ov-file#install and ran the command python vlm_demo_new.py as detailed here: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#support-visual-language-models-vila-15-vila-llava

On Ubuntu 22.04 with CUDA 12.x, I installed the CUDA 12 libraries during step 2. However, in step 4, since VILA pins a specific torch version (2.0.1), as specified here https://github.com/Efficient-Large-Model/VILA/blob/main/pyproject.toml#L16, installing it also pulls in the CUDA 11 libraries, leading to library conflicts between the packages used by VILA and those used by llm-awq.
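For context, one way to see which CUDA variant actually ended up in the environment (these diagnostic commands are my own suggestion, not part of the official instructions):

# Show the active torch build and the CUDA runtime wheels pip pulled in
python -c "import torch; print(torch.__version__, torch.version.cuda)"
pip list | grep -E 'nvidia-.*-cu1[12]'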

The error encountered was:

File "/backup/repo/VILA/llm-awq/awq/quantize/qmodule.py", line 4, in <module>
   import awq_inference_engine  # with CUDA kernels
ImportError: /home/gbae/.pyenv/versions/vila/lib/python3.10/site-packages/awq_inference_engine-0.0.0-py3.10-linux-x86_64.egg/awq_inference_engine.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv
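For what it's worth, the missing symbol demangles to a c10 copy-on-write helper that, as far as I can tell, only exists in newer torch releases, so the AWQ kernels were apparently built against a newer torch than the 2.0.1 that VILA installed. A quick way to confirm (illustrative only):

# Demangle the missing symbol and compare with the torch version now active
echo _ZN3c104impl3cow11cow_deleterEPv | c++filt   # c10::impl::cow::cow_deleter(void*)
python -c "import torch; print(torch.__version__)"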

To resolve this issue, I executed the following commands:

pip uninstall nvidia-cublas-cu11 nvidia-cuda-cupti-cu11 nvidia-cuda-nvrtc-cu11 nvidia-cuda-runtime-cu11 nvidia-cudnn-cu11 nvidia-cufft-cu11 nvidia-curand-cu11 nvidia-cusolver-cu11 nvidia-cusparse-cu11 nvidia-nccl-cu11 nvidia-nvtx-cu11

# Need to reinstall CUDA 12 libraries as directories are shared with CUDA 11 libraries and will be deleted.
pip uninstall nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12 nvidia-cufft-cu12 nvidia-curand-cu12 nvidia-cusolver-cu12 nvidia-cusparse-cu12 nvidia-nccl-cu12 nvidia-nvjitlink-cu12 nvidia-nvtx-cu12

pip install nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-nccl-cu12==2.20.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.1.105

# Install flash_attn package for CUDA 12.x
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.8/flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
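After these steps, a quick sanity check (my own addition, not from the original instructions) is to confirm that torch and the AWQ kernels now import cleanly:

# Verify the CUDA 12 torch build and the compiled AWQ extension load together
python -c "import torch; print(torch.__version__, torch.version.cuda)"
python -c "import awq_inference_engine; print('awq_inference_engine OK')"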

Additionally, as mentioned in https://github.com/mit-han-lab/llm-awq/pull/180, the file model_worker_new.py is missing (@kentang-mit).

Please address this issue so that other users can follow the instructions and enjoy the Gradio app with VILA v1.5! Thanks!

kentang-mit commented 1 month ago

Hi @gigony,

Thanks for your interest! We recommend setting up a new environment to run AWQ+TinyChat, since the VILA installation will override PyTorch in the AWQ+TinyChat environment and cause the problems you mentioned. We apologize for the missing file; @ys-2020 is going to upload it to GitHub soon.
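A minimal sketch of what that separation could look like, assuming conda is available (the env names are arbitrary and the install steps follow the respective READMEs):

# Keep the two torch pins apart by giving each project its own environment
conda create -n vila python=3.10 -y
conda activate vila
# ... follow the VILA installation instructions here ...

conda create -n awq-tinychat python=3.10 -y
conda activate awq-tinychat
# ... follow the llm-awq / TinyChat installation instructions here ...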

Best, Haotian

ys-2020 commented 1 month ago

Hi @gigony, we have uploaded vlm_demo_new.py. Thank you for pointing this out and sorry for the inconvenience.

rahulthakur319 commented 1 month ago

Thanks for the great work. I ran into the same issue.

Can you please confirm what the ideal environment should be? Would Ubuntu 22.04 with CUDA 11.x libraries support both AWQ+TinyChat and VILA?

ys-2020 commented 1 month ago

Hi @rahulthakur319, I think either CUDA 11.x or 12.x will work. The only thing you should be careful about is your current PyTorch version. For example, if you compile awq_inference_engine via python setup.py install with torch 2.3 and then install VILA, which may automatically change the torch version, you may hit the undefined symbol error in awq_inference_engine.

If that is the case, you may need to re-install awq_inference_engine with python setup.py install (remember to clean the pre-built files first). Or you can set up a new environment, as suggested by @kentang-mit.
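For reference, the rebuild might look roughly like this (paths assume the default llm-awq checkout):

# Clean the pre-built artifacts and recompile against the currently installed torch
cd llm-awq/awq/kernels
rm -rf build dist *.egg-info
python setup.py install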

hkunzhe commented 1 month ago

@rahulthakur319, you can install VILA first and then llm-awq, making sure the PyTorch version stays fixed at 2.0.1. The following script successfully runs VILA1.5-3b/13b/40b-AWQ inside the Docker image nvidia/cuda:11.8.0-devel-ubuntu22.04.

# Install VILA first
pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/Efficient-Large-Model/VILA.git
cd VILA  # enter the repo so the editable installs below target VILA
pip install setuptools_scm --index-url=https://pypi.org/simple
pip install -e .
pip install -e ".[train]"

pip install git+https://github.com/huggingface/transformers@v4.36.2
site_pkg_path=$(python3 -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/transformers_replace/* $site_pkg_path/transformers/

# Then install llm-awq
git clone https://github.com/mit-han-lab/llm-awq && cd llm-awq && pip install -e .
cd awq/kernels
python3 setup.py install
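As a quick check (not part of the original script), you can confirm that the torch pin survived the llm-awq install and that the compiled kernels load against it:

# torch should still report 2.0.1 with CUDA 11.8, and the AWQ extension should import
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
python3 -c "import awq_inference_engine"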

@ys-2020 BTW, does VILA support a torch version higher than 2.0.1?