dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

MLC-LLM quantization failure due to param_manager #316

Closed amevec closed 10 months ago

amevec commented 10 months ago

Hardware: Orin AGX/NX
Software: dustynv/mlc:r35.4.1
Issue Summary: When using the mlc image for compiling LLMs, the model compilation is failing with vicuna-7b-v1.5.

Resolution: Use the updated mlc-llm main branch, or apply the commit manually to /usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py
MLC-LLM Pull Request: https://github.com/mlc-ai/mlc-llm/pull/917
MLC-LLM Commit for specific fix (to hotfix existing container): https://github.com/mlc-ai/mlc-llm/pull/917/commits/4a0e7a912a7085eb7cd166d5d8b584e1b5ed3947
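
Before hand-editing param_manager.py, it may help to confirm where the installed copy actually lives, since the dist-packages path above is specific to this container's Python 3.8 layout. The following is a minimal sketch using only the standard library; the helper name locate_module_file is hypothetical and not part of mlc_llm.

```python
import importlib.util
from typing import Optional


def locate_module_file(module_name: str) -> Optional[str]:
    """Return the path of the file backing `module_name`, or None if not found.

    Hypothetical helper for illustration. Note that find_spec() imports the
    parent packages of a dotted name in order to locate the submodule.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # A parent package is missing or could not be imported.
        return None
    if spec is None or spec.origin is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Module name taken from the traceback paths in this issue.
    path = locate_module_file("mlc_llm.relax_model.param_manager")
    print(path or "mlc_llm is not importable in this environment")
```

If the printed path differs from the one in the Resolution line, that is the file the hotfix would need to target.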

Reproduce:

apt-get update && apt-get install git-lfs
git lfs install
mkdir -p /app/storage/models
cd /app/storage/models
git clone https://huggingface.co/lmsys/vicuna-7b-v1.5
python3 -m mlc_llm.build --model vicuna-7b-v1.5 \
    --quantization q4f16_ft \
    --artifact-path /app/storage/ \
    --max-seq-len 4096 \
    --target cuda \
    --use-cuda-graph

The compute/quantize step fails with the following error:

Finish exporting chat config to /app/storage/vicuna-7b-v1.5-q4f16_ft/params/mlc-chat-config.json
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 13, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 10, in main
    core.build_model_from_args(parsed_args)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/core.py", line 638, in build_model_from_args
    mod = mod_transform_before_build(mod, param_manager, args, model_config)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/core.py", line 370, in mod_transform_before_build
    mod = param_manager.transform_dequantize(mod)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py", line 366, in transform_dequantize
    "params", self.get_quantized_param_info(gv.name_hint)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py", line 429, in get_quantized_param_info
    quantized_data = bb.normalize(f_quantize(bb, provided_tensor_vars))
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/quantization/ft_rowwise_quantization.py", line 37, in f_quantize
    encoded_data = bb.emit_te(
  File "/usr/local/lib/python3.8/dist-packages/tvm/relax/block_builder.py", line 524, in emit_te
    return self.emit(self.call_te(func, *args, **kwargs), name_hint=name_hint)
  File "/usr/local/lib/python3.8/dist-packages/tvm/relax/block_builder.py", line 307, in emit
    return _ffi_api.BlockBuilderEmit(self, expr, name_hint)  # type: ignore
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  [bt] (5) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(TVMFuncCall+0x64) [0xffff66650a3c]
  [bt] (4) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(+0x1723314) [0xffff64a85314]
  [bt] (3) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::relax::BlockBuilderImpl::Emit(tvm::RelayExpr, tvm::runtime::String)+0xc8) [0xffff64a906b0]
  [bt] (2) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(+0x1720824) [0xffff64a82824]
  [bt] (1) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x78) [0xffff643733a0]
  [bt] (0) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xffff6669dd50]
  File "/opt/mlc-llm/3rdparty/tvm/src/relax/ir/block_builder.cc", line 321
InternalError: Check failed: (!block_stack_.empty()) is false: no block is being built

dusty-nv commented 10 months ago

Hi @amevec, I have updated the dustynv/mlc:dev container to track the main branch of the mlc_llm repo (see commit https://github.com/dusty-nv/jetson-containers/commit/d89fee0dbc21b9496d6067de776c0fc1c3224147).

So try using dustynv/mlc:dev instead if you want the latest updates in MLC; dustynv/mlc is reserved for a stable/tested version. (The MLC project itself is unversioned, which is why I'm versioning it here by commit SHA.) I am seeing a 10-15% perf regression between 10/20/2023 (mlc_llm SHA 9bf5723) and now.
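
For anyone comparing the two containers, a perf regression figure like the one above can be computed from throughput measurements as follows. This is only a sketch: the function name and the tokens/sec values are hypothetical placeholders, not measurements from this thread.

```python
def percent_regression(baseline: float, current: float) -> float:
    """Return how much slower `current` is than `baseline`, in percent.

    Both arguments are throughputs (higher is better), e.g. tokens/sec.
    A positive result means a regression; a negative one means a speedup.
    """
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - current) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    old_tokens_per_sec = 20.0   # e.g. measured with the pinned, stable build
    new_tokens_per_sec = 17.5   # e.g. measured with the :dev build
    print(f"{percent_regression(old_tokens_per_sec, new_tokens_per_sec):.1f}% regression")
```

Measuring both builds with the same model, quantization, and prompt length keeps the comparison meaningful.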