dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Guide for MLC benchmarks is not working out of the box #529

Open ramyadhadidi opened 1 month ago

ramyadhadidi commented 1 month ago

Hello, it seems the guide found here https://github.com/dusty-nv/jetson-containers/blob/master/packages/llm/mlc/README.md is outdated: specifically, mlc_llm.build cannot be found. Maybe I am missing something, but I used the latest mlc Docker container.

./run.sh $(./autotag mlc) \
  python3 -m mlc_llm.build \
    --model Llama-2-7b-chat-hf \
    --quantization q4f16_ft \
    --artifact-path /data/models/mlc/dist \
    --max-seq-len 4096 \
    --target cuda \
    --use-cuda-graph \
    --use-flash-attn-mqa

You can update it with the following commands:

python3 -m mlc_llm convert_weight \
    /data/models/mlc/dist/Llama-2-7b-chat-hf \
    --quantization q4f16_ft \
    --output /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft

python3 -m mlc_llm gen_config \
    /data/models/mlc/dist/Llama-2-7b-chat-hf \
    --quantization q4f16_ft \
    --conv-template llama-2 \
    --output /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft
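For completeness, the new-style MLC workflow usually ends with a compile step after convert_weight and gen_config. This is only a sketch: the mlc_llm compile subcommand, its flags, and the output paths below are assumptions that may differ between MLC releases, so check python3 -m mlc_llm compile --help inside the container first.

```shell
# Hedged sketch of the follow-on compile step (new-style MLC workflow).
# Assumes gen_config wrote mlc-chat-config.json into the output directory
# above; the exact flags and library filename may vary by MLC version.
python3 -m mlc_llm compile \
    /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/mlc-chat-config.json \
    --device cuda \
    --output /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/Llama-2-7b-chat-hf-q4f16_ft-cuda.so
```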
ramyadhadidi commented 1 month ago

After some debugging, I found that the target container should be mlc-builder and not mlc. The document doesn't mention the difference or the correct command.
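Concretely, invoking the updated commands through the builder image looks something like this. The only change from the README's pattern is the autotag argument; the paths and quantization flag follow the commands above, and whether autotag resolves mlc-builder on every JetPack-L4T version is an assumption worth verifying locally.

```shell
# Run the new-style weight conversion inside the mlc-builder container
# (not mlc). `autotag` resolves the image tag matching the local
# JetPack-L4T version, as elsewhere in jetson-containers.
./run.sh $(./autotag mlc-builder) \
  python3 -m mlc_llm convert_weight \
    /data/models/mlc/dist/Llama-2-7b-chat-hf \
    --quantization q4f16_ft \
    --output /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft
```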

dusty-nv commented 1 month ago

Ah thanks @ramyadhadidi, yes I have various versions of MLC floating around, and the latest was after their transition from mlc_llm.build to the mlc_llm convert_weight workflow. It seems I pushed the builder but not the deployment container - which version of JetPack-L4T are you on?

ramyadhadidi commented 1 month ago

I'm on the latest one: L4T_VERSION=36.3.0, JETPACK_VERSION=6.0, CUDA_VERSION=12.2