dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License
1.93k stars 422 forks

AttributeError(f"Module has no function '{name}'") #412

Open whk6688 opened 4 months ago

whk6688 commented 4 months ago

i ran following command: python3 /opt/mlc-llm/benchmark.py --model /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params --prompt /data/prompts/completion_16.json --max-new-tokens 128

[screenshot: AttributeError(f"Module has no function '{name}'") warning during model load]

the result was: /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params: input=16 output=128 prefill_time 0.081 sec, prefill_rate 197.0 tokens/sec, decode_time 6.632 sec, decode_rate 19.3 tokens/sec

but your result was: /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params: prefill_time 0.027 sec, prefill_rate 582.8 tokens/sec, decode_time 2.986 sec, decode_rate 42.9 tokens/sec


19.3 vs 42.9

Maybe the root cause is the error shown in the screenshot.

thanks!

dusty-nv commented 4 months ago

Hi @whk6688, you can safely ignore that metadata warning from the screenshot — it's unrelated to performance.

What Jetson are you running? Is it in maximum power mode (nvpmodel) ?

Also, I'm getting closer to 47 tokens/sec now on llama-2-7b with MLC (on AGX Orin and JetPack 6.0)

whk6688 commented 4 months ago

Hi @dusty-nv, my device is an AGX Orin 32GB on JetPack 6.0 (6.0-b52). Maximum power mode? Could you tell me how to set that mode? Thanks!

dusty-nv commented 4 months ago

It should be sudo nvpmodel -m 0, but also note that the AGX Orin 32GB has fewer GPU cores than the AGX Orin 64GB, so some gap is expected.
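For reference, a minimal sketch of checking and setting the power mode before benchmarking (assuming a stock JetPack install; MAXN is typically mode 0, but the available modes vary per module, so query first):

```shell
# Query the currently active power model
sudo nvpmodel -q

# Switch to the maximum-performance mode (MAXN, mode 0 on most Jetson modules)
sudo nvpmodel -m 0

# Optionally pin clocks at their maximum for stable benchmark numbers
sudo jetson_clocks
```

A reboot may be prompted when switching between some modes; re-run the benchmark afterwards to see the effect.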

whk6688 commented 4 months ago

I have tried it. You are right! Perfect! /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params: prefill_time 0.037 sec, prefill_rate 438.0 tokens/sec, decode_time 3.028 sec, decode_rate 42.3 tokens/sec. Thanks again!