NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

All of the activation values are zero in benchmark #844

Open · leizhao1234 opened this issue 10 months ago

leizhao1234 commented 10 months ago

When I was running the benchmark for Llama 70B, I found that all of the activation values are zero.

```
python build.py --model_dir /code/tensorrt_llm/models/Llama-2-70b-chat-hf/ --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4 --paged_kv_cache --use_inflight_batching --int8_kv_cache --output_dir ./tmp/llama/70B/trt_engines/fp16/1-gpu/

./gptSessionBenchmark --model llama --engine_dir /code/tensorrt_llm/models/tmp/llama/70B/trt_engines/fp16/1-gpu/ --batch_size "8" --input_output_len "1024,1"
```

I don't know what is happening, and I suspect that multiplying all-zero matrices could significantly skew the measured performance.

byshiue commented 10 months ago

Could you share how you print the activation values?

leizhao1234 commented 10 months ago
(screenshot: printf output of the activation values, 2024-01-10 16:51)
byshiue commented 10 months ago

I am afraid you are printing a half-precision number with a float format specifier in printf. Please cast the values to float before printing.
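A minimal sketch of the pitfall byshiue describes (the kernel and buffer names here are hypothetical, not from the issue): in CUDA, passing a `__half` directly to `printf("%f", ...)` goes through varargs, which expects a double, so the 2-byte value is misread and can come out as 0.000000. Converting with `__half2float` first gives the real value.

```cpp
#include <cuda_fp16.h>
#include <cstdio>

// Hypothetical debug kernel: dump the first n activation values.
__global__ void dumpActivations(const __half* act, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int i = 0; i < n; ++i) {
            // Wrong: printf("%f\n", act[i]); -- %f expects a double,
            // so the 2-byte __half is misread and can print 0.000000.
            // Right: convert to float, which printf promotes to double.
            printf("act[%d] = %f\n", i, __half2float(act[i]));
        }
    }
}
```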

nv-guomingz commented 1 day ago

Hi @leizhao1234, do you still have any further issues or questions? If not, we'll close this issue soon.