Open leizhao1234 opened 10 months ago
Could you share how you print the activation values?
I suspect you are printing a `half` value with a float format specifier in `printf`. Please cast the values to `float` before printing.
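To illustrate why the cast matters, here is a minimal Python sketch using only the standard `struct` module. It is not TensorRT-LLM code, and the sample values are hypothetical. It assumes the 2 bytes of a half end up in the low end of an otherwise zero-filled 8-byte slot that `%f` then reads as a double. That layout is one plausible explanation for seeing zeros, not something confirmed in this thread. In a CUDA kernel, the usual conversion is `__half2float(x)` from `cuda_fp16.h`.

```python
import struct


def half_bits(value):
    """Return the 2 raw bytes of `value` encoded as IEEE 754 binary16."""
    return struct.pack("<e", value)


def misread_as_double(raw_half):
    """Reinterpret half bytes as a double.

    Assumption: the 2 bytes sit in the low end of a zero-filled 8-byte slot.
    """
    return struct.unpack("<d", raw_half + b"\x00" * 6)[0]


def widen_half(raw_half):
    """Decode the 2 bytes as binary16, giving an ordinary Python float."""
    return struct.unpack("<e", raw_half)[0]


for value in [1.5, -2.0, 0.25]:  # hypothetical activation values
    raw = half_bits(value)
    print("misread: %f   converted: %f" % (misread_as_double(raw), widen_half(raw)))
```

Under that assumption the misread value is a tiny denormal double, so `%f` shows it as `0.000000` even though the underlying half is nonzero, while the converted column shows the real value.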
Hi @leizhao1234, do you still have any further issues or questions? If not, we'll close this issue soon.
When I was running the benchmark for Llama 70b, I found that all of the activation values are zero.

```
python build.py --model_dir /code/tensorrt_llm/models/Llama-2-70b-chat-hf/ --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4 --paged_kv_cache --use_inflight_batching --int8_kv_cache --output_dir ./tmp/llama/70B/trt_engines/fp16/1-gpu/
./gptSessionBenchmark --model llama --engine_dir /code/tensorrt_llm/models/tmp/llama/70B/trt_engines/fp16/1-gpu/ --batch_size "8" --input_output_len "1024,1"
```

I don't know what is happening, and I think multiplying all-zero matrices can greatly affect the measured performance.