SoundProvider opened this issue 1 day ago
@SoundProvider, could you also show the command to convert the checkpoint?
```shell
DEVICES=0,1,2,3
TP_SIZE=4
BATCH_SIZE=4

CUDA_VISIBLE_DEVICES=${DEVICES} \
python /app/tensorrt_llm/examples/medusa/convert_checkpoint.py \
    --model_dir /app/models/vicuna-33b-v1.3 \
    --medusa_model_dir /app/models/medusa-vicuna-33b-v1.3 \
    --output_dir /app/models/medusa_test/tensorrt/${TP_SIZE}-gpu \
    --dtype float16 \
    --num_medusa_heads 4 \
    --tp_size ${TP_SIZE}

CUDA_VISIBLE_DEVICES=${DEVICES} \
trtllm-build --checkpoint_dir /app/models/medusa_test/tensorrt/${TP_SIZE}-gpu \
    --gpt_attention_plugin float16 \
    --gemm_plugin float16 \
    --context_fmha enable \
    --output_dir /app/models/medusa_test/tensorrt_llm/${TP_SIZE}-gpu \
    --speculative_decoding_mode medusa \
    --max_batch_size ${BATCH_SIZE} \
    --workers ${TP_SIZE}
```
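After the build succeeds, the engine would be exercised with the example's run script. A hedged sketch, assuming the standard `examples/run.py` entry point of the repository checkout above and its `--medusa_choices` flag; the choices tree shown is a small illustrative placeholder, not one tuned for these heads:

```shell
# Run the TP=4 Medusa engine built above. Paths follow the convert/build
# commands; the medusa_choices tree is an illustrative placeholder -- use
# the tree that matches your trained heads.
CUDA_VISIBLE_DEVICES=${DEVICES} \
mpirun -np ${TP_SIZE} --allow-run-as-root \
  python /app/tensorrt_llm/examples/run.py \
    --engine_dir /app/models/medusa_test/tensorrt_llm/${TP_SIZE}-gpu \
    --tokenizer_dir /app/models/vicuna-33b-v1.3 \
    --max_output_len 100 \
    --medusa_choices "[[0], [0, 0], [1], [0, 1], [2]]" \
    --input_text "Once upon a time,"
```

With tensor parallelism greater than 1, the engine must be launched under `mpirun` with one rank per GPU, which is why `-np` matches `TP_SIZE`.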
@hello-11 I used the Medusa example here.
Thank you for developing trt-llm; it's helping me a lot. I'm trying to use Medusa with trt-llm, referencing this page.
It works fine with Vicuna 7B and its Medusa heads, with no errors at all.
However, with Vicuna 33B and its trained heads, the following error occurs when executing `trtllm-build`.
Converting the checkpoint with Medusa completed with the following result