tonylek opened this issue 1 week ago
`--cluster_key` is used together with `--auto_parallel N`: it provides cluster info that auto parallel uses to determine its sharding strategy. In other words, `--cluster_key` does not help you cross-build an engine for a different type of GPU, which is unsupported in TRT-LLM. Since you do not use `--auto_parallel N`, `--cluster_key` should have no effect on the build process. I will update the help message to avoid confusion when users do not use `--auto_parallel N`.
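For context, a minimal sketch of the combination described above, assuming a hypothetical checkpoint layout (the paths, the parallel degree, and the exact cluster key string are placeholders, not taken from this thread):

```bash
# Sketch only: --cluster_key is consulted when --auto_parallel N is set,
# supplying cluster info for the sharding strategy.
# Paths and values below are placeholders.
trtllm-build --checkpoint_dir ./tllm_checkpoint \
             --output_dir ./tllm_engine \
             --auto_parallel 2 \
             --cluster_key A100-SXM-80GB
```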
Any chance it will be supported in the future?
The cross-build feature is not planned. Please build and deploy on the same type of GPU.
@tonylek could we close this ticket now?
Hi, I tried the `--cluster_key` option with trtllm-build. I did the conversion on an A100-80GB-SXM and then tried to deploy the engine on an L4 (after converting with the L4 cluster key), and it failed when starting up tritonserver. When I deploy it on an A100-40GB-SXM with its corresponding cluster key, it does start up, but I get:

[TensorRT-LLM][WARNING] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

In my config.json, the cluster_key I converted with does appear.
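If it helps anyone reproduce this, a hedged way to check which cluster_key an engine was built with (the engine directory path is a placeholder, and the exact location of the field inside config.json may differ across TRT-LLM versions):

```bash
# Placeholder engine path; prints any "cluster_key" entries found in the build config.
grep -o '"cluster_key": *"[^"]*"' ./tllm_engine/config.json
```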