First of all, thanks for your great work. I read the PR you opened in the TensorRT-LLM repo and noticed that EP + TP, PP + TP, and TP are supported during inference. May I ask which one is optimal? Specifically, for the MoE layer, does EP or TP yield better performance?