Open algorithmconquer opened 2 weeks ago
We think DiT needs lower latency rather than higher throughput. TP/CP can reduce model inference latency, whereas PP does not help with latency: a single request still has to pass through every pipeline stage in sequence. That's why we don't support PP for DiT for now.
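To make the latency argument concrete, here is a rough back-of-the-envelope sketch. It is a toy cost model, not code from the repository: the function name, the layer count, and all timing numbers are hypothetical and chosen only for illustration. It assumes TP splits each layer's compute evenly across devices and that a single request passes through the PP stages one after another.

```python
def single_request_latency_ms(num_layers, layer_time_ms, tp=1, pp=1,
                              tp_comm_ms_per_layer=0.0, pp_handoff_ms=0.0):
    """Idealized latency of ONE forward pass (hypothetical toy model).

    TP: every layer's compute is divided across `tp` devices, at the cost
        of some per-layer communication (e.g. an all-reduce).
    PP: layers are grouped into `pp` stages, but a single request still
        runs through all layers sequentially, so compute time does not
        shrink; each stage boundary only adds a hand-off cost.
    """
    per_layer = layer_time_ms / tp
    if tp > 1:
        per_layer += tp_comm_ms_per_layer
    handoffs = (pp - 1) * pp_handoff_ms
    return num_layers * per_layer + handoffs


# Hypothetical numbers: 28 transformer blocks at 1 ms each on one device.
baseline = single_request_latency_ms(28, 1.0)
with_tp4 = single_request_latency_ms(28, 1.0, tp=4, tp_comm_ms_per_layer=0.05)
with_pp4 = single_request_latency_ms(28, 1.0, pp=4, pp_handoff_ms=0.1)

print(f"baseline: {baseline:.1f} ms, TP=4: {with_tp4:.1f} ms, PP=4: {with_pp4:.1f} ms")
```

Under these assumptions TP shortens the single-request latency, while PP leaves it at or slightly above the baseline. PP pays off mainly when many micro-batches keep all stages busy at once, which improves throughput rather than latency.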
In convert_checkpoint.py, there is the line 'assert args.pp_size == 1, "PP is not supported yet."'. Why does DiT not support pp_size > 1?