Open rubencart opened 5 months ago
First of all, throughput in this case means "how many audio samples the model can generate per second". So, naively, if we double the batch size, the model can generate twice as many samples in parallel, which is exactly what happens in the AR case in Figure 2.
The difference in inference speed (throughput) between the AR and non-AR models arises for the following reasons.
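Intuitively: the AR model generates one sample per step, so each step does very little compute and leaves the GPU underutilized; adding batch rows fills the idle capacity at roughly no extra cost, so throughput grows linearly with batch size until memory runs out. The non-AR model generates the whole sequence in one parallel pass that already saturates the GPU, so doubling the batch roughly doubles the pass time and throughput stays flat. Here is a toy cost model of that intuition (the numbers and function names are hypothetical, not from the paper or its code):

```python
# Toy cost model illustrating the batch-size scaling of AR vs non-AR
# generation. All constants are made-up assumptions, chosen only to show
# the two regimes; they are not measurements from the paper.

SEQ_LEN = 16_000  # samples per utterance (e.g. 1 second of 16 kHz audio)

def ar_throughput(batch_size):
    """AR: one small kernel per generated sample.

    Assumption: per-step wall time is dominated by fixed launch/overhead
    cost, so it does not grow with batch size while the GPU is
    underutilized. Batch rows advance in parallel within each step.
    """
    step_time = 1e-3                      # seconds per AR step (hypothetical)
    total_time = SEQ_LEN * step_time      # steps are strictly sequential
    return batch_size * SEQ_LEN / total_time  # samples/s: linear in batch

def non_ar_throughput(batch_size):
    """Non-AR: one big parallel forward pass.

    Assumption: the pass is already compute-bound at batch size 1, so its
    runtime scales linearly with batch size.
    """
    pass_time = 0.1 * batch_size          # seconds (hypothetical)
    return batch_size * SEQ_LEN / pass_time   # samples/s: constant in batch

for b in (1, 2, 4, 8):
    print(f"batch={b}: AR {ar_throughput(b):.0f} samples/s, "
          f"non-AR {non_ar_throughput(b):.0f} samples/s")
```

Under these assumptions the AR throughput doubles with every doubling of the batch (until GPU memory or compute saturates, which this sketch ignores), while the non-AR throughput is pinned at a constant, matching the bounded curve described in the paper.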
The paper states the following: "While the nonautoregressive model throughput is bounded to ∼ 2.8 samples/second for batch sizes bigger than 64, the autoregressive model throughput is linear in batch size, only limited by the GPU memory".
Could you explain why? Why would the throughput of the AR model be linear w.r.t. batch size, while the throughput of the non-AR model is more or less constant?
I would expect such a relation between throughput and sequence length, but I don't immediately see what causes this connection with batch size.