zen-d opened this issue 1 year ago
@Fangyi-Chen Thanks for your great work. I would like to ask how much inference latency increases compared to the basic pathway, since queries are much heavier in later decoding stages.
Hi,
SQR is a training strategy, so it is applied only during training. The inference pipeline is unchanged, so there is no inference latency overhead.
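To illustrate why the extra training-time queries do not reach inference, here is a hypothetical counting sketch (not code from the SQR repository). It assumes, as a simplification of SQR's selective recollection, that during training each decoder stage receives the query groups output by the two preceding stages, with the initial queries counted as "stage 0" output. At inference only the basic pathway runs, so each stage sees a single group. The helper `query_groups_per_stage` is an illustrative name.

```python
def query_groups_per_stage(num_stages: int, training: bool) -> list[int]:
    """Hypothetical helper: count the query groups fed into each decoder stage.

    training=True: assumed SQR-style recollection, where stage i receives the
    outputs of stages i-1 and i-2 (stage 0 output = the initial queries).
    training=False: the basic pathway, one query group per stage.
    """
    if not training:
        # Inference: the recollection path is not used, so cost matches the baseline.
        return [1] * num_stages

    outputs = [1]  # outputs[k] = query groups produced by stage k; index 0 is the initial queries
    inputs = []
    for i in range(1, num_stages + 1):
        # Recollect from the previous stage and, if it exists, the one before it.
        n_in = outputs[i - 1] + (outputs[i - 2] if i >= 2 else 0)
        inputs.append(n_in)
        outputs.append(n_in)  # each stage maps every input group to one output group
    return inputs


if __name__ == "__main__":
    print(query_groups_per_stage(6, training=True))   # [1, 2, 3, 5, 8, 13]
    print(query_groups_per_stage(6, training=False))  # [1, 1, 1, 1, 1, 1]
```

Under this simplified model, the later stages do get heavier during training, which matches the concern in the question. However, the growth exists only in the training graph; at inference every stage processes one query group, exactly as in the basic pathway.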