Fangyi-Chen / SQR

MIT License

Inference latency #1

Open zen-d opened 1 year ago

zen-d commented 1 year ago

@Fangyi-Chen Thanks for your great work. How much does inference latency increase compared to the basic pathway, given that the query load is much heavier in later decoding stages?

Fangyi-Chen commented 1 year ago

Hi,

As a training strategy, SQR is applied only in the training phase. The inference pipeline is unchanged, so there is no inference latency overhead.
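To make this concrete, here is a minimal toy sketch in plain Python. It is not the repository's code: `run_decoder`, the `training` flag, and the dummy stage are hypothetical names. It assumes the recollection scheme in which each stage, during training only, also receives the query groups that were fed to the previous stage. With `training=False` every stage sees exactly one query group, which is the basic pathway.

```python
from typing import Callable, List, Tuple

Queries = List[float]
Stage = Callable[[Queries], Queries]


def run_decoder(
    queries: Queries, stages: List[Stage], training: bool
) -> Tuple[List[Queries], List[int]]:
    """Toy decoder loop (hypothetical, for illustration only).

    Returns the query groups produced by the last stage and the number
    of query groups each stage had to process.
    """
    prev_groups: List[Queries] = [queries]  # groups from the previous stage
    older_groups: List[Queries] = []        # groups from the stage before that
    load: List[int] = []

    for stage in stages:
        # Assumption: recollected (older) queries are only added in training.
        inputs = prev_groups + (older_groups if training else [])
        load.append(len(inputs))
        outputs = [stage(group) for group in inputs]
        older_groups, prev_groups = prev_groups, outputs

    return prev_groups, load


def dummy_stage(group: Queries) -> Queries:
    """Stand-in for a decoder layer: just shifts every query by 1."""
    return [q + 1.0 for q in group]


stages = [dummy_stage] * 6
train_out, train_load = run_decoder([0.0, 1.0], stages, training=True)
eval_out, eval_load = run_decoder([0.0, 1.0], stages, training=False)

print(train_load)  # [1, 2, 3, 5, 8, 13]
print(eval_load)   # [1, 1, 1, 1, 1, 1]
```

In this sketch the per-stage load grows only when `training=True`. The first group in the training outputs follows the same path as the single group at inference, so the inference-time computation is identical to that of a decoder trained without recollection.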