Closed: shahaamirbader closed this issue 1 year ago
During training, the encoder is very expensive (as you have noticed). During inference, however, it CAN be much cheaper if you follow Figure 2 in the CVPR version and make the encoder operate in a streaming manner. This approach is similar to the "KV cache" used in GPT. With it, the computation at inference time is mathematically equivalent to that at training time, but the efficiency can be greatly improved. Currently this codebase does not include an implementation of the KV cache.
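To make the equivalence concrete, here is a minimal sketch of the KV-cache idea in plain NumPy. This is not code from the QCNet codebase; the class name `KVCache` and the single-head, unbatched attention are illustrative assumptions. The point is that attending the newest query over cached keys/values gives the same result as recomputing full causal attention for that step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Illustrative cache of past keys/values so each new timestep only
    computes attention for its own query (streaming inference)."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def step(self, q, k, v):
        # Append the new key/value, then attend the new query over all
        # cached (past + current) keys/values.
        self.keys = np.vstack([self.keys, k[None]])
        self.values = np.vstack([self.values, v[None]])
        scores = self.keys @ q / np.sqrt(q.shape[-1])
        return softmax(scores) @ self.values

# Streaming inference over T timesteps reuses cached work from earlier steps.
rng = np.random.default_rng(0)
T, d = 5, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
cache = KVCache(d)
streaming_outs = [cache.step(Q[t], K[t], V[t]) for t in range(T)]
```

The output of the final streaming step matches a full recomputation of attention for that step, which is the sense in which the streaming encoder is equivalent to the training-time computation while avoiding redundant work.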
I am really impressed by the work. However, reading through the QCNet paper and the QCNeXt technical report, you mention that the two use the same encoder, whereas Fig. 1 of the QCNeXt report shows an encoder that differs from the QCNet encoder. Moreover, the encoder shown in QCNeXt resembles what the QCNet video presentation describes for existing works that are computationally expensive. Could you please clarify this for better understanding? Also, is there an expected time frame for the release of the QCNeXt code?