Xilinx / brevitas

Brevitas: neural network quantization in PyTorch
https://xilinx.github.io/brevitas/

Move DelayWrapper logic to Proxy #1023

Open Giuseppe5 opened 2 months ago

Giuseppe5 commented 2 months ago

Currently it's handled at the lowest level of integer quantization, but I argue it's not very intuitive. When a quantizer is called, it should always quantize. Then the proxy will decide whether to return the original float value (having still called the quantizer and accumulated statistics/computed gradients) or the quantized value.

Pros:

Cons:
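A minimal sketch of how the proposed split could look. All names here (`IntQuantizer`, `QuantProxy`, `delay_steps`) are illustrative, not the actual Brevitas API; the point is only that the quantizer always runs, while the proxy decides which value to return during the delay period:

```python
# Hypothetical sketch of the proposal; names are illustrative,
# not the actual Brevitas classes.

class IntQuantizer:
    """Always quantizes; also accumulates statistics on every call."""
    def __init__(self, scale=0.1):
        self.scale = scale
        self.num_calls = 0  # stands in for accumulated statistics

    def __call__(self, x):
        self.num_calls += 1
        return round(x / self.scale) * self.scale


class QuantProxy:
    """Owns the delay logic instead of the quantizer."""
    def __init__(self, quantizer, delay_steps=0):
        self.quantizer = quantizer
        self.delay_steps = delay_steps
        self.step = 0

    def __call__(self, x):
        # The quantizer is always invoked, so statistics (and, in the
        # real implementation, gradients) are still computed.
        y = self.quantizer(x)
        self.step += 1
        # During the delay period, return the original float value.
        if self.step <= self.delay_steps:
            return x
        return y
```

With `delay_steps=2`, the first two calls return the float input unchanged while the quantizer still runs underneath; from the third call on, the quantized value is returned.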

aditya-167 commented 2 months ago

Hi, I would like to work on this. I am new to Brevitas.

Giuseppe5 commented 1 month ago

Hello, thanks for offering! Although the general idea is relatively clear in my mind, there are some technical details about the implementation that I still need to figure out and discuss with @nickfraser. We'll keep this thread up to date when there is news.