Caching of input/output values for elementwise residual modules is currently broken. Reimplementing the mixins as metaclasses should help, but a new mechanism for caching a list of input quant tensors will probably be necessary, since with cat we don't know until runtime how many tensors there will be.
It's easier to pass the whole list of input/output tensors to the cache at once and cache them together, instead of looping over them. No need for metaclasses.
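A rough sketch of the "cache the whole list at once" idea, in plain Python so it runs without the library. All names here (`QuantMeta`, `ListCache`, `CatLike`) are hypothetical stand-ins for illustration, not the project's actual classes, and the real cached fields may differ:

```python
from typing import List, NamedTuple, Optional, Sequence


class QuantMeta(NamedTuple):
    """Hypothetical stand-in for the metadata of one quant tensor."""
    scale: float
    bit_width: int


class ListCache:
    """Hypothetical cache that stores a whole list of inputs in one call."""

    def __init__(self) -> None:
        self.cached: Optional[List[QuantMeta]] = None

    def update(self, tensors: Sequence[QuantMeta]) -> None:
        # Snapshot the full list in one step; its length is only known here,
        # at call time, which is the situation described for cat.
        self.cached = list(tensors)


class CatLike:
    """Toy cat-style module that caches all of its inputs together."""

    def __init__(self) -> None:
        self.input_cache = ListCache()

    def forward(self, *inputs: QuantMeta) -> int:
        # One call with the whole list, no per-tensor loop over caches.
        self.input_cache.update(inputs)
        return len(inputs)


m = CatLike()
m.forward(QuantMeta(0.1, 8), QuantMeta(0.2, 8), QuantMeta(0.05, 4))
print(len(m.input_cache.cached))  # 3
```

Because the cache holds a list rather than a fixed number of slots, the module does not need to know the number of inputs ahead of time.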