FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Why do multi-scale features partially share a convolution network via PhiPartiallyShared? #73

Open sunset-clouds opened 3 months ago

sunset-clouds commented 3 months ago

VAR is indeed impressive, but there is one point that has been bothering me, so I am reaching out to the authors for help. Thank you in advance for your assistance.

In `quant.py`, line 33:

```python
self.quant_resi = PhiPartiallyShared(nn.ModuleList([(Phi(Cvae, quant_resi) if abs(quant_resi) > 1e-6 else nn.Identity()) for _ in range(share_quant_resi)]))
```
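To make `share_quant_resi` concrete, here is a minimal sketch (my own illustration, not the repo's exact code) of how 4 shared phi modules could be assigned to 10 scales: each scale's relative position in $[0, 1]$ is snapped to the nearest of 4 evenly spaced ticks, and the exact tick spacing in `PhiPartiallyShared` may differ.

```python
import numpy as np

# Illustrative sketch: K shared phi modules assigned to num_scales scales.
# Each scale's relative position in [0, 1] is snapped to the nearest tick,
# and all scales that land on the same tick share the same phi.
K, num_scales = 4, 10
ticks = np.linspace(1 / (2 * K), 1 - 1 / (2 * K), K)  # one tick per shared phi

for k in range(num_scales):
    pos = k / (num_scales - 1)                    # scale index mapped to [0, 1]
    phi_idx = int(np.argmin(np.abs(ticks - pos)))
    print(f"scale {k + 1} -> phi_{phi_idx}")
```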

According to my understanding, `self.quant_resi` is the $\phi_k(\cdot)$ function. There are 4 distinct $\phi_k(\cdot)$, and some scales share the same one, for example: $\phi_1(\cdot) = \phi_2(\cdot)$, $\phi_3(\cdot) = \phi_4(\cdot) = \phi_5(\cdot)$, $\phi_6(\cdot) = \phi_7(\cdot)$, and $\phi_8(\cdot) = \phi_9(\cdot) = \phi_{10}(\cdot)$. I have two questions:

1. Why do we need to introduce $\phi_k(\cdot)$ at all? It feels somewhat counterintuitive: RQ-VAE adopts $f = f - z_k$ rather than $f = f - \phi_k(z_k)$. What is the true role of $\phi_k(\cdot)$?
2. Why do different scales share the same $\phi_k(\cdot)$, e.g., $\phi_1(\cdot) = \phi_2(\cdot)$ and $\phi_3(\cdot) = \phi_4(\cdot) = \phi_5(\cdot)$?
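For context, here is my own hedged sketch of the multi-scale residual quantization loop as I understand it, just to show where $\phi_k(\cdot)$ sits; `quantize` and `phis` are hypothetical stand-ins, not the repo's actual API.

```python
import torch
import torch.nn.functional as F

# Sketch of multi-scale residual quantization with a per-scale phi_k.
# quantize: maps a feature map to its nearest codebook entries (stand-in).
# phis: small conv modules, one (possibly shared) per scale (stand-in).
def multi_scale_quantize(f, scales, phis, quantize):
    B, C, H, W = f.shape
    f_hat = torch.zeros_like(f)
    for k, (h, w) in enumerate(scales):                    # coarse -> fine
        r_k = F.interpolate(f, size=(h, w), mode='area')   # residual at scale k
        z_k = quantize(r_k)                                # quantized tokens
        z_up = F.interpolate(z_k, size=(H, W), mode='bicubic')
        # phi_k repairs the blur/aliasing that the down- and upsampling
        # introduce; RQ-VAE has no resampling, so it subtracts z_k directly.
        contrib = phis[k](z_up)
        f = f - contrib          # f = f - phi_k(z_k), cf. RQ-VAE's f = f - z_k
        f_hat = f_hat + contrib  # running reconstruction
    return f_hat
```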

eyedealism commented 2 months ago

The paper says it is there to address the information loss in upscaling. My guess is that it acts like the decoder part of a U-Net, producing a smoother map.
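In that spirit, a minimal sketch of what such a $\phi_k$ module could look like: a single 3x3 conv whose output is blended with its input by a residual ratio, so it only has to learn a small smoothing correction on top of the upscaled token map. The kernel size and blend ratio here are my assumptions, not necessarily the repo's values.

```python
import torch.nn as nn

class PhiSketch(nn.Module):
    """Illustrative phi: learned smoothing of an upscaled quantized map."""
    def __init__(self, channels: int, resi_ratio: float = 0.5):
        super().__init__()
        # 3x3 conv and 0.5 blend ratio are assumptions for illustration
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.resi_ratio = resi_ratio

    def forward(self, h):
        # mostly pass the upscaled map through, plus a learned correction
        return h * (1 - self.resi_ratio) + self.conv(h) * self.resi_ratio
```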