Explanation of the online quantizer h()

As described in the paper, after the patch embeddings pass through the ViT and a decoder D, we get one feature vector per patch. What confuses me is the online quantizer h() [78] mentioned in the paper. As far as I understand from the DINO paper, such feature vectors are passed through a softmax to produce a distribution, so I imagine h() works in a similar way. However, I don't understand what is being quantized here or how exactly the features are turned into a distribution. Can anybody explain this?
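If h() follows the DINO/iBOT recipe (an assumption, not checked against the MaskCLIP code), the "quantization" is soft. Each patch feature is scored against K learnable prototype vectors ("codewords"), and a temperature softmax turns those K scores into a distribution. The NumPy sketch that follows illustrates the idea. `soft_quantize`, `prototypes`, the temperatures, and the centering step are hypothetical names and choices for illustration, loosely modeled on DINO's normalized last layer and teacher centering. It is not the paper's implementation.

```python
import numpy as np


def softmax(logits, tau):
    """Temperature softmax over the last axis."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def prototype_logits(features, prototypes):
    """Cosine similarity of each patch feature (N, D) to each prototype (K, D) -> (N, K)."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    return f @ p.T


def soft_quantize(features, prototypes, tau, center=None):
    """Hypothetical h(): map patch features to distributions over K prototypes."""
    logits = prototype_logits(features, prototypes)
    if center is not None:
        logits = logits - center  # DINO-style centering, applied on the teacher side only
    return softmax(logits, tau)


def distill_loss(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy H(teacher, student), averaged over patches."""
    return float(-(teacher_probs * np.log(student_probs + eps)).sum(axis=-1).mean())


# Toy usage: 4 patches with D=16 features, K=8 prototypes (random, illustrative only).
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(8, 16))
teacher_feats = rng.normal(size=(4, 16))
student_feats = teacher_feats + 0.1 * rng.normal(size=(4, 16))

center = np.zeros(8)
teacher_q = soft_quantize(teacher_feats, prototypes, tau=0.04, center=center)  # sharper targets
student_q = soft_quantize(student_feats, prototypes, tau=0.1)
codes = teacher_q.argmax(axis=-1)  # hard "quantized" index per patch, shape (4,)
loss = distill_loss(teacher_q, student_q)

# EMA update of the center from the batch mean of teacher logits (momentum 0.9, as in DINO's default)
center = 0.9 * center + 0.1 * prototype_logits(teacher_feats, prototypes).mean(axis=0)
```

Under this reading, a lower teacher temperature makes each target distribution close to one-hot. Each patch is then effectively assigned to one of K learned codewords, and `argmax` recovers a hard code index if one is needed. The soft distributions are what a cross-entropy distillation loss compares between teacher and student.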

LightDXY / MaskCLIP, issue #7