LightDXY / MaskCLIP


Explain the online quantizer h() #7

Open ndhuynh02 opened 3 weeks ago

ndhuynh02 commented 3 weeks ago

As described in the paper, after passing the patch embeddings through the ViT and a decoder D, we get one feature vector per patch. What confuses me is the online quantizer h() [78] mentioned in the paper. As far as I understand from the DINO paper, these feature vectors are passed through a softmax to produce a distribution; hence, I imagine h() also works like that. However, I don't understand what is quantized here or how exactly it is transformed into a distribution. Can anybody help explain this?
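
To make the question concrete, the DINO-style reading described above can be sketched as follows. This only illustrates that interpretation and is not the paper's h(): the function names, the temperature of 0.1, the toy feature values, and the hard-assignment step are all assumptions made for the example.

```python
import math

def softmax_over_dims(feature, temperature=0.1):
    """Turn one patch feature vector into a probability distribution.

    Assumption: a temperature-scaled softmax over the feature dimensions,
    in the spirit of DINO. The temperature value is illustrative.
    """
    scaled = [x / temperature for x in feature]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hard_assign(probs):
    """Hypothetical 'quantization': index of the most probable entry."""
    return max(range(len(probs)), key=probs.__getitem__)

# Toy decoder output: 2 patches with 4-dimensional features (made-up numbers).
patch_features = [
    [0.2, 1.5, -0.3, 0.0],
    [1.0, 0.9, 0.1, -2.0],
]

# One soft distribution per patch, then one discrete token id per patch.
soft_targets = [softmax_over_dims(f) for f in patch_features]
token_ids = [hard_assign(p) for p in soft_targets]
```

Under this reading, `soft_targets` would play the role of the per-patch distribution and `token_ids` the role of a discrete code. Whether h() produces the soft distribution, the hard index, or something else entirely is exactly what remains unclear.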

MohammedSB commented 1 week ago

I asked this question a while back on Reddit: https://www.reddit.com/r/MachineLearning/comments/1cpqe3h/r_trying_to_understand_a_certain_function_in/

We ended up trying our best to reproduce it by testing out different configurations. Nothing really worked well. https://arxiv.org/abs/2405.14239