ShirAmir / dino-vit-features

Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".
https://dino-vit-features.github.io
MIT License
383 stars, 44 forks

permutation order #14

Open philipsgithub opened 1 year ago

philipsgithub commented 1 year ago

https://github.com/ShirAmir/dino-vit-features/blob/1779649078744e7d9be903325355fd481cca8103/extractor.py#L299

I am a bit confused. According to my understanding, it should be x.permute(0, 2, 1, 3). The same goes for line 239.

Am I missing something?

PS: https://github.com/ShirAmir/dino-vit-features/blob/1779649078744e7d9be903325355fd481cca8103/extractor.py#L239

ShirAmir commented 1 year ago

As mentioned in the documentation of _extract_features, the output features are of shape Bxhxtxd. Hence, in the line you mentioned, x.permute(0, 2, 3, 1) reorders the dimensions to Bxtxdxh; .flatten(start_dim=-2, end_dim=-1) merges the last two dimensions into Bxtx(dxh); and .unsqueeze(dim=1) adds a singleton dimension, giving Bx1xtx(dxh) as needed.
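
For anyone following along, the shape bookkeeping above can be checked with a small standalone sketch. This is not the repository's code: the sizes for B, h, t and d are made up for illustration, and the tensor is filled with torch.arange only so that individual elements are easy to trace. It also runs the x.permute(0, 2, 1, 3) variant from the question, to show that both orders end up with the same final shape and differ only in how the last dimension is laid out.

```python
import torch

# Illustrative sizes only (assumed, not taken from the repository).
B, h, t, d = 2, 3, 5, 4
x = torch.arange(B * h * t * d).reshape(B, h, t, d)  # B x h x t x d

# The chain described in the reply.
step1 = x.permute(0, 2, 3, 1)                     # B x t x d x h
step2 = step1.flatten(start_dim=-2, end_dim=-1)   # B x t x (d*h)
desc = step2.unsqueeze(dim=1)                     # B x 1 x t x (d*h)

# The order proposed in the question, for comparison.
alt = (
    x.permute(0, 2, 1, 3)                         # B x t x h x d
    .flatten(start_dim=-2, end_dim=-1)            # B x t x (h*d)
    .unsqueeze(dim=1)                             # B x 1 x t x (h*d)
)

print(tuple(step1.shape), tuple(step2.shape), tuple(desc.shape))
print("same shape:", desc.shape == alt.shape)
print("same element order:", torch.equal(desc, alt))
```

With these sizes, both variants produce a tensor of shape (2, 1, 5, 12). They hold the same values for each token but in a different order along the last dimension: with permute(0, 2, 3, 1), index j*h + k holds element j of head k, whereas with permute(0, 2, 1, 3), index k*d + j does.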