facebookresearch / hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.

Adding position embedding to the intermediate hiera features #19

Open MLDeS opened 11 months ago

MLDeS commented 11 months ago

@dbolya

Thanks for the amazing work. I want to use Hiera for feature extraction, and I plan to add a position embedding to the intermediate Hiera features for further downstream tasks. What is the best way to do this?

The easiest option would be a learnable or a sinusoidal position embedding with the same shape as the intermediate Hiera features, but is there a better way, e.g., reusing the positional embedding from Hiera itself? I see that the pretrained model provides spatial and temporal positional embeddings, but their shape is not compatible with the intermediate layer I am extracting features from. Could you please suggest the best way forward?
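For reference, the simple learnable version I have in mind would look something like this (the feature shape below is just a dummy placeholder, not an actual Hiera stage size):

```python
import torch
import torch.nn as nn

# Dummy intermediate feature map in (batch, height, width, channels) layout;
# 2 x 14 x 14 x 384 is a placeholder shape for illustration only.
feats = torch.randn(2, 14, 14, 384)

# Learnable position embedding with the same spatial/channel shape,
# broadcast over the batch dimension.
pos_embed = nn.Parameter(torch.zeros(1, *feats.shape[1:]))
nn.init.trunc_normal_(pos_embed, std=0.02)

feats = feats + pos_embed
```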

It would be very helpful if you could provide a minimal code example on a dummy Hiera feature shape.

Thanks a lot in advance!

dbolya commented 11 months ago

This is technically an open research question, but the only time I've seen position embeddings added after the fact be useful is the relative position embeddings in ViTDet. Other embeddings usually have to be added during pretraining to be effective. Maybe you can take a look at Detectron2's implementation to get started? Specifically here and here. They add it to the attention logits, but you can probably add them to the extracted feature maps pretty easily.
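Roughly, the decomposed relative position bias there looks like the sketch below. This is a simplified version for illustration; the real Detectron2 `get_rel_pos` also interpolates the offset table when query and key sizes differ, which is skipped here:

```python
import torch


def get_rel_pos(q_size: int, k_size: int, rel_pos: torch.Tensor) -> torch.Tensor:
    """Index a (2 * size - 1, dim) learnable offset table into (q_size, k_size, dim).

    Assumes the table length already matches 2 * max(q_size, k_size) - 1
    (no interpolation, unlike the full Detectron2 implementation).
    """
    coords = torch.arange(q_size)[:, None] - torch.arange(k_size)[None, :]
    return rel_pos[(coords + k_size - 1).long()]


def add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size):
    """Add decomposed (per-axis) relative position bias to attention logits.

    attn: (B, q_h * q_w, k_h * k_w) attention logits.
    q:    (B, q_h * q_w, dim) queries.
    """
    q_h, q_w = q_size
    k_h, k_w = k_size
    Rh = get_rel_pos(q_h, k_h, rel_pos_h)  # (q_h, k_h, dim)
    Rw = get_rel_pos(q_w, k_w, rel_pos_w)  # (q_w, k_w, dim)

    B, _, dim = q.shape
    r_q = q.reshape(B, q_h, q_w, dim)
    rel_h = torch.einsum("bhwc,hkc->bhwk", r_q, Rh)  # (B, q_h, q_w, k_h)
    rel_w = torch.einsum("bhwc,wkc->bhwk", r_q, Rw)  # (B, q_h, q_w, k_w)

    attn = (
        attn.view(B, q_h, q_w, k_h, k_w)
        + rel_h[:, :, :, :, None]
        + rel_w[:, :, :, None, :]
    ).view(B, q_h * q_w, k_h * k_w)
    return attn


if __name__ == "__main__":
    # Toy usage with dummy sizes.
    B, H, W, dim = 2, 7, 7, 64
    q = torch.randn(B, H * W, dim)
    attn = torch.randn(B, H * W, H * W)
    rel_pos_h = torch.randn(2 * H - 1, dim)
    rel_pos_w = torch.randn(2 * W - 1, dim)
    attn = add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, (H, W), (H, W))
```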