facebookresearch / hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.

Adding position embedding to the intermediate hiera features #19

Open MLDeS opened 11 months ago

MLDeS commented 11 months ago

@dbolya

Thanks for the amazing work. I want to use Hiera for feature extraction, and I plan to add a position embedding to the intermediate Hiera features for further downstream tasks. What is the best way to do this?

The easiest option would be a learnable or a sinusoidal position embedding with the same shape as the intermediate Hiera features, but is there a better way, e.g., reusing the positional embedding from Hiera itself? I see that the pretrained model provides spatial and temporal positional embeddings, but their shape is not compatible with the intermediate layer I am extracting features from. Could you please suggest the best way forward?
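For reference, the simple learnable version I have in mind would look something like this (the feature shape below is just a dummy placeholder, not an actual Hiera stage size):

```python
import torch
import torch.nn as nn

# Dummy intermediate feature map in (batch, height, width, channels) layout;
# 2 x 14 x 14 x 384 is a placeholder shape for illustration only.
feats = torch.randn(2, 14, 14, 384)

# Learnable position embedding with the same spatial/channel shape,
# broadcast over the batch dimension.
pos_embed = nn.Parameter(torch.zeros(1, *feats.shape[1:]))
nn.init.trunc_normal_(pos_embed, std=0.02)

feats = feats + pos_embed
```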

It would be very helpful if you could provide a minimal code example on a dummy Hiera feature shape.

Thanks a lot in advance!

dbolya commented 11 months ago

This is technically an open research question, but the only time I've seen position embeddings added after the fact be useful is the relative position embeddings in ViTDet. Other embeddings usually have to be added during pretraining to be effective. Maybe you can take a look at Detectron2's implementation to get started? Specifically here and here. They add it to the attention logits, but you can probably add them to the extracted feature maps pretty easily.
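Roughly, the decomposed relative position bias there looks like the sketch below. This is a simplified version for illustration; the real Detectron2 `get_rel_pos` also interpolates the offset table when query and key sizes differ, which is skipped here:

```python
import torch


def get_rel_pos(q_size: int, k_size: int, rel_pos: torch.Tensor) -> torch.Tensor:
    """Index a (2 * size - 1, dim) learnable offset table into (q_size, k_size, dim).

    Assumes the table length already matches 2 * max(q_size, k_size) - 1
    (no interpolation, unlike the full Detectron2 implementation).
    """
    coords = torch.arange(q_size)[:, None] - torch.arange(k_size)[None, :]
    return rel_pos[(coords + k_size - 1).long()]


def add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size):
    """Add decomposed (per-axis) relative position bias to attention logits.

    attn: (B, q_h * q_w, k_h * k_w) attention logits.
    q:    (B, q_h * q_w, dim) queries.
    """
    q_h, q_w = q_size
    k_h, k_w = k_size
    Rh = get_rel_pos(q_h, k_h, rel_pos_h)  # (q_h, k_h, dim)
    Rw = get_rel_pos(q_w, k_w, rel_pos_w)  # (q_w, k_w, dim)

    B, _, dim = q.shape
    r_q = q.reshape(B, q_h, q_w, dim)
    rel_h = torch.einsum("bhwc,hkc->bhwk", r_q, Rh)  # (B, q_h, q_w, k_h)
    rel_w = torch.einsum("bhwc,wkc->bhwk", r_q, Rw)  # (B, q_h, q_w, k_w)

    attn = (
        attn.view(B, q_h, q_w, k_h, k_w)
        + rel_h[:, :, :, :, None]
        + rel_w[:, :, :, None, :]
    ).view(B, q_h * q_w, k_h * k_w)
    return attn


if __name__ == "__main__":
    # Toy usage with dummy sizes.
    B, H, W, dim = 2, 7, 7, 64
    q = torch.randn(B, H * W, dim)
    attn = torch.randn(B, H * W, H * W)
    rel_pos_h = torch.randn(2 * H - 1, dim)
    rel_pos_w = torch.randn(2 * W - 1, dim)
    attn = add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, (H, W), (H, W))
```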