facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Can ToMe be applied together with RoPE? #29

Closed: cjdjr closed this issue 4 months ago

cjdjr commented 1 year ago

Can ToMe be applied together with RoPE? If so, how should the position embeddings be calculated after tokens are merged?

dbolya commented 1 year ago

I've never tried it, but you could try averaging the computed position embeddings using the "source" matrix. Basically, average the position embeddings of all the individual tokens that were merged into one, and use that as the new position embedding.
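A minimal sketch of that suggestion, in plain Python so it stays self-contained. It assumes the source matrix is a 0/1 matrix with one row per merged token and one column per original token (`source[i][j] == 1` means original token `j` was merged into merged token `i`). It also assumes simple 1D RoPE features (cos values followed by sin values). The helpers `rope_cos_sin` and `merge_position_embeddings` are hypothetical names for illustration, not part of ToMe's API, and a vision model would typically use a 2D variant of RoPE.

```python
import math
from typing import List


def rope_cos_sin(position: float, dim: int, base: float = 10000.0) -> List[float]:
    """Hypothetical helper: 1D RoPE features for a single position.

    Returns [cos(p*t_0), ..., cos(p*t_{h-1}), sin(p*t_0), ..., sin(p*t_{h-1})]
    where h = dim // 2 and t_k = base ** (-2k / dim).
    """
    half = dim // 2
    thetas = [base ** (-2 * k / dim) for k in range(half)]
    return [math.cos(position * t) for t in thetas] + [math.sin(position * t) for t in thetas]


def merge_position_embeddings(source: List[List[int]],
                              pos_emb: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: average the position embeddings of each merged group.

    source[i][j] is 1 if original token j was merged into merged token i (assumed layout).
    pos_emb[j] is the position embedding computed for original token j.
    """
    dim = len(pos_emb[0])
    merged = []
    for row in source:
        size = sum(row)  # number of original tokens folded into this merged token
        if size == 0:
            raise ValueError("each merged token must contain at least one original token")
        acc = [0.0] * dim
        for j, w in enumerate(row):
            if w:
                for d in range(dim):
                    acc[d] += w * pos_emb[j][d]
        merged.append([v / size for v in acc])  # mean over the group
    return merged


# Four original tokens at positions 0..3; tokens 1 and 2 were merged into one.
source = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
pos_emb = [rope_cos_sin(p, dim=4) for p in range(4)]
merged = merge_position_embeddings(source, pos_emb)
print(len(merged), len(merged[0]))  # 3 4
```

One thing to keep in mind with this choice: averaging cos and sin values separately gives a pair whose magnitude is |cos((a - b)/2)| for positions a and b, so the result is a scaled rotation rather than a pure one. Averaging the position indices first and then recomputing the RoPE features at the fractional position is another variant you could try.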