facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Does ToMe work for focal modulation networks? #23

Open subneed opened 1 year ago

subneed commented 1 year ago

Any help on modifying ToMe for focal modulation networks (FMNs)? I guess in an FMN we could apply ToMe on Q/M. It also has downsampling layers in each stage, so would the r value, and the model definition, need to change per stage?

dbolya commented 1 year ago

I'm not too familiar with FMNs, but it seems like it's a hierarchical network with a different attention mechanism? In principle you can use ToMe on anything that uses tokens, but like you said you'd need to be careful about the downsampling layers. You might be able to use ToMe instead of those downsampling layers, but that would probably require some exploration to figure out what's best.
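
To make the per-stage bookkeeping concrete, here is a rough sketch of how the token count might evolve if a different `r` were used in each stage of a hierarchical network. Everything in it is an assumption for illustration: `token_schedule` is a hypothetical helper (not part of ToMe), the stage depths and `r` values are made up, and each downsampling layer is modeled as simply dividing the token count by 4, which ignores that a real downsampling layer expects a regular 2D grid. The only ToMe-specific behavior it mirrors is that bipartite matching merges at most half of the tokens in a single layer.

```python
def token_schedule(num_tokens, stage_depths, r_per_stage, downsample=4):
    """Hypothetical helper: token count after every layer, grouped by stage.

    num_tokens   -- tokens entering the first stage
    stage_depths -- number of layers in each stage
    r_per_stage  -- tokens to merge per layer, one value per stage
    downsample   -- ASSUMPTION: each stage boundary divides the count by this
    """
    if len(stage_depths) != len(r_per_stage):
        raise ValueError("need exactly one r value per stage")

    schedule = []
    for stage_idx, (depth, r) in enumerate(zip(stage_depths, r_per_stage)):
        counts = []
        for _ in range(depth):
            # Bipartite matching can merge at most half of the tokens at once.
            merged = min(r, num_tokens // 2)
            num_tokens -= merged
            counts.append(num_tokens)
        schedule.append(counts)

        # Simplified model of the downsampling layer between stages.
        if stage_idx < len(stage_depths) - 1:
            num_tokens = max(1, num_tokens // downsample)
    return schedule


if __name__ == "__main__":
    # Illustrative numbers only: a 56x56 token grid and four stages.
    for i, counts in enumerate(token_schedule(56 * 56, [2, 2, 6, 2], [16, 8, 4, 0])):
        print(f"stage {i}: {counts}")
```

A table like this only shows how quickly tokens disappear for a given choice of `r`; whether the stages and downsampling layers still behave well on a merged, irregular token set is the part that would need the exploration mentioned above.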

chengyangfu commented 1 year ago

This problem is still up for debate in the research world, so we can only answer things that have already been covered in our paper.