Great question. The underlying principles of our work do not rely on any domain-specific knowledge, so they should apply to vision and audio transformers as well. That said, we haven't had the chance to test whether this actually works. Please feel free to try it out and let us know how it goes!