An experimental Perceiver architecture variant in which each successive layer can use a larger latent_dim and fewer latents. Similar in concept to Multiscale Vision Transformers (Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer, https://arxiv.org/abs/2104.11227), but adapted to Perceivers. It is not yet known whether this approach works.
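To make the "larger latent_dim, fewer latents" idea concrete, here is a minimal sketch of one possible per-layer schedule. The function name `multiscale_schedule` and the parameters `latent_stride` and `dim_mult` are hypothetical and not taken from this code base; halving the latents while doubling the width is only one illustrative choice.

```python
from typing import List, Tuple


def multiscale_schedule(
    num_layers: int,
    num_latents: int,
    latent_dim: int,
    latent_stride: int = 2,  # hypothetical: factor by which the latent count shrinks
    dim_mult: int = 2,       # hypothetical: factor by which latent_dim grows
) -> List[Tuple[int, int]]:
    """Return one (num_latents, latent_dim) pair per layer."""
    schedule = []
    for _ in range(num_layers):
        schedule.append((num_latents, latent_dim))
        # Fewer latents in the next layer, but never fewer than one.
        num_latents = max(1, num_latents // latent_stride)
        # Wider latents in the next layer.
        latent_dim = latent_dim * dim_mult
    return schedule


schedule = multiscale_schedule(num_layers=4, num_latents=256, latent_dim=64)
for layer, (n, d) in enumerate(schedule):
    print(f"layer {layer}: {n} latents x {d} dims")
```

With equal shrink and growth factors, the total latent size per layer (num_latents * latent_dim) stays constant, which is one way to keep the memory footprint roughly flat across layers.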