Different Attention types and bottom-up configuration

Hello! I'm working on a master's thesis about bottom-up pose estimation on high-resolution images. Your paper seems to address both of these topics successfully, yet I am unable to find either a configuration for the bottom-up approach presented in the paper or the two modifications to standard attention (Shift Window and Pooling Window) used to handle higher-resolution feature maps. Am I overlooking something in the repository? Or are these parts of the paper not included in this implementation? If so, are there any plans to release them later, or is there some other place to find them?

ViTAE-Transformer / ViTPose, issue #105