Open Followmeczx opened 3 months ago
Hi, the code for the ViT encoder is here: https://github.com/IDEA-Research/OSX/blob/118cf97fb1f144930bf93d88794b525d579b2d0c/main/transformer_utils/mmpose/models/backbones/vit.py#L176
Hello, may I ask how the pre-trained model for the encoder was obtained for training?
Hello. I don't fully understand how the ViT encoder is implemented; I couldn't find the code that builds the ViT structure. I also want to feed a normal image stitched together with the input image into the ViT encoder, so I need to change the number of input channels from 3 to 9. Where in the code should I make this change? In addition, I want to change the input image resolution from (256, 192) to (256, 256). This must involve some modifications; could you give me some suggestions, please?
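To make the two requested changes concrete, here is a rough sketch of what they imply for tensor shapes in a generic ViT. It is not taken from the OSX code: the helper names `patch_grid` and `patch_embed_weight_shape`, the patch size of 16, the embedding width of 768, and the assumption of non-overlapping, unpadded patches are all illustrative guesses, so the real values should be checked against the repository's config and `vit.py`.

```python
from typing import Tuple


def patch_grid(img_size: Tuple[int, int], patch_size: int = 16) -> Tuple[int, int]:
    """Patches along (height, width), assuming non-overlapping, unpadded patches."""
    h, w = img_size
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return h // patch_size, w // patch_size


def patch_embed_weight_shape(
    embed_dim: int, in_chans: int, patch_size: int = 16
) -> Tuple[int, int, int, int]:
    """Shape of a conv-style patch-embedding weight: (out, in, kH, kW)."""
    return (embed_dim, in_chans, patch_size, patch_size)


# Hypothetical values: patch size 16 and embed_dim 768 are assumptions.
old_grid = patch_grid((256, 192))
new_grid = patch_grid((256, 256))
old_tokens = old_grid[0] * old_grid[1]
new_tokens = new_grid[0] * new_grid[1]

old_weight = patch_embed_weight_shape(768, 3)
new_weight = patch_embed_weight_shape(768, 9)

print("patch grid:", old_grid, "->", new_grid)
print("patch tokens:", old_tokens, "->", new_tokens)
print("patch-embed weight:", old_weight, "->", new_weight)
```

Under these assumptions, two sets of pre-trained weights would no longer match after the change: the patch-embedding weight (its input-channel dimension grows from 3 to 9) and the position embedding (the number of patch tokens changes with the resolution). Both would need to be re-initialized or adapted when loading a checkpoint.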