IDEA-Research / OSX

[CVPR 2023] Official implementation of the paper "One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer"
https://osx-ubody.github.io/
MIT License

About ViT encoder setting #124

Open Followmeczx opened 3 months ago

Followmeczx commented 3 months ago

Hello. I don't fully understand how the ViT encoder is set up, and I couldn't find the code that builds the ViT structure. I also want to feed the normal image, stitched together with the input image, into the ViT encoder, so I need to change the number of input channels from 3 to 9. Where in the code can I change this? In addition, I want to change the input image resolution from (256, 192) to (256, 256). This will presumably require some modifications. Could you give me some suggestions, please?

linjing7 commented 2 months ago

Hi, the code for the ViT encoder is here: https://github.com/IDEA-Research/OSX/blob/118cf97fb1f144930bf93d88794b525d579b2d0c/main/transformer_utils/mmpose/models/backbones/vit.py#L176
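
For the two changes asked about above (9 input channels and a 256×256 input), here is a minimal sketch of the bookkeeping involved. It is not code from the OSX repository. It assumes a plain ViT with non-overlapping 16×16 patches; the patch size, the function names, and the kernel-inflation heuristic are all assumptions for illustration, so please check them against the linked vit.py before relying on them.

```python
# Hypothetical helpers, not part of OSX. Pure Python, no dependencies.

def patch_grid(img_size, patch_size=16):
    """Return (rows, cols) of the patch grid for an (H, W) input.

    Assumes non-overlapping square patches (an assumption, not read
    from the OSX config).
    """
    h, w = img_size
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return h // patch_size, w // patch_size


def num_tokens(img_size, patch_size=16):
    """Number of patch tokens the encoder sees for an (H, W) input."""
    rows, cols = patch_grid(img_size, patch_size)
    return rows * cols


def inflate_kernel(kernel, new_in_chans):
    """Expand one patch-projection filter to more input channels.

    `kernel` is a list with one 2D list (patch x patch) per input channel.
    The original channels are tiled cyclically and rescaled by
    old/new, so that an input made of the original channels repeated
    would give the same response. This is a common heuristic for
    reusing pretrained 3-channel weights, not the authors' method.
    """
    old_in_chans = len(kernel)
    scale = old_in_chans / new_in_chans
    return [
        [[value * scale for value in row] for row in kernel[c % old_in_chans]]
        for c in range(new_in_chans)
    ]


if __name__ == "__main__":
    for size in [(256, 192), (256, 256)]:
        print(size, "grid:", patch_grid(size), "tokens:", num_tokens(size))

    # Toy 1x1 "patch" filter with 3 input channels, inflated to 9.
    toy = [[[1.0]], [[2.0]], [[3.0]]]
    print("inflated channels:", len(inflate_kernel(toy, 9)))
```

Under these assumptions, going from (256, 192) to (256, 256) changes the patch grid and therefore the token count, so a learned positional embedding sized for the old grid would need to be resized (for example by interpolation) or retrained. Going from 3 to 9 channels only affects the first patch-projection layer, whose pretrained weights no longer match in shape and need to be re-initialized or inflated as sketched above.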

wangjunkaiyangzilong commented 2 weeks ago

Hello, may I ask how the pre-trained encoder model used during training was obtained?