OpenGVLab / Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
https://arxiv.org/abs/2403.02308
Apache License 2.0
285 stars · 10 forks

About the Input size #26

Open Cynicarlos opened 1 week ago

Cynicarlos commented 1 week ago

If my input size is not always the same and that is not divisible by patch_size, what can I do to use VRVKV to do the image restoration tasks? Thanks!

duanduanduanyuchen commented 1 week ago

Hi, we usually pad or crop the input image to a fixed size (divisible by the patch size) to deal with such a problem.
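A minimal sketch of that padding step (numpy for illustration; `pad_to_multiple` is a hypothetical helper, not from the VRWKV codebase — the same arithmetic applies to torch tensors):

```python
import numpy as np

def pad_to_multiple(img, patch_size=16):
    """Reflect-pad an H x W x C image so H and W become divisible by patch_size."""
    h, w = img.shape[:2]
    pad_h = (patch_size - h % patch_size) % patch_size  # 0 if already divisible
    pad_w = (patch_size - w % patch_size) % patch_size
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")

img = np.zeros((500, 333, 3), dtype=np.float32)
padded = pad_to_multiple(img)
print(padded.shape)  # (512, 336, 3)
```

After inference you would crop the output back to the original H x W to undo the padding.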

Cynicarlos commented 1 week ago

Thanks, but the image sizes of my training set and test set are different: the original images are very large, so I have to crop them to a small size like 512x512 for training. So it seems that I can't use VRWKV under this condition, is that right? Thanks again!!

duanduanduanyuchen commented 1 week ago

In such a situation, I think you can also resize the image by interpolation or downsampling. VRWKV can handle inputs of different resolutions (via interpolation of the position embedding). The output token grid will match the input: (H/patch_size, W/patch_size).
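A sketch of that position-embedding resize, assuming a learned (H/p * W/p, C) embedding trained at one grid size. The released code presumably uses bicubic interpolation; nearest-neighbor indexing keeps this sketch dependency-free, and `resize_pos_embed_nn` is a hypothetical name:

```python
import numpy as np

def resize_pos_embed_nn(pos_embed, old_hw, new_hw):
    """Resize a (old_h*old_w, C) position embedding to (new_h*new_w, C)
    by nearest-neighbor sampling on the 2D token grid."""
    old_h, old_w = old_hw
    new_h, new_w = new_hw
    c = pos_embed.shape[-1]
    grid = pos_embed.reshape(old_h, old_w, c)
    rows = (np.arange(new_h) * old_h / new_h).astype(int)  # source row per target row
    cols = (np.arange(new_w) * old_w / new_w).astype(int)
    return grid[rows][:, cols].reshape(new_h * new_w, c)

# Trained at 512x512 with patch 16 -> 32x32 tokens; test at 1024x1024 -> 64x64.
pe = np.random.randn(32 * 32, 192)
pe_big = resize_pos_embed_nn(pe, (32, 32), (64, 64))
print(pe_big.shape)  # (4096, 192)
```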

Cynicarlos commented 1 week ago

Thanks, but I want to use the same model with different image sizes, is that also OK? The parameters of the model are fixed to the training image size, right? After I save the model trained with image size 512x512, how can I use the saved model to test on a larger image, e.g. 1024x1024? That confuses me a lot, thank you!
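For context on why the saved weights transfer across resolutions: a patch embedding projects each patch with a weight whose shape depends only on the patch size and channel counts, not on H or W, so only the token count changes. A minimal sketch (numpy, with a hypothetical `patch_embed` helper standing in for the model's conv-based patch embedding):

```python
import numpy as np

def patch_embed(img, weight, patch=16):
    """Split an H x W x C image into non-overlapping patches and project each
    with the same (C*patch*patch, D) weight. The weight shape is independent
    of H and W, so weights trained at 512x512 apply unchanged at 1024x1024."""
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    patches = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(gh * gw, patch * patch * c)
    return patches @ weight  # (num_tokens, D); token count scales with input

w = np.random.randn(16 * 16 * 3, 192)            # fixed, size-independent weights
small = patch_embed(np.random.randn(512, 512, 3), w)
large = patch_embed(np.random.randn(1024, 1024, 3), w)
print(small.shape, large.shape)  # (1024, 192) (4096, 192)
```

Only the position embedding is tied to the training grid, and that is the part resized by interpolation at test time.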