OpenGVLab / Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
https://arxiv.org/abs/2403.02308
Apache License 2.0

About the Input size #26

Closed Cynicarlos closed 2 weeks ago

Cynicarlos commented 5 months ago

If my input size is not always the same and is not divisible by patch_size, what can I do to use VRWKV for image restoration tasks? Thanks!

duanduanduanyuchen commented 5 months ago

Hi, we usually pad or crop the input image to a fixed size (divisible by the patch size) to deal with such a problem.
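
For illustration, padding to a patch-size multiple might look like this in PyTorch (a minimal sketch; pad_to_multiple is a made-up helper, not code from this repo):

import torch
import torch.nn.functional as F

def pad_to_multiple(img: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    # Pad an (N, C, H, W) tensor on the bottom/right so that H and W are
    # divisible by patch_size; crop the output back to (H, W) afterwards.
    _, _, h, w = img.shape
    pad_h = (patch_size - h % patch_size) % patch_size
    pad_w = (patch_size - w % patch_size) % patch_size
    # F.pad takes (left, right, top, bottom) for the last two dimensions;
    # reflection padding is a common choice for restoration tasks.
    return F.pad(img, (0, pad_w, 0, pad_h), mode='reflect')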

Cynicarlos commented 5 months ago

Thanks, but the image sizes of my training set and test set are different: the original images are very large, so I have to crop them to a small size like 512×512 for training. So it seems that I can't use VRWKV under this condition, is that right? Thanks again!!

duanduanduanyuchen commented 5 months ago

In that situation, I think you can also resize the image by interpolation or downsampling. VRWKV can handle inputs of different resolutions (via interpolation of the positional embedding). The output size will match the input: (H/patch_size, W/patch_size).
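
The resize alternative could be sketched the same way (again my own illustration, with a made-up function name):

import torch
import torch.nn.functional as F

def resize_to_multiple(img: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    # Bicubically resize an (N, C, H, W) tensor so that H and W become the
    # nearest multiples of patch_size.
    _, _, h, w = img.shape
    new_h = max(patch_size, round(h / patch_size) * patch_size)
    new_w = max(patch_size, round(w / patch_size) * patch_size)
    return F.interpolate(img, size=(new_h, new_w), mode='bicubic',
                         align_corners=False)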

Cynicarlos commented 5 months ago

Thanks, but I want to use the same model with different image sizes. Is that also OK? The parameters of the model are fixed to the training image size, right? After I save the model trained with image size 512×512, how can I use the saved model to test on larger images, e.g. 1024×1024? That confuses me a lot, thank you!

duanduanduanyuchen commented 4 months ago

You can test a model trained on 512×512 images on 1024×1024 ones without any extra processing. The resize_pos_embed function will resize the positional embedding automatically.
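
For intuition, a typical ViT-style positional-embedding resize looks roughly like the sketch below (assuming a (1, H*W, C) token layout; this is not the repo's actual resize_pos_embed implementation):

import torch
import torch.nn.functional as F

def resize_pos_embed_sketch(pos_embed: torch.Tensor,
                            old_hw: tuple, new_hw: tuple) -> torch.Tensor:
    # pos_embed: (1, old_h * old_w, C). Reshape the tokens back onto a 2D
    # grid, interpolate to the new grid size, then flatten again.
    _, _, c = pos_embed.shape
    grid = pos_embed.reshape(1, old_hw[0], old_hw[1], c).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_hw, mode='bicubic', align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], c)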

As an example of the required config change, you can add the following test pipeline to vrwkv_tiny_8xb128_in1k.py:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='Resize',
        size=(1170, -1),  # -1 keeps the aspect ratio
        backend='pillow',
        interpolation='bicubic'),
    dict(type='CenterCrop', crop_size=1024),
    dict(type='Normalize', **img_norm_cfg),  # img_norm_cfg is defined earlier in the config
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]

This means the test images (the ImageNet val set here; you can use your own dataset as well by writing a custom dataset config file, see the OpenMMLab docs for details) will first be resized to a height of 1170 and then center-cropped to a resolution of 1024×1024.
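
Assuming the repo follows the standard mmclassification layout, the evaluation itself would then be launched with something like python tools/test.py <config> <checkpoint> (the paths here are placeholders).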