OpenGVLab / Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
https://arxiv.org/abs/2403.02308
Apache License 2.0
346 stars, 14 forks

About the T_MAX parameter #8

Closed: stefenmax closed this issue 2 months ago

stefenmax commented 5 months ago

Hi, thanks for your work. I want to train this model on my own dataset, which is in npy format. The image size is 256x256x3, but T is calculated as 16384, which means I need to set a huge T_MAX. Can you tell me why this happens? Thanks

BlinkDL commented 5 months ago

Hi, it's related to the backward pass in the RWKV CUDA kernel.

This can be solved with better code that achieves infctx (and constant speed), stay tuned.

duanduanduanyuchen commented 5 months ago

@stefenmax Hi, maybe you can check whether the input image is being resized to a larger size, or whether patch_size is set to a small value. The number of tokens for a 256x256x3 input image should be 256 when patch_size is 16.
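For reference, here is a rough sketch of the token-count arithmetic. It assumes a plain non-overlapping square patch embedding with no extra tokens. `num_tokens` is a hypothetical helper written for illustration, not a function from the Vision-RWKV repo. The channel dimension does not affect T.

```python
def num_tokens(height: int, width: int, patch_size: int) -> int:
    """Token count T for a non-overlapping patch embedding (assumed layout)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    # One token per patch: (H / p) * (W / p)
    return (height // patch_size) * (width // patch_size)


print(num_tokens(256, 256, 16))    # 256, the expected value for this input
print(num_tokens(256, 256, 2))     # 16384, e.g. if patch_size were 2
print(num_tokens(2048, 2048, 16))  # 16384, e.g. if the image were resized to 2048
```

Since 16384 = 128 x 128, a T of that size points to either a much larger effective image size or a much smaller patch_size than intended, so those two settings are the first things to check.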