Closed: stefenmax closed this issue 2 months ago
Hi, this is related to the backward pass in the RWKV CUDA kernel.
This can be solved with better code that achieves infctx (infinite context length) at constant speed. Stay tuned.
@stefenmax Hi, could you check whether the input image is being resized to a larger size, or whether a small patch size is set? The number of tokens for a 256x256x3 input image should be 256 when patch_size is 16, since (256 / 16) x (256 / 16) = 256.
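To make the arithmetic concrete, here is a minimal sketch of the token count, assuming square, non-overlapping patches. The helper `num_tokens` is a hypothetical name used only for illustration and is not part of the repository's code. It also lists two settings that would yield T = 16384, as possible things to check rather than a diagnosis.

```python
def num_tokens(height, width, patch_size):
    """Count non-overlapping square patches (tokens) in an image.

    Hypothetical helper for illustration; assumes the image is split
    into patch_size x patch_size patches with no overlap or padding.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


# Expected case from the reply above: 256x256 image, patch_size 16.
print(num_tokens(256, 256, 16))    # 256

# Two settings that would give T = 16384 instead:
print(num_tokens(256, 256, 2))     # 16384 (very small patch size)
print(num_tokens(2048, 2048, 16))  # 16384 (image resized to 2048x2048)
```

Printing the tensor shape right before the patch embedding should show which of these, if either, applies to your data.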
Hi, thanks for your work. I want to train this model on my own dataset in npy format. The image size is 256x256x3, but T is calculated as 16384, which means I need to set a huge T_MAX. Can you tell me why this happens? Thanks.