Any ideas for combining the original RWKV and Vision-RWKV into one structure like CLIP or BLIP?

OpenGVLab / Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

https://arxiv.org/abs/2403.02308

Apache License 2.0

371 stars, 14 forks

Any ideas for combining the original RWKV and Vision-RWKV into one structure like CLIP or BLIP? #14

Open: structure-charger opened 7 months ago

structure-charger commented 7 months ago

As the title says. Beyond that, is there any way to implement RWKV-based versions of LLaVA, MiniCPM-V, InternLM-XComposer, or Qwen-VL?

BlinkDL commented 7 months ago

check https://github.com/howard-hou/VisualRWKV

duanduanduanyuchen commented 6 months ago

Hi! Thanks for your advice. We don't have plans for these models yet. Since VRWKV is a visual backbone like ViT, I think it can be plugged into these architectures in the same way a ViT would be.
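
To make the "visual backbone like ViT" point concrete, here is a minimal, framework-free sketch of a LLaVA-style pipeline: a visual backbone turns an image into patch features, a projector maps them into the language model's embedding space, and the result is placed before the text token embeddings. Everything here is an assumption for illustration only: `StubVisualBackbone`, `build_multimodal_sequence`, the dimensions, and the random weights are hypothetical and are not the Vision-RWKV, RWKV, or VisualRWKV APIs. A real setup would replace the stub with a pretrained VRWKV encoder and feed the sequence to an RWKV language model.

```python
import random

def linear(weights, vec):
    """Multiply an (out_dim x in_dim) weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def rand_matrix(out_dim, in_dim, rng):
    """Random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches

class StubVisualBackbone:
    """Hypothetical placeholder for a visual backbone (VRWKV, ViT, ...).

    It only embeds each patch independently; a real backbone would also
    mix information across patches.
    """
    def __init__(self, patch, vis_dim, rng):
        self.patch = patch
        self.embed = rand_matrix(vis_dim, patch * patch, rng)

    def __call__(self, image):
        return [linear(self.embed, p) for p in patchify(image, self.patch)]

def build_multimodal_sequence(image, token_ids, backbone, projector, token_table):
    """Return [projected visual tokens] + [text token embeddings]."""
    visual_feats = backbone(image)                          # one vector per patch
    visual_tokens = [linear(projector, f) for f in visual_feats]
    text_tokens = [token_table[t] for t in token_ids]       # embedding lookup
    return visual_tokens + text_tokens                      # input for the language model

rng = random.Random(0)
VIS_DIM, LM_DIM, VOCAB, PATCH = 8, 16, 32, 4                # illustrative sizes only
backbone = StubVisualBackbone(PATCH, VIS_DIM, rng)
projector = rand_matrix(LM_DIM, VIS_DIM, rng)               # maps vision dim -> LM dim
token_table = rand_matrix(VOCAB, LM_DIM, rng)               # toy text embedding table
image = [[rng.random() for _ in range(8)] for _ in range(8)]
sequence = build_multimodal_sequence(image, [1, 5, 9], backbone, projector, token_table)
print(len(sequence), len(sequence[0]))  # 4 visual + 3 text tokens, each of size LM_DIM
```

Because the language model only sees a sequence of vectors, the choice of backbone is isolated behind the `backbone(image)` call, which is why swapping ViT for VRWKV should mostly come down to matching the projector's input dimension.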