RWKV.cpp is a library for running RWKV models, which are LLMs built on an RNN architecture. Because an RNN carries a fixed-size recurrent state forward token by token, compute scales linearly with input length (rather than quadratically, as with transformer attention), and evaluation proceeds sequentially, which makes these models very CPU-friendly. The historical problem with RNN-based LLMs was that they could not match the quality of their transformer-based equivalents, but the RWKV models (and the team behind them) managed to solve this: the latest generation of the RWKV family performs very well and is on par with similar-sized transformer-based LLMs.
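To make the scaling argument concrete, here is a minimal sketch of RNN-style evaluation: a fixed-size state is updated once per token, so memory stays constant and total work grows linearly with sequence length. The function names and the toy state update are illustrative assumptions, not the rwkv.cpp API.

```python
# Illustrative sketch only -- toy state update, not the rwkv.cpp API.
def rnn_forward(eval_fn, tokens, state_size=4):
    """Feed tokens one at a time through a stateful RNN-style model."""
    state = [0.0] * state_size          # fixed-size recurrent state
    logits = None
    for tok in tokens:                  # one sequential step per token
        logits, state = eval_fn(tok, state)
    return logits, state

def toy_eval(tok, state):
    """Stand-in for a real model forward pass: decay the state, mix in the token."""
    new_state = [s * 0.5 + tok for s in state]
    logits = [sum(new_state)]           # a real model would emit vocab-sized logits
    return logits, new_state

logits, state = rnn_forward(toy_eval, [1, 2, 3])
```

Note that the state after processing the whole prompt is the same size as before: this is the property that keeps memory use flat as prompts grow.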
The idea of this issue is therefore to implement a wrapper around RWKV.cpp so that the LLM prompt optimization can be carried out by RWKV models. These would run in CPU RAM, leaving all the VRAM to the diffusion model. This brings two benefits:
The VRAM footprint is reduced compared to a configuration where both the LLM and the diffuser are loaded on the GPU.
It removes the dependency on a (paid) API-based LLM and lets all the operations of the application run locally.
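A possible shape for the proposed wrapper is sketched below. The backend callable stands in for the rwkv.cpp bindings (its token-by-token eval API); every name here, including `RWKVPromptOptimizer` and `eval_token`, is an illustrative assumption about the design, not an existing interface in this project or in rwkv.cpp.

```python
# Hypothetical wrapper sketch; backend callable stands in for rwkv.cpp bindings.
class RWKVPromptOptimizer:
    """Wraps an RNN-style backend to generate an optimized diffusion prompt."""

    def __init__(self, eval_token):
        # eval_token(token, state) -> (logits, new_state); state starts as None
        self.eval_token = eval_token

    def generate(self, prompt_tokens, max_new_tokens, eos_token):
        state = None
        logits = None
        for tok in prompt_tokens:                 # feed the prompt sequentially
            logits, state = self.eval_token(tok, state)
        out = []
        for _ in range(max_new_tokens):           # greedy decoding loop
            next_tok = max(range(len(logits)), key=logits.__getitem__)
            if next_tok == eos_token:
                break
            out.append(next_tok)
            logits, state = self.eval_token(next_tok, state)
        return out
```

For example, with a deterministic toy backend over a 4-token vocabulary:

```python
def toy_backend(tok, state):
    state = (state or 0) + tok
    logits = [0.0, 0.0, 0.0, 0.0]
    logits[state % 4] = 1.0
    return logits, state

optimizer = RWKVPromptOptimizer(toy_backend)
optimizer.generate([1, 2], max_new_tokens=5, eos_token=0)
```

Greedy decoding is used here only to keep the sketch minimal; the real wrapper could expose the sampling parameters the prompt-optimization step needs.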