OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large Language (LLMs), Large Action (LAMs), Large Multimodal (LMMs), and Visual Language (VLMs) Models
https://www.OpenAdapt.AI
MIT License

Explore RWKV-LM #37

Open abrichr opened 1 year ago

abrichr commented 1 year ago

How can we use RWKV-LM to implement "infinite" context lengths?

From https://github.com/BlinkDL/RWKV-LM:

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
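A minimal sketch of how that recurrent state could give us effectively unbounded context, assuming the `rwkv` pip package (its `RWKV` class exposes `forward(tokens, state)`); the weights path, strategy string, and chunk size below are placeholders:

```python
# Sketch: stream arbitrarily long token sequences through RWKV by carrying its
# recurrent state forward, instead of re-feeding a fixed-size context window.
# Assumes the `rwkv` pip package; model path and tokenizer file are placeholders.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model="RWKV-4-Pile-430M", strategy="cpu fp32")  # placeholder weights
pipeline = PIPELINE(model, "20B_tokenizer.json")             # placeholder tokenizer

def feed(text: str, state=None, chunk_size: int = 256):
    """Feed `text` in chunks, returning the final logits and the updated state."""
    tokens = pipeline.encode(text)
    logits = None
    for i in range(0, len(tokens), chunk_size):
        logits, state = model.forward(tokens[i : i + chunk_size], state)
    return logits, state

# Context is limited only by how much history we stream through the state,
# not by a transformer-style attention window.
state = None
for document in ["...hours of recorded user actions...", "...an instruction manual..."]:
    logits, state = feed(document, state)
```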

Additional reading:

From Deep to Long Learning?

Hyena Hierarchy: Towards Larger Convolutional Language Models

dianzrong commented 1 year ago

RWKV-LM can be used throughout the project to:

Implementation:

Limitation:

angelala3252 commented 1 year ago

As Dian mentioned above, RWKV-LM could be helpful for us because it could analyze large amounts of past user actions to better predict future actions. It could also learn to perform more complex tasks that the user doesn't know how to do themselves, since it can learn from longer texts such as instruction manuals. These benefits would also improve efficiency and speed up the automation.
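To illustrate the first point, the recorded history could be flattened into one long text stream that a recurrent model like RWKV consumes incrementally. This is a sketch only; the event fields below are hypothetical, not OpenAdapt's actual event schema:

```python
# Sketch: flatten a (hypothetical) recording of user actions into a single long
# text stream for next-action prediction. Field names are illustrative only.
from typing import Iterable

def serialize_events(events: Iterable[dict]) -> str:
    """Render each recorded event as one line, e.g. 'click 120 340' or 'type hello'."""
    lines = []
    for event in events:
        if event["name"] == "click":
            lines.append(f"click {event['x']} {event['y']}")
        elif event["name"] == "type":
            lines.append(f"type {event['text']}")
    return "\n".join(lines)

history = serialize_events([
    {"name": "click", "x": 120, "y": 340},
    {"name": "type", "text": "quarterly report"},
])
# With the state-carrying pattern sketched above, `history` can keep growing
# without hitting a fixed context window:
# logits, state = feed(history + "\n", state=None)
```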

As for implementation, we could consider combining RWKV-LM with another tool that handles the graphical side of things. For example, RWKV-LM could understand the task we wish to execute and lay out the steps needed; then something like OCR or MiniGPT-4 could handle executing those steps on the GUI.
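A rough sketch of that two-stage split (plan with the language model, then ground each step on screen). `plan_steps` is a hypothetical stand-in for an RWKV-backed planner, and `pytesseract` is used here purely as an example of the OCR side:

```python
# Sketch of a planner/grounder split: a language model lays out textual steps,
# OCR locates the matching on-screen text, and pyautogui performs the click.
# `plan_steps` is a hypothetical stand-in for an RWKV-backed planner.
import pyautogui
import pytesseract
from PIL import ImageGrab

def plan_steps(task: str) -> list[str]:
    # Placeholder: in practice this would prompt RWKV-LM with the task
    # description and parse its output into step strings.
    return [f"click '{task}'"]

def locate_text(target: str) -> tuple[int, int] | None:
    """Find `target` on screen via OCR and return the center of its bounding box."""
    screenshot = ImageGrab.grab()
    data = pytesseract.image_to_data(screenshot, output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        if word.strip().lower() == target.lower():
            x = data["left"][i] + data["width"][i] // 2
            y = data["top"][i] + data["height"][i] // 2
            return x, y
    return None

for step in plan_steps("Submit"):
    target = step.split("'")[1]          # e.g. "Submit" out of "click 'Submit'"
    position = locate_text(target)
    if position:
        pyautogui.click(*position)
```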