WEHI-RCPStudentInternship / student-intern-organiser

GNU General Public License v3.0

How can we search through all help documents using RAG LLM? #24

Open rowlandm opened 11 months ago

rowlandm commented 11 months ago

This might make information in the help documents easier to find.

rowlandm commented 10 months ago

This includes searching across the FAQ, the onboarding document and handbook on figshare, all the articles on the website, etc.

rowlandm commented 10 months ago

Example of using a chat bot to help open source maintainers

https://livablesoftware.com/slack-chatbot-github-repositories/

rowlandm commented 10 months ago

Another example

https://github.com/derekvawdrey/website-to-chatbot

We would need to convert the PDFs on the website to text.

rowlandm commented 1 month ago

Example of an LLM RAG

Another example of an LLM RAG

rowlandm commented 4 weeks ago

RAGged Edge Box: A Personal AI-Powered Document Search System

One of the most popular embodiments of Generative AI is retrieval-augmented generation (RAG). Such systems use an information retrieval (IR) engine (based on semantic embeddings or keyword search) to find relevant passages, and then use a Large Language Model (LLM) to extract answers to a given query from them.

These systems require a large amount of computation and are usually implemented in the cloud, which raises data privacy issues.

In this talk we will present the RAGged Edge Box project, in which basic embedding systems and small local LLMs are packaged inside a multi-platform virtual machine (VirtualBox). The system provides a web interface that runs locally and allows access to the RAG functionality in a completely private manner. The neural networks run on an ONNX runtime and do not require a GPU. The RAG code is implemented in PHP and is easy to modify, requiring a much smaller execution environment than a Python alternative.

https://textualization.com/ragged/
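
To make the retrieve-then-generate idea in the abstract above more concrete, here is a minimal sketch in Python using only the standard library. It is not how RAGged Edge Box works (that project uses PHP and ONNX models). It only illustrates the keyword-search variant of retrieval, followed by building a prompt that would be handed to an LLM. The document texts, the names `DOCUMENTS`, `retrieve` and `build_prompt`, and the prompt wording are all hypothetical stand-ins, and the actual LLM call is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the real sources (FAQ, handbook, onboarding document).
DOCUMENTS = {
    "faq": "Interns can request access to the shared drive by emailing their supervisor.",
    "handbook": "The handbook explains weekly meetings, code reviews and how to write a final report.",
    "onboarding": "During onboarding, interns set up a GitHub account and join the Slack workspace.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Rank documents by a simple TF-IDF style keyword overlap with the query."""
    tokenized = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)
    scores = {}
    for name, counts in tokenized.items():
        score = 0.0
        for term in set(tokenize(query)):
            # Document frequency: how many documents contain this term.
            df = sum(1 for c in tokenized.values() if term in c)
            if df:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + df)) + 1.0
                score += counts[term] * idf
        scores[name] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Keep only documents that share at least one term with the query.
    return [name for name in ranked[:top_k] if scores[name] > 0]


def build_prompt(query, documents, top_k=2):
    """Assemble the text that would be passed to an LLM (the call itself is omitted)."""
    hits = retrieve(query, documents, top_k)
    context = "\n".join(f"[{name}] {documents[name]}" for name in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("How do I join the Slack workspace?", DOCUMENTS))
```

For the sources listed earlier in this issue, the same shape would apply after the PDFs are converted to text and split into smaller passages. A semantic-embedding retriever could replace `retrieve` without changing `build_prompt`.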

rowlandm commented 3 weeks ago

Internal WEHI post on RAG LLM

rowlandm commented 3 weeks ago

Be warned - LLMs don’t do formal reasoning - and that is a HUGE problem