HSV-AI / presentations

This repository is used to manage the presentations given at Huntsville AI meetups. It provides a collection of Issues, Cards, and Files to plan and create the content needed for a presentation.

240403 LLM Sherpa #95

Closed jperiodlangley closed 7 months ago

jperiodlangley commented 8 months ago

Description:

Continuing our discussion about Retrieval Augmented Generation (RAG), this week we will incorporate LLM Sherpa to provide chunks of text from PDF documents that have been retrieved from the NASA archive.

Our initial attempt used PyPDF2 to read text from the PDF documents. It was very slow and returned short strings of text that did not line up with the paragraphs in the documents. We'll look back at what was available at the time, then walk through the LLM Sherpa API and see how the pipeline behaves with that piece incorporated.
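For anyone following along, a minimal sketch of layout-aware chunking with the `llmsherpa` Python client might look like the following. The parser URL assumes a self-hosted parsing service on localhost port 5010, and `pdf_to_chunks` is a hypothetical helper name for illustration, not code from our project.

```python
# Assumption: a self-hosted LLM Sherpa parsing service is reachable at this URL.
ASSUMED_PARSER_URL = "http://localhost:5010/api/parseDocument?renderFormat=all"


def pdf_to_chunks(pdf_path_or_url, reader=None, parser_url=ASSUMED_PARSER_URL):
    """Return a list of layout-aware text chunks for one PDF.

    `reader` can be injected (for example in tests); by default an
    llmsherpa LayoutPDFReader is created against `parser_url`.
    """
    if reader is None:
        # Imported lazily so this sketch loads even without llmsherpa installed.
        from llmsherpa.readers import LayoutPDFReader

        reader = LayoutPDFReader(parser_url)

    doc = reader.read_pdf(pdf_path_or_url)
    # to_context_text() returns the chunk text together with its section headings,
    # which gives the retriever more context than a bare paragraph.
    return [chunk.to_context_text() for chunk in doc.chunks()]
```

Passing the reader in as a parameter keeps the function easy to exercise without a running parser service.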

As we get further into this project update, it has become apparent that we need to split the monolithic application into components that can be hosted and updated separately. We will go through what has been done so far to containerize both the ChromaDB vector database and the LLM Sherpa service used for chunking.
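As a rough illustration of what talking to a separately hosted vector database could look like, here is a small sketch using the `chromadb` client. The host, port, collection name, and the helper names `connect_collection` and `store_chunks` are all assumptions for illustration; the actual container setup may differ.

```python
def connect_collection(host="localhost", port=8000, name="nasa_documents"):
    """Connect to a Chroma server (assumed host/port) and return a collection."""
    # Imported lazily so this sketch loads even without chromadb installed.
    import chromadb

    client = chromadb.HttpClient(host=host, port=port)
    return client.get_or_create_collection(name)


def store_chunks(collection, chunks, source_name):
    """Add text chunks to a collection, using ids derived from the source name."""
    if not chunks:
        return []  # nothing to add; avoid calling add() with empty lists
    ids = [f"{source_name}-{i}" for i in range(len(chunks))]
    metadatas = [{"source": source_name, "chunk": i} for i in range(len(chunks))]
    collection.add(documents=list(chunks), ids=ids, metadatas=metadatas)
    return ids
```

Because the application only holds a client connection, the database container can be restarted or upgraded without redeploying the rest of the pipeline.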

Complete the following items to get a presentation ready for Huntsville AI:

Adding material to the presentations repository

Add the file to present (preferably a Jupyter Notebook or Markdown-formatted file) to the folder structure. For multiple files, create a directory that follows the naming convention and add the files to it.

Naming convention

We start each filename with the date (year, month, day) so that the files stay in date order when they are sorted alphabetically.

YYMMDD_Session_Description.extension
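A small sketch of the convention, in case it helps when naming a new file. The helper name `presentation_name` is hypothetical and the second and third filenames below are made-up examples, not files in the repository.

```python
import re
from datetime import date


def presentation_name(session_date, description, extension):
    """Build a filename in the YYMMDD_Session_Description.extension form."""
    stamp = session_date.strftime("%y%m%d")
    # Keep only letters and digits, joining the words with underscores.
    words = re.findall(r"[A-Za-z0-9]+", description)
    return f"{stamp}_{'_'.join(words)}.{extension.lstrip('.')}"


name = presentation_name(date(2024, 4, 3), "LLM Sherpa", "ipynb")
print(name)  # 240403_LLM_Sherpa.ipynb

# Plain alphabetical sorting puts the files in date order.
names = [name, "231115_Example_Talk.md", "240110_Another_Talk.ipynb"]
print(sorted(names))
```

Putting the two-digit year first is what makes the alphabetical order match the chronological order.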

jperiodlangley commented 8 months ago

LLM Sherpa

jperiodlangley commented 7 months ago

entities