HSV-AI / presentations

This repository is used to manage the presentations given at Huntsville AI meetups. It provides a collection of Issues, Cards, and Files to plan and create the content needed for a presentation.

5/01/2024 Llama-cpp-python #97

Closed jperiodlangley closed 1 week ago

jperiodlangley commented 1 month ago

Description

Working on this now. Looking at how to incorporate llama-cpp-python into the RAG application for SpaceApps. It provides a docker image with a server conforming to the OpenAI API, so it should be easy to connect to and use.

https://github.com/abetlen/llama-cpp-python
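Assuming the docker image's server is running locally on its default port (the `localhost:8000` base URL here is an assumption; check the image's docs), connecting via the OpenAI-compatible endpoint might look roughly like this standard-library-only sketch:

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8000/v1"):
    """Build an OpenAI-style chat-completion request for the local server."""
    payload = {
        "model": "local",  # local servers typically map or ignore this name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running, send the request and read the reply:
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```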

Complete the following items to get a presentation ready for Huntsville AI

Adding material to the presentations repository

Add the file to present (prefer Jupyter Notebooks or Markdown-formatted files) to the folder structure. For multiple files, create a directory following the naming convention and add the files to it.

Naming convention

We use a convention of starting the filenames with a date (year/month/day) so that the files remain sorted by date even when listed alphabetically.

YYMMDD_Session_Description.extension
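As a sketch, the convention above can be checked with a small script (the regex is my reading of the `YYMMDD_Session_Description.extension` pattern, not an official repo rule):

```python
import re
from datetime import datetime

# Six date digits, an underscore-separated description, then an extension.
PATTERN = re.compile(r"^(\d{6})_[A-Za-z0-9_]+\.\w+$")

def follows_convention(filename):
    """Return True if filename matches YYMMDD_Session_Description.extension."""
    m = PATTERN.match(filename)
    if not m:
        return False
    try:
        # Verify the leading six digits are a real YYMMDD date.
        datetime.strptime(m.group(1), "%y%m%d")
    except ValueError:
        return False
    return True
```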

jperiodlangley commented 3 weeks ago

Outline provided by perplexity.ai

I'm giving a talk this week about llama-cpp-python. What outline should I use?

Here is a suggested outline for your talk on llama-cpp-python:

Introduction

Installation

Loading the Model

Using the API

Performance Considerations

Examples and Use Cases

Conclusion

The outline covers the key aspects of llama-cpp-python - installation, model loading, API usage, performance tuning, and real-world examples. Feel free to adjust the depth based on your target audience and available time. The search results provide good supporting information for each section.
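For the "Loading the Model" and "Using the API" sections of the outline, a minimal sketch of the high-level Python API follows. The model path and parameter values are placeholders, and the commented-out calls require llama-cpp-python and a GGUF model on disk:

```python
def llama_kwargs(model_path, n_ctx=2048, n_gpu_layers=0):
    """Collect constructor arguments for llama_cpp.Llama.

    n_ctx sets the context window; n_gpu_layers controls how many
    layers are offloaded to the GPU (-1 offloads all of them).
    """
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,
        "n_gpu_layers": n_gpu_layers,
    }

# Actual usage (model filename below is a placeholder):
# from llama_cpp import Llama
# llm = Llama(**llama_kwargs("models/llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1))
# out = llm("Q: Name the planets in the solar system. A:",
#           max_tokens=64, stop=["Q:"])
# print(out["choices"][0]["text"])
```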

Citations:
[1] Llama.cpp - Outlines 〰️ https://outlines-dev.github.io/outlines/reference/models/llamacpp/
[2] Python bindings for llama.cpp - GitHub https://github.com/abetlen/llama-cpp-python
[3] How to Run LLaMA Locally on CPU or GPU - YouTube https://www.youtube.com/watch?v=SvjWDX2NqiM
[4] Poor performance - How do I use LlamCpp in python correctly ... https://github.com/ggerganov/llama.cpp/discussions/6322
[5] Poll: Why do you choose llama.cpp over vLLM or vice-versa? - Reddit https://www.reddit.com/r/LocalLLaMA/comments/1bub23v/poll_why_do_you_choose_llamacpp_over_vllm_or/

jperiodlangley commented 3 weeks ago

llama-cpp-python

jperiodlangley commented 3 weeks ago

Finally got llama-cpp-python to use my GPU after re-installing with:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
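One way to confirm the GPU is actually being used: run the model with `n_gpu_layers=-1` and `verbose=True`, then look for the "offloaded N/M layers to GPU" line llama.cpp prints at startup. A hypothetical helper for scanning captured log output (the log-line wording is my recollection of llama.cpp's output and may vary by version):

```python
import re

def offloaded_layers(log_text):
    """Return (offloaded, total) layer counts from llama.cpp startup logs,
    or None if no offload line is present (i.e., CPU-only inference)."""
    m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

# usage: construct Llama(..., n_gpu_layers=-1, verbose=True), capture its
# stderr, and check that offloaded_layers(captured) is not None
```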


jperiodlangley commented 2 weeks ago

https://llama-cpp-python.readthedocs.io/en/stable/