carverauto / threadr

šŸŒŽ OSS Real-time AI Data Analysis with GraphDB integration. šŸ”

feat: Add vLLM to k8s environment #67

Open mfreeman451 opened 7 months ago

mfreeman451 commented 7 months ago

What

We need to run inference on conversations with local open models to build relationships, since relying on regular expressions alone will not produce accurate results.

A user in our community mentioned that we should not try to use ollama/llama.cpp and should check out vLLM instead (see the links under How).

Why

We need to be able to feed conversations in, probably on a schedule and perhaps more frequently, and use the model to analyze them and identify relationships between users in the logs. Feeding conversations in would most likely just mean running a query against our existing data set in neo4j, so this fits into the ETL pipeline. A rough sketch of that query is below.
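A minimal sketch of what the extraction query could look like, assuming a hypothetical graph schema with `User`, `Message`, and `Conversation` nodes linked by `SENT`/`IN` relationships (the actual threadr schema and connection details may differ):

```python
from neo4j import GraphDatabase

# Connection details are placeholders; use whatever config/secret mechanism the cluster provides.
driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password"))

# Hypothetical schema: (:User)-[:SENT]->(:Message)-[:IN]->(:Conversation)
FETCH_RECENT_CONVERSATIONS = """
MATCH (u:User)-[:SENT]->(m:Message)-[:IN]->(c:Conversation)
WHERE m.timestamp > datetime() - duration('PT1H')
RETURN c.id AS conversation_id,
       collect({user: u.name, text: m.text, ts: m.timestamp}) AS messages
ORDER BY conversation_id
"""

def fetch_recent_conversations():
    """Pull the last hour of conversations so they can be handed to the model in a batch."""
    with driver.session() as session:
        return [record.data() for record in session.run(FETCH_RECENT_CONVERSATIONS)]
```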

How

Install vLLM in the cluster, possibly deploying it with KServe?

https://kserve.github.io/website/latest/modelserving/v1beta1/llm/vllm/
https://docs.vllm.ai/en/latest/index.html
https://docs.vllm.ai/en/latest/serving/deploying_with_kserve.html
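However the deployment ends up (KServe or a plain Deployment), vLLM exposes an OpenAI-compatible API, so the rest of the pipeline could talk to it with the standard `openai` client. A minimal sketch, assuming an in-cluster service name and model that are both placeholders:

```python
from openai import OpenAI

# base_url and model are assumptions; point them at whatever the vLLM
# InferenceService/Deployment actually exposes inside the cluster.
client = OpenAI(
    base_url="http://vllm.threadr.svc.cluster.local:8000/v1",
    api_key="not-needed-for-local",  # vLLM ignores the key unless --api-key is set
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "system", "content": "Extract relationships between users from the conversation."},
        {"role": "user", "content": "alice: thanks bob, that fixed it\nbob: np!"},
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
```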

API

We'll need to build a gated API around this unless vLLM has its own auth. I'm not sure exactly what the inputs will be yet, probably a system_prompt and a query. We'll most likely have to wire LangChain into the API so it can perform function calling; otherwise the model will just return unstructured text that we can't rely on programmatically. A rough sketch follows.
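One way to get structured output without depending on native function calling is LangChain's `PydanticOutputParser` over the same OpenAI-compatible endpoint. This is a sketch under the same placeholder URL/model assumptions as above, with a hypothetical `Relationship` schema:

```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Hypothetical schema for the relationships we want back instead of free-form text.
class Relationship(BaseModel):
    source_user: str = Field(description="user initiating the interaction")
    target_user: str = Field(description="user being addressed")
    relation: str = Field(description="e.g. replied_to, helped, mentioned")

class RelationshipList(BaseModel):
    relationships: list[Relationship]

parser = PydanticOutputParser(pydantic_object=RelationshipList)

prompt = ChatPromptTemplate.from_messages([
    ("system", "{system_prompt}\n{format_instructions}"),
    ("user", "{query}"),
]).partial(format_instructions=parser.get_format_instructions())

# base_url/model are placeholders for the in-cluster vLLM endpoint.
llm = ChatOpenAI(
    base_url="http://vllm.threadr.svc.cluster.local:8000/v1",
    api_key="not-needed-for-local",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    temperature=0.0,
)

chain = prompt | llm | parser

result = chain.invoke({
    "system_prompt": "Extract relationships between users from the conversation.",
    "query": "alice: thanks bob, that fixed it\nbob: np!",
})
print(result.relationships)
```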

Usage

We might even consider using mage.ai for the processing piece once the vLLM stuff is set up. Since local inference is usually a bit slower, we'll probably do most of our work in batch jobs; a rough batch loop is sketched below.
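A sketch of what such a batch job might look like end to end (whether it runs under mage.ai, a CronJob, or something else), reusing the hypothetical `fetch_recent_conversations`, `chain`, and `driver` pieces from the sketches above:

```python
def run_batch():
    """Batch job: pull recent conversations from neo4j, run inference, write relationships back."""
    conversations = fetch_recent_conversations()  # from the neo4j sketch above
    for convo in conversations:
        text = "\n".join(f"{m['user']}: {m['text']}" for m in convo["messages"])
        result = chain.invoke({
            "system_prompt": "Extract relationships between users from the conversation.",
            "query": text,
        })
        with driver.session() as session:
            for rel in result.relationships:
                # MERGE keeps the job idempotent if the same conversation is reprocessed.
                session.run(
                    """
                    MERGE (a:User {name: $source})
                    MERGE (b:User {name: $target})
                    MERGE (a)-[r:INTERACTED_WITH {relation: $relation}]->(b)
                    """,
                    source=rel.source_user,
                    target=rel.target_user,
                    relation=rel.relation,
                )
```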

Extra Links

https://docs.vllm.ai/en/latest/serving/serving_with_langchain.html
https://python.langchain.com/docs/integrations/llms/vllm/