What
We need to use local open models to run inference on conversations and build relationships, since relying on regular expressions alone will not produce accurate results.
A user in our community mentioned we should not try to use ollama/llama.cpp and should check out vLLM instead.
Why
We need to be able to feed conversations in, probably on a schedule (perhaps more frequently), and use the model to analyze and identify relationships between users in the logs. By "feed in conversations", I mean most likely just a query against our existing data set in Neo4j. This basically fits into the ETL pipeline.
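Roughly what that query could look like, assuming a hypothetical (:User)-[:SENT]->(:Message) shape in Neo4j; the real labels, properties, and connection details will depend on our actual schema:

```python
# Sketch: pull recent conversation windows out of Neo4j for inference.
# Assumes a hypothetical (:User)-[:SENT]->(:Message {channel, text, ts}) shape;
# adjust the Cypher (and the connection settings) to match our actual data set.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "password"))

FETCH_MESSAGES = """
MATCH (u:User)-[:SENT]->(m:Message)
WHERE m.ts >= $since
RETURN m.channel AS channel, u.name AS user, m.text AS text, m.ts AS ts
ORDER BY m.channel, m.ts
"""

def fetch_conversations(since):
    """Group messages by channel so each batch is one conversation window."""
    conversations = {}
    with driver.session() as session:
        for record in session.run(FETCH_MESSAGES, since=since):
            conversations.setdefault(record["channel"], []).append(
                f'{record["user"]}: {record["text"]}'
            )
    return conversations
```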
How
Install vLLM in the cluster, possibly deploying it with KServe?
https://kserve.github.io/website/latest/modelserving/v1beta1/llm/vllm/
https://docs.vllm.ai/en/latest/index.html
https://docs.vllm.ai/en/latest/serving/deploying_with_kserve.html
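Before we get into KServe, a quick local sanity check with vLLM's offline inference API would probably look something like this (the model name is just a placeholder):

```python
# Sketch: sanity-check a model with vLLM's offline inference API before
# worrying about the KServe deployment. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.2, max_tokens=256)

prompt = (
    "Given this conversation, list pairs of users who appear to know each other:\n"
    "alice: hey bob, did you push the fix?\n"
    "bob: yeah, merged it last night\n"
)
for output in llm.generate([prompt], params):
    print(output.outputs[0].text)
```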
API
We'll need to build a gated API around this unless they have their own system. I'm not sure exactly what the inputs will be yet, probably a system_prompt and a query. We'll most likely have to build LangChain into the API so it can perform function calling; otherwise it will just spit back a bunch of junk every time that we can't rely on programmatically.
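Very rough sketch of what that gated API could look like, assuming we expose vLLM through its OpenAI-compatible server. The route, auth check, service hostname, and model name are all placeholders, and the LangChain vLLM integration linked under Extra Links could slot in later for the function-calling / structured-output piece:

```python
# Sketch of the gated API, assuming vLLM is exposed via its OpenAI-compatible
# server. The route, token check, hostname, and model name are placeholders.
from fastapi import FastAPI, Header, HTTPException
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI(base_url="http://vllm-service:8000/v1", api_key="EMPTY")

class InferenceRequest(BaseModel):
    system_prompt: str
    query: str

@app.post("/v1/relationships")
def infer(req: InferenceRequest, x_api_key: str = Header(default="")):
    if x_api_key != "our-shared-secret":  # stand-in for real auth/gating
        raise HTTPException(status_code=401, detail="unauthorized")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",
        messages=[
            {"role": "system", "content": req.system_prompt},
            {"role": "user", "content": req.query},
        ],
        temperature=0.2,
    )
    return {"result": resp.choices[0].message.content}
```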
Usage
We might even consider using mage.ai for the processing piece once the vLLM stuff is set up. Since local inference is usually a bit slower, we'll probably do most of our work in batch jobs.
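Rough sketch of what a batch job could look like, whether it ends up as a Mage block or a plain cron script; it reuses the hypothetical fetch_conversations helper and /v1/relationships route from the sketches above:

```python
# Sketch of a scheduled batch job (cron or a Mage block) that feeds each
# conversation window through the gated API. Reuses the hypothetical
# fetch_conversations helper and /v1/relationships route sketched above.
from datetime import datetime, timedelta, timezone

import requests

def run_batch():
    since = datetime.now(timezone.utc) - timedelta(hours=1)
    for channel, lines in fetch_conversations(since).items():
        resp = requests.post(
            "http://relationship-api:8080/v1/relationships",
            headers={"x-api-key": "our-shared-secret"},
            json={
                "system_prompt": "Extract user relationships as JSON.",
                "query": "\n".join(lines),
            },
            timeout=300,  # local inference can be slow, so be generous
        )
        resp.raise_for_status()
        print(channel, resp.json()["result"])
```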
Extra Links
https://docs.vllm.ai/en/latest/serving/serving_with_langchain.html
https://python.langchain.com/docs/integrations/llms/vllm/