-
# OPEA Inference Microservices Integration for LangChain
This RFC proposes the integration of OPEA inference microservices (from GenAIComps) into LangChain [extensible to other frameworks], enabli…
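As a concrete illustration, a minimal sketch of the consumption path, assuming the OPEA LLM microservice exposes an OpenAI-compatible endpoint (the URL, port, and model name below are placeholders, not part of this RFC):

```python
# Sketch only: point LangChain's OpenAI-compatible chat client at an OPEA
# LLM microservice. Endpoint URL and model name are placeholders; the
# proposed integration may instead ship a dedicated wrapper component.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:9000/v1",  # hypothetical OPEA microservice endpoint
    api_key="not-needed",                 # local microservices typically skip auth
    model="Intel/neural-chat-7b-v3-3",    # placeholder model name
)
print(llm.invoke("What is OPEA?").content)
```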
-
intelanalytics/ipex-llm-serving-cpu:latest
-
How should we host the wiki + API backend?
Choices I see right now:
- Coolify self-hosting on a Hetzner remote machine
- Render
- Vercel
- Fly.io
- Others?
We initially thought Supabase cou…
-
/kind bug
**What steps did you take and what happened:**
I installed KServe in Kubernetes following the steps here: https://kserve.github.io/website/late…
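For reference, one way to smoke-test a fresh install is to create the sample sklearn-iris InferenceService through the KServe Python SDK (the name and storage URI are from the public KServe examples, used here only as an illustrative check):

```python
# Hedged smoke test: create the sample sklearn-iris InferenceService and
# wait for it to become ready, to confirm the install reconciles end to end.
from kubernetes import client
from kserve import (KServeClient, constants, V1beta1InferenceService,
                    V1beta1InferenceServiceSpec, V1beta1PredictorSpec,
                    V1beta1SKLearnSpec)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    ),
)
kclient = KServeClient()
kclient.create(isvc)
# Block until the InferenceService reports Ready (or times out).
kclient.get("sklearn-iris", namespace="default", watch=True, timeout_seconds=180)
```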
-
We have [recently announced](https://blog.langchain.dev/langgraph-platform-announce/) LangGraph Platform, a ***significantly*** enhanced solution for deploying agentic applications at scale.
We rec…
-
### Feature Description
AWS Bedrock has a few multimodal LLMs, such as Claude Opus. It would be great if these could be added as a multi-modal-llm integration. There is already an anthropic multimodal …
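For reference, a minimal sketch of what such an integration would need to wrap: invoking a multimodal Claude model on Bedrock directly through boto3's `bedrock-runtime` client with the Anthropic Messages format (the region, model ID, and image path below are examples, not requirements):

```python
# Hedged sketch: raw Bedrock call that a multi-modal-llm integration would wrap.
import base64
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
with open("diagram.png", "rb") as f:  # example image path
    image_b64 = base64.b64encode(f.read()).decode()

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{
        "role": "user",
        "content": [
            # Image content block (base64-encoded) followed by the text prompt.
            {"type": "image", "source": {"type": "base64",
                                         "media_type": "image/png",
                                         "data": image_b64}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
}
response = runtime.invoke_model(
    modelId="anthropic.claude-3-opus-20240229-v1:0",  # example model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```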
-
### 🚀 The feature, motivation and pitch
There is huge potential in more advanced load-balancing strategies tailored to the unique characteristics of AI inference, compared to basic strategies such …
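To make "advanced" concrete, here is an illustrative sketch (not vLLM code) of one such strategy: least-outstanding-requests routing across replicas. Because output lengths, and hence request durations, vary wildly in LLM inference, plain round-robin skews load; the replica URLs below are placeholders.

```python
# Illustrative only: a least-outstanding-requests router across vLLM replicas.
# Real inference-aware strategies could also weigh queue depth, KV-cache
# usage, or prefix-cache affinity per replica.
import httpx

REPLICAS = ["http://vllm-0:8000", "http://vllm-1:8000"]  # placeholder endpoints
inflight = {url: 0 for url in REPLICAS}

async def route_completion(payload: dict) -> dict:
    # Pick the replica with the fewest requests currently in flight,
    # rather than blindly rotating round-robin.
    target = min(REPLICAS, key=lambda url: inflight[url])
    inflight[target] += 1
    try:
        async with httpx.AsyncClient(timeout=None) as client:
            resp = await client.post(f"{target}/v1/completions", json=payload)
            return resp.json()
    finally:
        inflight[target] -= 1
```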
-
```python
from ipex_llm import optimize_model
from transformers import LlavaForConditionalGeneration

# Load the LLaVA multimodal model on CPU, then apply ipex-llm's low-bit
# optimization in place.
model = LlavaForConditionalGeneration.from_pretrained('llava-hf/llava-1.5-7b-hf',
                                                      device_map="cpu")
model = optimize_model(model)
```
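For context, a short usage sketch for the optimized model above, following the standard Hugging Face LLaVA flow (the image URL and prompt are arbitrary examples):

```python
# Hedged usage sketch: run one image-grounded generation with the model above.
import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained('llava-hf/llava-1.5-7b-hf')
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```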
-
### Anything you want to discuss about vllm.
Could vLLM consider supporting multiple models on the same vLLM instance?
We are evaluating vLLM for large-scale LLM inference serving. But we are con…
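For context, the common workaround today is to run one vLLM server per model and dispatch on the OpenAI-style `model` field with a thin proxy; a hedged sketch follows (the ports and model names are placeholders):

```python
# Hedged workaround sketch: one vLLM server per model, with a thin proxy
# dispatching on the requested "model" field.
import httpx
from fastapi import FastAPI, Request

MODEL_ENDPOINTS = {
    "mistralai/Mistral-7B-Instruct-v0.2": "http://localhost:8001",
    "meta-llama/Llama-2-13b-chat-hf": "http://localhost:8002",
}

app = FastAPI()

@app.post("/v1/completions")
async def completions(request: Request):
    payload = await request.json()
    backend = MODEL_ENDPOINTS[payload["model"]]  # dispatch by requested model
    async with httpx.AsyncClient(timeout=None) as client:
        resp = await client.post(f"{backend}/v1/completions", json=payload)
        return resp.json()
```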
-
**Setup**
Machine: AWS SageMaker ml.p4d.24xlarge
Model: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Used a Docker container image with the latest build of trt-llm (`0.8.0.dev2024011…