-
### Problem Statement
I can see why Gab was confused here: #1329
@vansangpfiev
Can we use the groupings @dan-homebrew / I originally suggested? 🙏
Current
- Not super accurate because `chat` shou…
-
# Ideological Inference Engines
[https://paulbricman.com/hypothesis-subspace/?stackedPages=%2Fideological-inference-engines](https://paulbricman.com/hypothesis-subspace/?stackedPages=%2Fideological-inference-engines)
-
Right now we support inference engines like vLLM for inference. What if people want to call OpenAI-style APIs such as ChatGPT? That would be easy to integrate.
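For illustration, here is a minimal sketch of what calling an OpenAI-compatible endpoint could look like; the base URL, environment variables, and model name are assumptions for the example, not an existing integration:

```python
# Minimal sketch: calling an OpenAI-compatible chat completions endpoint.
# BASE_URL, the API key variable, and the model name are illustrative only.
import os
import requests

BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
API_KEY = os.environ.get("OPENAI_API_KEY", "")

def chat_completion(messages, model="gpt-4o-mini"):
    """POST a chat request and return the assistant's reply text."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(chat_completion([{"role": "user", "content": "Hello!"}]))
```

Because the wire format is the same, the same client code would work against any engine that exposes the OpenAI chat completions schema.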
-
Hello everyone, I have a problem and would like to ask for help. After I compile and run the inference code `run.py`, if I set `max_output_len` to a small value, the output will be truncated before it is …
-
- 3,000 and 1 bugs...
- Using Clang is impossible because it immediately turns up a seemingly infinite number of bugs. Which version is this supposed to work with without bugs? I ask because I'm using 18.1.3 on Ubuntu 2…
-
## Goal
- Cortex has a clear CLI and API to select active hardware (see the sketch after this list)
- Cortex can list all available hardware
- Cortex can activate specific hardware (e.g. CPU-only, or specific GPU)
- Cortex can dete…
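A hedged sketch of how such an API could be exercised from a client; the `/v1/hardware` endpoints, the port, and the payload shape below are assumptions for illustration, not a documented Cortex API:

```python
# Hypothetical client for the hardware selection API described above.
# The endpoints, port, and JSON shapes are assumptions, not Cortex's
# actual interface.
import requests

CORTEX = "http://127.0.0.1:39281"  # assumed local server address

# List all hardware the server can detect (CPUs, GPUs, ...).
print(requests.get(f"{CORTEX}/v1/hardware", timeout=10).json())

# Activate a specific subset, e.g. GPU index 0 only (CPU-only would be
# an empty list under this assumed schema).
resp = requests.post(
    f"{CORTEX}/v1/hardware/activate",
    json={"gpus": [0]},
    timeout=10,
)
print(resp.status_code, resp.json())
```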
-
## Goal
- We need a Python runtime to run TTS libraries (see the sketch after this tasklist)
- There is an ever-increasing focus on Python for inference - how do we align?
- Blocks #1247
## Tasklist
- [ ] Architecture question: how do…
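One possible shape for that runtime, sketched under the assumption of a long-lived worker process that the host spawns and talks to over stdin/stdout with JSON lines; the request fields and the TTS call itself are placeholders, not a chosen design:

```python
# Sketch of a Python TTS worker: reads one JSON request per stdin line,
# writes one JSON response per stdout line. The actual TTS call is a stub.
import json
import sys

def synthesize(text: str, out_path: str) -> str:
    # Placeholder: a real implementation would call the chosen TTS library
    # here to render `text` to audio at `out_path`.
    with open(out_path, "wb") as f:
        f.write(b"")  # stub: real audio bytes would be written here
    return out_path

def main() -> None:
    # Assumed protocol: {"text": "...", "out": "/tmp/x.wav"} -> {"ok": true, "path": "..."}
    for line in sys.stdin:
        req = json.loads(line)
        path = synthesize(req["text"], req["out"])
        print(json.dumps({"ok": True, "path": path}), flush=True)

if __name__ == "__main__":
    main()
```

A subprocess boundary like this keeps the Python dependency isolated from the host process, which is one way to answer the architecture question above.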
-
Problem
Currently, it is overly complicated to handle SSE output from custom remote inference engines that do not follow the OpenAI spec.
Success Criteria
Ability to transform the response similar to …
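For illustration, a sketch of the kind of transform this asks for, assuming a hypothetical engine that streams `{"token": ..., "done": ...}` payloads; those input field names are invented for the example, while the output follows the real OpenAI `chat.completion.chunk` shape:

```python
# Sketch: map one SSE `data:` payload from a custom engine into an
# OpenAI-spec streaming chunk. Input field names are assumptions.
import json
import time

def to_openai_chunk(raw_event: str, model: str = "custom-engine") -> str:
    """Convert one custom SSE data payload into an OpenAI-style chunk."""
    payload = json.loads(raw_event)
    chunk = {
        "id": "chatcmpl-transformed",
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"content": payload.get("token", "")},
            "finish_reason": "stop" if payload.get("done") else None,
        }],
    }
    return f"data: {json.dumps(chunk)}\n\n"

print(to_openai_chunk('{"token": "Hello", "done": false}'))
```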
-
/kind feature
**Describe the solution you'd like**
Hoping to add [https://github.com/xorbitsai/inference](https://github.com/xorbitsai/inference) as a KServe Hugging Face LLM serving runtime.
Xor…
-
Trying to run offline retinanet in a container with one Nvidia GPU:
`cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia` …