-
### Search before asking
- [X] I had searched in the [issues](https://github.com/ray-project/kuberay/issues) and found no similar feature requirement.
/cc Bytedancer @Basasuya @Yicheng-Lu-llll
…
-
Hey, this project seems really interesting. There is currently hardly any competitor to ChatGPT's advanced voice mode, but this seems to be going in the same direction.
Currently the device being used is `cuda`, ca…
-
### Description
The cohere rerank implementation allows configuring fields that probably don't apply. The implementation leverages the common settings here: https://github.com/elastic/elasticsearch/b…
-
### Description
A customer is interested in using the Elasticsearch inference API with text generation models on Hugging Face, whereas as of 8.15 we are limited to supporting only `text_embedding`
-
0x416
Medium
# Lack of error handling when making a blockless API call
## Summary
Lack of error handling when making a blockless API call
## Vulnerability Detail
Error handling when making blockless…
-
### 🚀 The feature, motivation and pitch
I launched an LLM service with vLLM, and I use the AsyncOpenAI client for high-throughput output, like this:
```python
async def async_llm_infer_sampling(prompt, a…
```
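The fan-out pattern behind this can be sketched with `asyncio.gather`. This is an illustration, not the issue author's code: `batch_complete` is a hypothetical helper name, and `client` stands in for any async client exposing an awaitable `completions.create(...)`, e.g. an `openai.AsyncOpenAI` instance pointed at the vLLM server's OpenAI-compatible endpoint.

```python
import asyncio

async def batch_complete(client, model, prompts, max_concurrency=64):
    """Send many prompts concurrently through an async client.

    `client` is assumed to expose `completions.create(model=..., prompt=...)`
    as an awaitable (e.g. openai.AsyncOpenAI against a vLLM server).
    A semaphore caps in-flight requests so the server isn't flooded.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:
            return await client.completions.create(model=model, prompt=prompt)

    # asyncio.gather returns results in the same order as the inputs
    return await asyncio.gather(*(one(p) for p in prompts))
```

Concurrency (not threads) is what gives the throughput here: while one request waits on the server, the event loop issues the others.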
-
While we support batched inference like other constrained decoding libraries, the current implementation can be parallelized further. In particular, we can mask logits in batch and run several `kbnf` …
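One way the batched masking step could look (a NumPy sketch under assumed shapes, not the library's actual code; how the `kbnf` engine produces the per-sequence allowed-token sets is out of scope here):

```python
import numpy as np

def mask_logits_batch(logits, allowed_token_ids):
    """Mask a (batch, vocab) logits array in one vectorized pass.

    allowed_token_ids: one iterable per sequence, giving the token ids the
    grammar engine permits next for that sequence. Disallowed positions are
    set to -inf so softmax assigns them zero probability mass.
    """
    mask = np.zeros(logits.shape, dtype=bool)
    for row, allowed in enumerate(allowed_token_ids):
        mask[row, list(allowed)] = True
    return np.where(mask, logits, -np.inf)
```

The per-row loop only builds the boolean mask; the expensive part, writing -inf across the whole (batch, vocab) array, happens in a single vectorized `np.where`.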
-
I want to perform inference on quantized LLAMA (W8A16) on ARM-v9 (with SVE) using oneDNN. The LLAMA weights are per-group quantized.
Based on my understanding, I need to prepack the weights to redu…
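For reference, per-group symmetric W8A16 weight quantization itself can be sketched in plain NumPy, independent of oneDNN; the group size is an assumption, and the SVE-friendly prepacking/layout step would be handled separately (e.g. by oneDNN reorder primitives):

```python
import numpy as np

def quantize_per_group(w, group_size=64):
    """Symmetric int8 per-group quantization of a (rows, cols) float weight.

    Each group of `group_size` consecutive values along a row shares one
    float scale, so dequantization is w ≈ q * scale (W8A16: int8 weights,
    higher-precision activations).
    """
    rows, cols = w.shape
    assert cols % group_size == 0
    g = w.reshape(rows, cols // group_size, group_size)
    scale = np.abs(g).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    q = np.clip(np.round(g / scale), -127, 127).astype(np.int8)
    return q.reshape(rows, cols), scale.squeeze(-1)

def dequantize_per_group(q, scale, group_size=64):
    rows, cols = q.shape
    g = q.reshape(rows, cols // group_size, group_size).astype(np.float32)
    return (g * scale[..., None]).reshape(rows, cols)
```

The round-trip error per element is bounded by half a scale step, which is why smaller groups (tighter scales) trade memory for accuracy.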
-
**What would you like to be added**:
Add the group size as an env var
**Why is this needed**:
In most cases of multi-host inference the group size is needed, e.g. by vLLM.
I suggest using LWS_G…
-
### OpenVINO Version
2021.2.1.0
### Operating System
Windows System
### Device used for inference
CPU
### OpenVINO installation
Build from source
### Programming Language
C++
### Hardware Ar…