-
### What happened?
When we call Gemini models from Vertex AI, no usage entry is recorded in the transaction collection.
### Steps to Reproduce
1. Deploy Gemini Pro and/or Gemini Pro Vision in Vertex AI.
2…
-
At present, if you deploy something that ends up in `CrashLoopBackOff`, FTL will wait forever. We need to be able to handle failed deployments without hanging.
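The detection logic amounts to polling pod status with a deadline instead of blocking indefinitely. A minimal sketch of that check using the Python `kubernetes` client (the function name, label selector, and timeout are illustrative, not FTL's actual API):

```python
import time

from kubernetes import client, config


def wait_for_rollout(namespace: str, label_selector: str, timeout_s: int = 300) -> None:
    """Wait for pods to become ready, but fail fast on CrashLoopBackOff."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        pods = v1.list_namespaced_pod(namespace, label_selector=label_selector)
        for pod in pods.items:
            for cs in pod.status.container_statuses or []:
                waiting = cs.state.waiting
                if waiting and waiting.reason == "CrashLoopBackOff":
                    # Surface the failure instead of hanging forever.
                    raise RuntimeError(f"pod {pod.metadata.name} is in CrashLoopBackOff")
        ready = pods.items and all(
            p.status.container_statuses
            and all(cs.ready for cs in p.status.container_statuses)
            for p in pods.items
        )
        if ready:
            return  # every container in every pod reported ready
        time.sleep(5)
    raise TimeoutError(f"deployment not ready after {timeout_s}s")
```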
-
[Bug]: Device with "gpu" name is not registered in the OpenVINO Runtime
```
Traceback (most recent call last):
  File "/data/scratch/mkw-anomalib/anomalib-predict.py", line 27, in <module>
    inferencer = O…
```
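For what it's worth, this error usually means the runtime cannot see a device under that exact name: OpenVINO device names are case-sensitive upper-case strings ("GPU", not "gpu"), and the GPU plugin and drivers must be installed. A quick way to check what the runtime actually registers (the model path below is a placeholder):

```python
from openvino.runtime import Core

core = Core()
# Lists whatever devices the runtime registered, e.g. ['CPU', 'GPU'].
print(core.available_devices)

# Device names are case-sensitive: "GPU" works only if it appears above;
# passing "gpu" produces the "not registered" error from the title.
model = core.read_model("model.xml")  # placeholder path
compiled = core.compile_model(model, device_name="GPU")
```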
-
### Question
Hello. I apologize for asking a really fundamental question; I don't have anywhere else to ask questions like this.
I'm practicing the SageMaker deploy method with Hugging Face, and it say…
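Since the question is cut off, here is only the generic Hub-based deploy flow from the `sagemaker` SDK in case it helps; the model id, DLC versions, and instance type below are placeholders to adapt:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

# Pull the model straight from the Hugging Face Hub at endpoint start-up.
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # placeholder
    "HF_TASK": "text-classification",
}

model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # must match an existing Hugging Face DLC combination
    pytorch_version="1.13",
    py_version="py39",
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "I love using SageMaker!"}))

predictor.delete_endpoint()  # endpoints bill while they sit idle
```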
-
### What is the issue?
I fine-tuned a sqlcoder model and generated a model file. When I deployed it on Ollama, there was a problem: the model could not run, and the size of the Ollama file was incorre…
-
/kind feature
**Describe the solution you'd like**
I hope you can add [https://github.com/xorbitsai/inference](https://github.com/xorbitsai/inference) as a KServe Hugging Face LLM serving runtime.
Xor…
-
### Environment information
```plain text
System:
OS: macOS 11.7.10
CPU: (8) x64 Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
Memory: 1.14 GB / 16.00 GB
Shell: /bin/zsh
Binaries:
Node: 2…
```
-
### Self Checks
- [X] I have searched for [existing issues](https://github.com/langgenius/dify/issues), including closed ones.
- [X] I confirm that I am using English to su…
-
Given that your predict module provides convert_to_onnx functionality and I need to use C++ for inference deployment, how do I do that?
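The usual split (sketched below with a stand-in `torch.nn.Linear`, since the actual predict module isn't shown) is to export to ONNX once from Python, sanity-check the file with `onnxruntime`, and then load the very same `.onnx` file from the ONNX Runtime C++ API (`Ort::Env` / `Ort::Session`) in the deployment binary:

```python
import torch
import onnxruntime as ort

# Stand-in model; replace with whatever the predict module wraps.
model = torch.nn.Linear(4, 2).eval()
dummy = torch.randn(1, 4)

# Export once from Python.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)

# Verify the exported graph before moving to C++; the same model.onnx
# loads unchanged from the ONNX Runtime C++ API, so nothing is re-exported.
session = ort.InferenceSession("model.onnx")
(out,) = session.run(None, {"input": dummy.numpy()})
print(out.shape)  # (1, 2)
```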
-
I am running the llama3 model on an RTX 4090 with fp8 quantization. In the [result](https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/executor/executor.h#L323), `outputTokenIds` see…