-
### 🚀 The feature, motivation and pitch
The kv_cache is used only during the token phase, not during the prompt phase. As a result, the exported model currently works only with one of these phases, d…
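
For readers unfamiliar with the two phases, here is a minimal pure-Python sketch of why they treat the cache differently. It is not the exported model's code: `ToyKVCache`, `prompt_phase`, `token_phase`, and the identity key/value projections are all hypothetical names and simplifications chosen for illustration. The sketch assumes the prompt phase starts from an empty cache and only fills it, while the token phase takes the populated cache as an input.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class ToyKVCache:
    """Hypothetical cache: one list of key vectors, one list of value vectors."""

    def __init__(self):
        self.keys = []
        self.values = []


def prompt_phase(embeddings, cache):
    """Process the whole prompt. The cache starts empty and is only filled here."""
    outputs = []
    for x in embeddings:
        cache.keys.append(x)    # toy assumption: identity projection for K
        cache.values.append(x)  # toy assumption: identity projection for V
        outputs.append(attend(x, cache.keys, cache.values))  # causal: sees 0..i
    return outputs


def token_phase(x, cache):
    """Process one new token, reusing the keys and values cached so far."""
    cache.keys.append(x)
    cache.values.append(x)
    return attend(x, cache.keys, cache.values)


def full_recompute(embeddings):
    """Reference result for the last position without any cache."""
    return attend(embeddings[-1], embeddings, embeddings)


prompt = [[1.0, 0.0], [0.0, 1.0]]
new_token = [1.0, 1.0]

cache = ToyKVCache()
prompt_phase(prompt, cache)
cached_output = token_phase(new_token, cache)
reference_output = full_recompute(prompt + [new_token])
```

In this toy setup `cached_output` matches `reference_output`, which is the property a cache is meant to preserve. The two phases still have different input signatures (with and without a populated cache), which is one plausible reason a single exported graph would fit only one of them.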
-
**Is your feature request related to a problem? Please describe.**
When we use the Gemini LLM, the only option is to pass an `API KEY`. What if we have a JSON file through which we want to c…
-
"code": 429,
"message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: anthropic-claude-3-5-sonnet. Please submit a quota increase re…
-
Hey all, I have a quick question: is onnxruntime-genai ([https://onnxruntime.ai/docs/genai/api/python.html](https://onnxruntime.ai/docs/genai/api/python.html)) supported in Triton Inference Server's O…
-
Here is the result of my command. Is this error inside the container or outside? The weird part to me is:
**genai-stack-pull-model-1 | pulling ollama model llama2 using http://llm-gpu:11434**
T…
-
I'm not sure whether my issue is related to issue [446](https://github.com/microsoft/onnxruntime-genai/issues/446), but here is what I experienced. The first time I load an ONNXRuntime-genai model into …
-
### Description of the bug:
```go
package main
import (
"context"
"fmt"
"log"
"github.com/google/generative-ai-go/genai"
"google.golang.org/api/…
-
In [Phi-3 vision DirectML](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-directml), using either Python or C#, certain questions just return ``
For example "Why is the sky blue?" r…
-
### Description of the bug:
Function calling does not work when providing `stop_sequences` and `stream=True`.
### Actual vs expected behavior:
Actual:
```python
import google.generativeai as ge…
-
### This issue is for a: (mark with an `x`)
```
- [X] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavio…