-
Context window size is still largely a manual setting right now: it can be specified per request via `{"options": {"num_ctx": 32768}}` in the API, or per model via `PARAMETER num_ctx 32768` in the Modelfile. Otherwise the default value is…
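A minimal sketch of setting it per request against a local Ollama server (the model tag and prompt are placeholders; with `"stream": false` the API returns a single JSON object):
```python
import requests  # assumes the requests package is installed

# Override the context window for a single /api/generate call.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2",                # placeholder: any locally pulled model tag
        "prompt": "Summarize this long document...",
        "stream": False,
        "options": {"num_ctx": 32768},    # context window size in tokens
    },
)
print(resp.json()["response"])
```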
-
I used the [llm_inference](https://github.com/googlesamples/mediapipe/tree/main/examples/llm_inference) sample with `gemma-2b-it-cpu-int4.bin` on a Pixel 8 Pro emulator.
The prefill speed seems to be in…
-
### System Info
python version: 3.11.9
transformers version: 4.44.2
accelerate version: 0.33.0
torch version: 2.4.0+cu121
### Who can help?
@gante
### Information
- [X] The official example sc…
-
**Issue: Model Error when Setting max_seq_length > 8192**
**Description:**
The `unsloth/codegemma-2b-bnb-4bit` model throws an error when attempting to set `max_seq_length` greater than 8192.
…
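A minimal repro sketch of the kind of call that hits this limit, assuming the usual Unsloth loading pattern (the 16384 value is just an illustrative number above 8192; the exact traceback is not shown above):
```python
from unsloth import FastLanguageModel

# CodeGemma's native context length is 8192; asking for more is what
# reportedly triggers the error.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/codegemma-2b-bnb-4bit",
    max_seq_length=16384,   # > 8192
    load_in_4bit=True,
)
```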
-
I found that the current version of LongLM can not load Gemma 1 or Gemma 2 model successfully. I wrote a minimum test to help reproduce the issue:
```python
# transformers version 4.38.2
# this exa…
-
### What happened?
I'm encountering an issue with the autogen library (version 0.3.1) when using OpenAI as the LLM provider (version 1.52.2). The error occurs during the generation of responses with …
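For context, a minimal sketch of the classic two-agent setup this kind of report usually involves (the model name, API key handling, and message are placeholders, not taken from the report):
```python
import os
import autogen

# OpenAI config for the 0.2/0.3-style agent API.
config_list = [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# The reported error would surface while the assistant generates its reply.
user_proxy.initiate_chat(assistant, message="Help me debug this script.")
```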
-
- [ ] [vidore/colpali · Hugging Face](https://huggingface.co/vidore/colpali)
# ColPali: Visual Retriever based on PaliGemma-3B with ColBERT strategy
## Model Description
This model is built iterati…
-
### Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
None
### OS Platform and Distribution
Windows 11, Chrome V130
### Mobile device if the issue happe…
-
As I understand it, it's quite straightforward to load a 4-bit quantized model with `litgpt serve` through the CLI using:
`litgpt serve google/gemma-2-2b-it --quantize bnb.nf4-dq`
However, is there a way …
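For reference, once the server above is running, the served model can be exercised over HTTP; this sketch assumes the default LitServe-style `/predict` route and payload that `litgpt serve` documents, which may differ between versions:
```python
import requests  # assumes the requests package is installed

# Query a model started with:
#   litgpt serve google/gemma-2-2b-it --quantize bnb.nf4-dq
resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "What do llamas eat?"},
)
print(resp.json()["output"])
```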
-
Hello everyone,
I'm excited to be using ONNX Runtime GenAI. It's an amazing library for anyone looking to run models on their device. I've been learning how to use ONNX GenAI by following various t…
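For context, a minimal generation sketch in the style of the ONNX Runtime GenAI tutorials (the model folder path is a placeholder, and the exact Python API surface has shifted a bit between releases):
```python
import onnxruntime_genai as og

# Folder containing the exported ONNX model plus genai_config.json (placeholder path).
model = og.Model("models/gemma-2b-it-onnx")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode("Why is the sky blue?")

# Token-by-token generation loop, as shown in the library's tutorials.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```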