-
### MediaPipe Solution (you are using)
MediaPipe LLM Inference API
### Programming language
TBD
### Are you willing to contribute it
Yes
### Describe the feature and the current beha…
-
### What happened?
I've already quantized a 2b variant of this model, and one of its instruct fine-tunes, on a subset of the same data (the first 1000 samples are the same, in the same order -- the e…
-
The Semantic Answer Similarity (SAS) metric (https://arxiv.org/abs/2108.06130) uses pretrained encoders to gauge the semantic similarity between two texts: a prediction and a reference. This…
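As a rough illustration of the idea behind SAS, the score boils down to embedding both texts and comparing the embeddings with cosine similarity. The `embed` function below is a hypothetical stand-in for a pretrained sentence encoder (the paper evaluates trained bi- and cross-encoders); it uses a toy character-frequency vector only so the sketch stays self-contained and runnable.

```python
import math


def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a pretrained sentence encoder.
    # A real SAS setup would call a trained embedding model here;
    # this toy version builds a character-frequency vector so the
    # example runs without any model downloads.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sas_score(prediction: str, reference: str) -> float:
    # SAS: similarity of the prediction and reference embeddings.
    return cosine_similarity(embed(prediction), embed(reference))
```

Identical texts score 1.0 and disjoint texts score 0.0; with a real encoder, paraphrases with little lexical overlap would still score high, which is the advantage of SAS over exact-match metrics.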
-
### What is the issue?
Upon running `ollama run gemma:2b` (though this happens for all tested models: llama3, phi, tinyllama), the loading animation appears, and after ~5 minutes (estimate, untimed)…
-
Heya,
I thought this would be an interesting project for leveraging the [new built-in AI capabilities in Chrome](https://developer.chrome.com/docs/ai/built-in), so I forked the repo and started tin…
-
### Issue description
Electron sample app crashes on Mac
### Expected Behavior
no crash
### Actual Behavior
crash
### Steps to reproduce
I'm testing with the template https://github.com/…
-
### When I run the following script
```python
import torch
from accelerate import Accelerator, PartialState
from peft import LoraConfig
from tqdm import tqdm
from transformers import AutoTokenizer, …
```
-
I was trying to integrate the model into our iOS app via CocoaPods and Bazel, building the app in the iOS simulator with Xcode 14.1 (iPhone 14 Pro simulator). Building and compiling worked without any …
-
Does anyone know what makes Gemma 2 9b based models run so slowly (locally) compared to Llama 3 8b? Sure, it's bigger, but the output speed difference is huge. Is that just how it is, or is there a known issue w…
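One way to put numbers on the speed difference is to query a local Ollama server directly: its `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which tokens/sec follows. A minimal sketch, assuming an Ollama server on the default localhost port and that the models to compare are already pulled:

```python
import json
import urllib.request


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    # Ollama reports eval_duration in nanoseconds.
    return eval_count / (eval_duration_ns / 1e9)


def benchmark(model: str, prompt: str = "Explain quantization briefly.") -> float:
    # Assumes an Ollama server running at the default localhost:11434.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return tokens_per_second(body["eval_count"], body["eval_duration"])
```

Calling `benchmark("gemma2:9b")` and `benchmark("llama3:8b")` on the same machine would give directly comparable tok/s figures, separating "it's just a bigger model" from a genuine performance bug.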
-
### System Info
- `transformers` version: 4.44.0
- Platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.24.5
- Safetensors version: 0.4.…