-
Actions:
@rbruijnshkv:
- [ ] Adjust the regional models so that all inlets and outlets connect to/from a boundary
- [ ] Cross-check the coupling against earlier inventories/samenwerken-met-kunstwerken/w…
-
### 🐛 Describe the bug
Script
~~~
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false" # To prevent long warnin…
~~~
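The excerpt above is cut off. For context, below is a minimal self-contained sketch of the same loading pattern with the `transformers` API; the model id and prompt are placeholders and are not taken from the original report.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"  # silence tokenizer fork warnings

model_id = "gpt2"  # hypothetical placeholder; the issue's actual model is not shown
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```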
-
I was running the benchmark examples with the vLLM backend using the following script.
```python
from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig, VLLMCo…
```
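The script is truncated above. As a rough sketch of the general optimum-benchmark launch pattern (config objects passed to `Benchmark.launch`), the example below uses placeholder names; the exact fields accepted by `VLLMConfig` are an assumption based on the other backend configs, not taken from the original report.
```python
from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, VLLMConfig

if __name__ == "__main__":
    # Placeholder configuration; model name, launcher choice, and field values are assumptions.
    launcher_config = TorchrunConfig(nproc_per_node=1)
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = VLLMConfig(model="gpt2", device="cuda")  # assumed to share model/device fields with other backends
    benchmark_config = BenchmarkConfig(
        name="vllm_gpt2",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)
    print(benchmark_report)
```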
-
Hello,
I have obtained access to the llama-2 models from Meta, but we do not have enough GPU resources to do the fine-tuning ourselves. Could you please share your fine-tuned llama-2-7b model weights with us? …
-
### What is the issue?
I get a CUDA out-of-memory error when sending a large prompt (about 20k+ tokens) to the Phi-3 Mini 128k model on a laptop with an Nvidia A2000 (4GB). At first about 3.3GB GPU RAM and …
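For reference, here is a minimal sketch of how the context window can be capped through the ollama REST API so a long prompt does not allocate the full 128k context; the model tag, prompt, and num_ctx value are placeholders, not the reporter's actual settings.
```python
import requests

# Hypothetical example values for illustration only.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3",  # assumed tag; the exact tag used in the report is not shown
        "prompt": "Summarize the following text ...",
        "stream": False,
        "options": {"num_ctx": 8192},  # smaller context window -> smaller KV cache in GPU memory
    },
)
print(response.json()["response"])
```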
-
I tried two gguf conversions on an M2 Ultra (Metal), but no luck. I converted them myself and still got the same error.
Here is the first model I tried:
https://huggingface.co/guinmoon/MobileVLM-1.7B-GGUF…
-
### What is the issue?
Hi,
I noticed a previous out-of-memory error was fixed in version 0.1.45-rc3 ([https://github.com/ollama/ollama/issues/5113]).
```
ollama run deepseek-coder-v2
```
Now I…
-
I was trying this model https://huggingface.co/ddh0/Meta-Llama-3-8B-Instruct-bf16-GGUF
Depending on the prompt, it sometimes works and sometimes does not. When offloading layers to the GPU it seems to crash no matt…
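As background on what "offloading layers to the GPU" controls, here is a minimal llama-cpp-python sketch; the model path, layer count, and prompt are placeholders, not the reporter's setup.
```python
from llama_cpp import Llama

# Hypothetical path/values for illustration only.
llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct-bf16.gguf",  # placeholder local GGUF path
    n_gpu_layers=0,  # 0 = CPU only; raise to offload more transformer layers to the GPU
    n_ctx=4096,
)
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```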
-
Hi,
I received a weird error in a workflow that was previously working. I'm not sure whether the "TODO: fix" mention is intended for comfyui, for llama_cpp, or for the VLM nodes, but I figured I'd start here :). …
-
I was testing the new quantization by @angeloskath with some Italian prompts that were failing with the previous version and are now PERFECT! But while doing this I have noticed extreme slowness with q8 and fp16…
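For context, a minimal mlx-lm sketch of loading a quantized model and timing generation is shown below; the model id and prompt are placeholders, and this is only an assumed approximation of the setup being discussed, not the reporter's actual script.
```python
import time
from mlx_lm import load, generate

# Hypothetical model id; the report does not show which checkpoint was used.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

prompt = "Scrivi una breve poesia sull'autunno."  # example Italian prompt
start = time.time()
text = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(text)
print(f"elapsed: {time.time() - start:.2f}s")
```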