-
### What are you trying to do?
Every once in a while people ask how to get Ollama running on Google Colab, either for doing dev work inside of Colab or for using it as a remote GPU. I think if the gi…
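For the remote-GPU case, a minimal sketch of the usual setup from inside a Colab cell is below, assuming the official Linux install script and Ollama's default REST endpoint on port 11434; the model name and sleep duration are placeholders, not something from the original question.
```
import subprocess, time, requests

# Install Ollama via the official script (Colab runtimes are Linux).
subprocess.run("curl -fsSL https://ollama.com/install.sh | sh", shell=True, check=True)

# Start the server in the background; it listens on localhost:11434 by default.
server = subprocess.Popen(["ollama", "serve"])
time.sleep(5)  # crude wait for the server to come up

# Pull a model and request a completion over the REST API.
subprocess.run(["ollama", "pull", "llama3"], check=True)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Hello from Colab", "stream": False},
)
print(resp.json()["response"])
```
To use the Colab GPU as a truly remote backend you would additionally need to tunnel port 11434 out of the runtime (e.g. with a tunneling service), since Colab does not expose ports directly.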
-
Resources requested:
```
resources:
cloud: azure
ports: 8080
accelerators: A10:1
region: westus2
```
I was able to provision the instance, but it is blocked at `INFO: Waiting for task resources on 1 nod…
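For reference, a roughly equivalent launch through SkyPilot's Python API is sketched below; this is an assumption based on recent SkyPilot releases rather than something from the report, and the `run` command and cluster name are placeholders.
```
import sky

# Placeholder task; the actual run command is not shown in the report.
task = sky.Task(run="python server.py")
task.set_resources(
    sky.Resources(
        cloud=sky.Azure(),
        accelerators="A10:1",
        region="westus2",
        ports=8080,  # assumes a SkyPilot version where Resources accepts ports
    )
)

# sky.launch streams provisioning logs; "Waiting for task resources" is printed
# while the scheduler waits for the requested accelerators to become available.
sky.launch(task, cluster_name="a10-test")
```
When a launch hangs at that message even though the VM provisioned, `sky logs` for the cluster usually shows which setup step on the node is stalling.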
-
Hi, I tried to use the new `load_in_8bit=True` feature to fine-tune a Gemma model on a Kaggle TPU. However, it showed the error below. I'm wondering whether this is a bug or a feature that will not suppo…
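For context, `load_in_8bit` is backed by bitsandbytes, which targets CUDA GPUs, so a TPU backend is a likely failure point. The sketch below shows the usual working GPU setup; the exact model id is an assumption (the report only says a Gemma model), and this is a common pattern rather than the poster's code.
```
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# bitsandbytes 8-bit kernels run on CUDA GPUs; the same flag on a TPU backend
# has no kernel to fall back on, which is consistent with the reported error.
config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")  # assumed model id
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=config,
    device_map="auto",
)
```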
-
### Summary
Using the following command line to install the Rust TLS plugin and the ggml plugin:
```
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --p…
```
-
### System Info
TGI 1.4
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
### Reproduction
run TGI using google/gemm…
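For anyone reproducing this, a minimal client against a locally running TGI container is sketched below; `/generate` and its fields are the standard TGI REST API, while the host port and prompt are assumptions.
```
import requests

# TGI serves /generate on the container's port 80; 8080 is an assumed host mapping.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Why is the sky blue?",
        "parameters": {"max_new_tokens": 64},
    },
)
print(resp.json()["generated_text"])
```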
-
Chapter 3 explains how to download the llama.cpp model weights and tokens. The chapter specifies the 3B model. I have two questions:
1- The available model weights and tokens are for the 7B and…
-
I would expect to use a single model to do different controlled generation tasks. However, when I create different controlled generation functions, I find that the underlying model object is over-ridd…
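The excerpt doesn't name the library, so the toy sketch below only illustrates the pattern the poster seems to want, with made-up stand-ins (`Model`, `make_generator`): each controlled-generation function closes over its own constraint state instead of mutating one shared model object.
```
from dataclasses import dataclass

# Toy stand-in for an LLM handle; the real library is not named in the excerpt.
@dataclass
class Model:
    name: str

def make_generator(model, constraint):
    # Each generator keeps its own constraint. Storing the constraint on the
    # shared model instead is what makes a second generator override the first.
    def generate(prompt):
        return f"[{model.name} | constraint={constraint}] {prompt}"
    return generate

model = Model("shared-llm")
json_gen = make_generator(model, "json")
regex_gen = make_generator(model, "regex")
print(json_gen("hello"))   # both reuse the same model object
print(regex_gen("world"))  # without clobbering each other's configuration
```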
-
The SIMD acceleration on x86_64 does not seem to be as optimized as on AArch64. Perhaps some optimization work is needed for the x86_64 platform.
-
Hi, I'm trying to use MiniGemini outside of the demo environment, but am running into the following error when calling model.generate():
```
File "/app/backend/minigemini.py", line 104, in chat_…
```
-
I have tested Ollama on several different machines, but no matter how many cores or how much RAM I have, it only uses 50% of the cores and just a few GB of RAM.
For example, right now I'm running `ollama run lla…
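Worth noting: the llama.cpp runtime under Ollama defaults its thread count to the number of physical cores, which on a hyperthreaded machine shows up as roughly 50% CPU utilization, and it only maps as much memory as the loaded model needs. The thread count can be raised per request through the API's options field; in the sketch below the model name and thread count are placeholders.
```
import requests

# num_thread is a llama.cpp option that Ollama exposes per request;
# 16 is a placeholder -- set it to the machine's logical core count.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Hello",
        "stream": False,
        "options": {"num_thread": 16},
    },
)
print(resp.json()["response"])
```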