-
see #27
https://ai.google.dev/gemma/docs?hl=en
https://www.kaggle.com/models/google/gemma
Gemma on Vertex AI Model Garden
https://console.cloud.google.com/vertex-ai/publishers/google/model-gard…
-
I'm using a server running Ubuntu 20.04.6 LTS with a V100 GPU. I'm not an admin, so I can't install the CUDA toolkit at the system level. I installed PyTorch (with conda), which bundles its own cudatoolkit. I have no…
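One quick way to confirm that the conda environment really ships its own CUDA runtime (so no system-level install is needed) is to look for `libcudart` inside the environment prefix. This is a minimal stdlib sketch assuming the usual conda `<env>/lib` layout; the helper name is illustrative, not part of any tool:

```python
# Sketch: locate CUDA runtime libraries that conda installed inside the
# active environment. Assumes the conventional <env>/lib layout; not
# specific to any particular PyTorch build.
import glob
import os
import sys


def find_env_cuda_libs(prefix: str = sys.prefix) -> list:
    """Return any libcudart shared objects shipped inside the env."""
    pattern = os.path.join(prefix, "lib", "libcudart.so*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    libs = find_env_cuda_libs()
    print(libs or "no bundled libcudart found in this environment")
```

If the list is non-empty, PyTorch can run CUDA kernels without any system-wide cudatoolkit.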
-
Just about to add these models to the list.
-
The quantized example was compiled using `cargo build --example quantized -r --features metal`.
Unsure of: how many layers are accelerated / how many threads are used / the clearly different sampling stages.
..yet I pres…
-
- [x] Use `llama_decode` instead of deprecated `llama_eval` in `Llama` class
- [ ] Implement batched inference support for `generate` and `create_completion` methods in `Llama` class
- [ ] Add suppo…
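As a sketch of the bookkeeping that batched inference involves, the snippet below flattens several token sequences into a single batch of (token, position, sequence-id) triples, in the spirit of llama.cpp's `llama_batch`. All names here (`Batch`, `build_batch`) are hypothetical and illustrative, not the real `llama-cpp-python` API:

```python
# Illustrative-only sketch of building one decode batch from several
# sequences; Batch and build_batch are hypothetical, not llama.cpp API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Batch:
    tokens: List[int] = field(default_factory=list)
    positions: List[int] = field(default_factory=list)
    seq_ids: List[int] = field(default_factory=list)


def build_batch(sequences: List[List[int]]) -> Batch:
    """Flatten several prompts into one batch, tagging each token with its
    sequence id and position so a single decode call can serve all of them."""
    batch = Batch()
    for seq_id, tokens in enumerate(sequences):
        for pos, tok in enumerate(tokens):
            batch.tokens.append(tok)
            batch.positions.append(pos)
            batch.seq_ids.append(seq_id)
    return batch


batch = build_batch([[1, 2, 3], [4, 5]])
print(batch.tokens)   # [1, 2, 3, 4, 5]
print(batch.seq_ids)  # [0, 0, 0, 1, 1]
```

The per-token sequence id is what lets the backend keep separate KV-cache entries for each prompt while decoding them in one pass.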
-
## Motivation
As seen on https://github.com/ggerganov/llama.cpp/issues/4216 , one of the important tasks is to refactor / clean up the server code so that it's easier to maintain. However, without a…
-
It does not start with the Llama 3.1 model. Is it possible to make changes so that it works with Llama 3.1? This is now the model with the most tokens, and it will potentially be used everywhere.
-
**LocalAI version:**
```
v1.25.0-cublas-cuda12-ffmpeg
```
**Environment, CPU architecture, OS, and Version:**
```
# uname -a
Linux localai-ix-chart-f8bbbb7c7-x6xx9 6.1.42-production+truen…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
The first problem is that port 443 is usually reserved. I edited index.js to use port 8080.
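The reason 443 fails for a non-root process is that Linux treats ports below 1024 as privileged (binding them requires root or `CAP_NET_BIND_SERVICE`), while 8080 is unprivileged. A tiny check, with the conventional 1024 threshold hardcoded:

```python
# On Linux, binding a port below 1024 normally requires root (or the
# CAP_NET_BIND_SERVICE capability); that is why 443 fails for a normal
# user while 8080 works.
def is_privileged_port(port: int) -> bool:
    return 0 < port < 1024


print(is_privileged_port(443))   # True  -> needs elevated rights
print(is_privileged_port(8080))  # False -> fine for an unprivileged server
```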
The next problem is that it crashes on the first request:
```
/src/gpt-llama.cpp > npm start
> gpt-llama.cpp@0.1.9 star…