-
I use mergekit-moe to build an MoE model from several copies of the same Gemma model (gate mode `hidden`), but the resulting model produces meaningless output, as in https://github.com/arcee-ai/mergekit/issues/218#issuecomme…
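For context, here is a rough sketch of that kind of setup, assuming mergekit's MoE YAML format and the `mergekit-moe` CLI; the model name, prompts, and output path are illustrative, not the exact ones used above:
```sh
# Hypothetical config: several copies of the same Gemma model as experts,
# routed with gate_mode "hidden" (hidden-state based gate initialization).
cat > gemma_moe.yml <<'EOF'
base_model: google/gemma-7b
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: google/gemma-7b
    positive_prompts: ["chat"]
  - source_model: google/gemma-7b
    positive_prompts: ["code"]
EOF

# Build the MoE checkpoint into ./gemma-moe-out
mergekit-moe gemma_moe.yml ./gemma-moe-out
```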
-
https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf
https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
https://github.com/jeasonstudio/chrome-ai
https://ai.meta.com/rese…
-
### Summary
[You]:
Write me a short description of the Adidas brand
[Bot]:
^C
### Reproduction steps
alex@M1 ~ % bash
-
### What is the issue?
```
% ollama show gemma:7b-instruct-fp16
Model
arch gemma
parameters 9B
quantizati…
-
Hello, I have tried your method on the gemma-7b model. I found that the method works on the GSM-8K dataset but fails on the WikiText-2 dataset. This is my training log:
```
[WARNING|logging.py:329] 2…
-
Thank you for your excellent work. I'd like to know how to reproduce jbloom/Gemma-2b-Residual-Stream-SAEs.
Could you open-source the configuration file used to train it?
-
Hi,
Could you also implement the Gemma model so it can be compared with Llama?
Best regards
-
### System Info
Running the TGI 2.0.3 Docker image on a VM with 8 NVIDIA L4 GPUs, with one L4 exposed to Docker.
Command:
```sh
MODEL=google/gemma-2b-it
docker run \
-m 320G \
--shm-size=40G \
-e NVIDIA_VI…
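
# For comparison, a minimal sketch of a standard TGI 2.0.3 launch for
# google/gemma-2b-it on a single GPU. The port, volume, and token variable are
# illustrative assumptions, not values from the truncated command above.
VOLUME=$PWD/tgi-data
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$VOLUME":/data \
  -e HF_TOKEN="$HF_TOKEN" \
  ghcr.io/huggingface/text-generation-inference:2.0.3 \
  --model-id google/gemma-2b-it
# (gemma-2b-it is gated on the Hub, so a read token is required; older TGI
# versions read HUGGING_FACE_HUB_TOKEN instead of HF_TOKEN.)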
-
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b-bnb-4bit",
    max_seq_length = max_seq_length,  # assumed to be defined earlier in the script
    dtype = None,                     # let Unsloth auto-detect the dtype
    load_in_4bit = True)
```
I noticed models ar…
-
Consider the following models:
- [x] LLaMa 3 8B
- [x] LLaMa 2 7B
- [x] OLMo 7B
- [x] OLMo 7B Instruction tuned
- [x] Gemma 7B
- [x] Gemma 7B instruction tuned
- [x] Aya 23 8B