-
Hugging Face Hub login successful
Used the gemma-2-27b LLM for testing:
cargo run --release -- -m "google/gemma-2-27b-it" -c
Finished release [optimized] target(s) in 0.03s
Running `target/re…
-
### Describe the bug
After launching the model, the response repeats until max tokens.
Example: when I ask 'hello' it responds 'HelloHowToToToToToToToToToToToToToToToToToToToToToToToToToToToToToToToToToToToTo…
-
https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
-
My notebook:
Windows 11 Pro 23H2
Intel i7-8750H
GeForce GTX 1050Ti (Mobile)
32GB RAM (2666MHz)
After I removed the mention of flash_attn in gemma.py, I got the following errors:
`TypeError: Gem…
-
### Describe the bug
I have been having challenges with the Autogen framework for quite some time now. I have followed YouTube tutorials and the Autogen documentation, done the necessary installations, and still…
-
Hi 👋 ,
It would be really great if you could add support for the Gemma model series (i.e. the 2B and 7B variants; the 7B in particular is what I would like most), since I see that it is currently not su…
-
We want model parallelism to be easy to use across the library. At a high level, a user should express their hardware, and (possibly) desired model parallel vs data parallel split for the device grid.…
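As a minimal sketch of that idea (the names `device_grid` and `mp_size` here are hypothetical, not from any existing library): the user states their hardware (a device count) and the desired model-parallel width, and the data-parallel dimension is inferred.

```python
def device_grid(num_devices, mp_size):
    """Arrange device ids into a (data-parallel x model-parallel) grid.

    Illustrative sketch only: each row is one data-parallel replica,
    and the devices within a row shard the model (model parallelism).
    """
    if num_devices % mp_size != 0:
        raise ValueError("mp_size must divide the device count evenly")
    dp_size = num_devices // mp_size
    return [
        list(range(row * mp_size, (row + 1) * mp_size))
        for row in range(dp_size)
    ]

# 8 GPUs with 4-way model parallelism -> 2 data-parallel replicas
print(device_grid(8, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```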
-
Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM)
https://lmsys.org/blog/2024-07-25-sglang-llama3/
gemma 2 update
https://huggingface.co/google/gemma-2-2b
…
-
### 🚀 The feature, motivation and pitch
Thanks for fixing the soft-capping issue of the Gemma 2 models in the last release! I noticed there's still a [comment](https://github.com/vllm-project/vllm/bl…
-
### Bug description
It seems that they updated the Gemma v1 2B weights. Something to look into:
```
⚡ main ~/litgpt litgpt chat checkpoints/google/gemma-2b
{'access_token': None,
'checkpoint_…
```
rasbt updated 3 months ago