-
@danielhanchen
In the unsloth Gemma intro [blogpost](https://unsloth.ai/blog/gemma), you mention a VRAM increase due to the larger `MLP` size in `Gemma` compared to `Llama` and `Mistral`, and show a [gr…
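The MLP-size difference can be made concrete with a quick back-of-the-envelope calculation. The hidden/intermediate sizes below are taken from the public Hugging Face configs for these models and are assumptions for illustration, not figures from the blogpost:

```python
# Rough per-layer MLP parameter counts for a gated-MLP block
# (gate, up, and down projections => 3 weight matrices).
# Sizes are assumptions pulled from the public HF configs.
def mlp_params(hidden_size: int, intermediate_size: int) -> int:
    return 3 * hidden_size * intermediate_size

models = {
    "gemma-7b":   (3072, 24576),
    "llama-2-7b": (4096, 11008),
    "mistral-7b": (4096, 14336),
}

for name, (h, m) in models.items():
    print(f"{name}: {mlp_params(h, m) / 1e6:.1f}M params per MLP layer")
```

Under these assumed sizes, Gemma's per-layer MLP (~226M params) is roughly 1.7x Llama-2-7B's (~135M), which is consistent with a noticeable VRAM increase for activations and gradients.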
-
### Operating System
macOS
### Version Information
not relevant
### Steps to reproduce
https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/mistral/litellm.ipynb
@san…
-
Please let us know which model architectures you would like added!
**Up-to-date todo list below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be …
-
### What is the issue?
Currently Ollama can [import GGUF files](https://github.com/ollama/ollama/blob/main/docs/import.md). However, larger models are sometimes split into separate files. Ollama shou…
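Split GGUF files follow a predictable naming convention (the `<prefix>-<index>-of-<count>.gguf` scheme produced by llama.cpp's `gguf-split` tool), so detecting and grouping shards is mechanical. A minimal sketch, assuming that naming scheme with 5-digit zero-padded fields:

```python
import re
from collections import defaultdict

# Shards produced by llama.cpp's gguf-split follow the pattern
# <prefix>-<index>-of-<count>.gguf (assumption: 5-digit zero-padded fields).
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")

def group_shards(filenames):
    """Group split-GGUF shard filenames by model prefix, sorted by index."""
    groups = defaultdict(list)
    for name in filenames:
        m = SHARD_RE.match(name)
        if m:
            groups[m.group("prefix")].append((int(m.group("idx")), name))
    # Keep only complete shard sets and return shard names in order.
    complete = {}
    for prefix, shards in groups.items():
        shards.sort()
        total = int(SHARD_RE.match(shards[0][1]).group("total"))
        if len(shards) == total:
            complete[prefix] = [n for _, n in shards]
    return complete

files = [
    "llama-70b-00002-of-00002.gguf",
    "llama-70b-00001-of-00002.gguf",
    "single-model.gguf",
]
print(group_shards(files))
# -> {'llama-70b': ['llama-70b-00001-of-00002.gguf', 'llama-70b-00002-of-00002.gguf']}
```

Incomplete sets are skipped rather than imported, since a missing shard would produce a corrupt model.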
-
I tried to load a T5 model, but it does not seem to be supported.
```
---------------------------------------------------------------------------
NotImplementedError Traceback (most re…
-
I'm trying to load Mistral 7B 32K. I've chunked the 4.3GB model and uploaded it to Hugging Face.
When the download is seemingly complete, there is a warning about being out of memory:
It's a …
-
Parent issue to track new models/endpoints/providers to add to litellm; comment below to request new ones.
- [x] Vertex AI Mistral - https://github.com/BerriAI/litellm/issues/4874
- [x] Vertex AI Codestr…
-
I finetuned a model on a custom dataset. The output should be in JSON format. All the keys are the same for each output, i.e. the structure of the response JSON is the same, while the values need to be e…
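With a fixed key set, it is easy to validate each generation before using it downstream. A minimal sketch; the key names here are hypothetical placeholders for whatever your dataset actually uses:

```python
import json

# Hypothetical key set -- substitute the keys from your fine-tuning data.
EXPECTED_KEYS = {"name", "category", "summary"}

def check_response(raw: str) -> dict:
    """Parse a model response and verify it matches the fixed JSON structure."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"response is not valid JSON: {e}") from e
    if not isinstance(obj, dict):
        raise ValueError("response JSON must be an object")
    if set(obj) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: got {sorted(obj)}, want {sorted(EXPECTED_KEYS)}")
    return obj

good = '{"name": "x", "category": "y", "summary": "z"}'
print(check_response(good)["name"])  # -> x
```

A check like this also gives a concrete retry signal: if validation fails, re-prompt or re-sample instead of passing malformed output downstream.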
-
### Python Version
```shell
python 3.10.9
```
### Pip Freeze
```shell
annotated-types==0.7.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens @ f…
-
In the Hugging Face "eager" Mistral implementation, a sliding window of size 2048 will mask 2049 tokens. This is also true for flash attention. In the current vLLM implementation, a window of 2048 will mas…
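The off-by-one can be illustrated with a toy mask. This is a sketch, not the actual HF or vLLM code: with window `w`, an inclusive mask lets each query attend to itself plus the previous `w` tokens (`w + 1` total), while an exclusive variant permits only `w` tokens in total:

```python
# Toy illustration of the sliding-window off-by-one (window=4 rather
# than 2048, so the result is easy to inspect by hand).
def visible_positions(i: int, window: int, inclusive: bool) -> list[int]:
    """Positions a causal sliding-window mask lets query i attend to.

    inclusive=True  -> self plus the previous `window` tokens (window+1 total),
                       the behavior described for HF eager/flash attention.
    inclusive=False -> only `window` tokens in total.
    """
    lo = i - window if inclusive else i - window + 1
    return [j for j in range(max(lo, 0), i + 1)]

i, w = 10, 4
print(len(visible_positions(i, w, inclusive=True)))   # -> 5  (w + 1 tokens)
print(len(visible_positions(i, w, inclusive=False)))  # -> 4  (w tokens)
```

Scaled up, the inclusive convention with `w = 2048` yields 2049 visible tokens, matching the discrepancy described above.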