-
The Llama3 shared codebase demo currently handles prefill input prep, looped prefill, decode input prep, decode trace capture, and decode trace execution.
The Llama3 demo should be refactored to use …
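The five stages listed above run in a fixed order. As a minimal sketch of that flow (all function names here are hypothetical stand-ins, not the demo's actual API):

```python
# Hypothetical sketch of the demo's stage ordering: prefill input prep,
# looped prefill, decode input prep, decode trace capture, then decode
# trace execution. Function names are illustrative, not the real code.

log = []  # records which stage ran, in order

def prepare_prefill_inputs(prompts):
    log.append("prefill_prep")
    return prompts

def run_prefill(inputs):
    # Prefill is looped, e.g. once per user/prompt.
    for _ in inputs:
        log.append("prefill")

def prepare_decode_inputs(last_token):
    log.append("decode_prep")
    return last_token

def capture_decode_trace(inputs):
    # Trace capture records the decode graph once...
    log.append("trace_capture")
    return "trace"

def execute_decode_trace(trace, steps):
    # ...so every subsequent decode step replays it cheaply.
    for _ in range(steps):
        log.append("decode")

inputs = prepare_prefill_inputs([[1, 2], [3, 4]])
run_prefill(inputs)
decode_inputs = prepare_decode_inputs(0)
trace = capture_decode_trace(decode_inputs)
execute_decode_trace(trace, 3)
print(log)
```

The point of the refactor suggestion is presumably that this sequencing logic is shared across models and should live in one place rather than in each demo.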
-
Bring up Llama 3.2 model family on Wormhole, T3K and TG
-
### What is the issue?
multi gpu
ollama run llama3.2-vision
>>> The image is a book cover. Output should be in this format - : . Do not output anything else /media/root/ssd2t/data/pro/tmp/o
... l/…
-
### Feature request
It would be nice if, when `torch_dtype` is set to `auto` in a `from_pretrained` call, it properly respected _nested_ `torch_dtype`s specified in the model's config. Right now i…
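To illustrate the *desired* behavior (this is not the current transformers implementation, and `resolve_dtypes` is a hypothetical helper): multimodal configs often carry sub-configs such as `vision_config` with their own `torch_dtype`, and `auto` should honor those instead of applying only the top-level value.

```python
# Sketch of the requested "auto" resolution: walk the config dict and
# honor a per-submodel torch_dtype where present, falling back to the
# top-level value. `resolve_dtypes` is an illustrative name, not an API.

def resolve_dtypes(config: dict, default: str = "float32") -> dict:
    """Map '' (top level) and each nested sub-config key to its dtype."""
    top = config.get("torch_dtype") or default
    dtypes = {"": top}
    for key, value in config.items():
        if isinstance(value, dict):  # nested sub-config, e.g. vision_config
            dtypes[key] = value.get("torch_dtype") or top
    return dtypes

config = {
    "torch_dtype": "bfloat16",
    "text_config": {"torch_dtype": "bfloat16"},
    "vision_config": {"torch_dtype": "float16"},  # nested override
}
print(resolve_dtypes(config))
# {'': 'bfloat16', 'text_config': 'bfloat16', 'vision_config': 'float16'}
```

With today's behavior the nested `float16` for the vision tower would be ignored; the request is that the loader pick it up per submodule.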
-
### What is the issue?
![screenshot-ollama](https://github.com/user-attachments/assets/bc208fb6-34b7-4ac3-a19f-b7adfacdf269)
Disclaimer: I have no GPU (Integrated Graphics)
### OS
Linux
### GPU
…
-
Hi,
Is it possible that this project could be updated to support ollama 0.4.0? I want to try the new Llama Vision model, but to run those models you need at least version 0.4.0.
Thanks!
-
Hi,
I'm trying to constrain the generation of my VLMs using this repo; however, I can't figure out how to customize the pipeline for handling inputs (query + image). Whereas it is documented as …
-
I saw you used something like this:
```
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True, # False if not finetuning vision part
    finetune_language_lay…
```
-
Hello,
I am trying to SFT-train the Llama 3.2 11B Vision Instruct model on a dataset that answers a question about an image using a context (which may include more than one image). My code is:
```
def format_dat…
```
-
## Describe the bug
```bash
cargo run --features metal --package mistralrs-server --bin mistralrs-server -- --token-source cache -i plain -m microsoft/Phi-3.5-mini-instruct -a phi3 --dtype bf16
```