VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
MIT License

Do you think this model could run with llama.cpp? The original Phi mini model works with llama.cpp #10

Open Manni1000 opened 1 month ago

BitPhinix commented 1 month ago

OmniGen requires the SDXL VAE, a custom causal mask and positional embeddings, and a custom denoising loop. So, out of the box, no. You'd need to implement that yourself and open a PR, assuming the llama.cpp authors are open to it. That's more a question for that repo, I guess.
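To illustrate why the attention mask is "custom": our reading of the OmniGen paper is that attention is causal across the whole interleaved sequence, but bidirectional within each image's tokens. The sketch below builds such a mask from a list of segments. It is a simplified illustration, not OmniGen's code. The function name `build_omnigen_style_mask` and the `(kind, length)` segment format are hypothetical. Details the paper may add, such as special handling of the noise-latent tokens, are deliberately left out.

```python
from typing import List, Tuple


def build_omnigen_style_mask(segments: List[Tuple[str, int]]) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True if query token i may attend to key token j.

    segments: ordered (kind, length) pairs, where kind is "text" or "image".
    Assumption: attention is causal everywhere, plus full (bidirectional)
    attention among tokens belonging to the same image segment.
    """
    seg_ids: List[int] = []  # which segment each token belongs to
    kinds: List[str] = []    # "text" or "image" for each token
    for idx, (kind, length) in enumerate(segments):
        seg_ids.extend([idx] * length)
        kinds.extend([kind] * length)

    n = len(seg_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i  # ordinary decoder-style causal attention
            same_image = kinds[i] == "image" and seg_ids[i] == seg_ids[j]
            mask[i][j] = causal or same_image
    return mask


if __name__ == "__main__":
    # Two text tokens, a two-token "image", then one more text token.
    m = build_omnigen_style_mask([("text", 2), ("image", 2), ("text", 1)])
    for row in m:
        print("".join("1" if allowed else "." for allowed in row))
```

A plain decoder-only text model uses only the `causal` term. The extra `same_image` term is what lets image tokens see later tokens of the same image, and it is one of the pieces a llama.cpp port would have to support alongside the VAE and the denoising loop.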

eoffermann commented 1 month ago

The answer to this is "no", at least for the llama-server component, which has multimodal support disabled, so the most powerful features wouldn't be available. People are pretty annoyed about that over on that repo. So while I don't think it's a huge priority for the developers to have multimodal support in llama-server, I'd be surprised if it didn't get fixed.

Otherwise, it's all just software. If it's possible to run OmniGen at all (and it is), it would be possible to support it in llama.cpp. The questions are how much work it would be, how good a fit the developers think it is, and whether someone will want to do it. I'd like to think we'll see more support for architectures like OmniGen in it, but I don't have a crystal ball.