The answer to this is "no", at least for the llama-server component, which currently has multimodal support disabled, so the most powerful features wouldn't be available. People are pretty annoyed about it over on that repo, so while I don't think multimodal support in llama-server is a huge priority for the developers, I'd be surprised if it didn't get fixed.
Otherwise, it's all just software. If it's possible to run OmniGen at all (and it is), it would be possible to support it in llama.cpp. The questions are how much work it would be, how good a fit the developers think it is, and whether someone will want to do it. I'd like to think we'll see more support for architectures like OmniGen there, but I don't have a crystal ball.
OmniGen requires the SDXL VAE plus a custom causal mask and positional embeddings (along with a custom denoising loop). So, out of the box, no; you'd need to open a PR and implement that yourself, if the llama.cpp authors are open to that. More a question for that repo, I guess.
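For anyone curious what "custom causal mask" and "custom denoising loop" mean in practice, here's a minimal PyTorch sketch. To be clear, this is not OmniGen's actual code: the function names (`build_hybrid_mask`, `predict_noise`), the span boundaries, and the toy Euler loop are all hypothetical illustrations. The point is just that text tokens attend causally while image tokens attend bidirectionally within their own span, and that generating an image needs an iterative sampling loop rather than plain autoregressive decoding.

```python
# Hypothetical sketch, not OmniGen's implementation.
import torch

def build_hybrid_mask(seq_len: int, image_spans: list[tuple[int, int]]) -> torch.Tensor:
    """Boolean attention mask: True means the query may attend to the key."""
    # Standard lower-triangular causal mask for the text tokens.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Open up full bidirectional attention inside each image-token span.
    for start, end in image_spans:
        mask[start:end, start:end] = True
    return mask

def denoise(shape, predict_noise, steps=4):
    """Schematic Euler-style denoising loop (not a real scheduler)."""
    sigmas = torch.linspace(1.0, 0.0, steps + 1)
    x = torch.randn(shape) * sigmas[0]
    for i in range(steps):
        eps = predict_noise(x, sigmas[i])          # model's noise estimate
        x = x + (sigmas[i + 1] - sigmas[i]) * eps  # Euler step to the next sigma
    return x  # these latents would then be decoded by the SDXL VAE

if __name__ == "__main__":
    # 8 tokens total; tokens 3..6 are (hypothetically) image latents.
    print(build_hybrid_mask(8, [(3, 6)]).int())
    latents = denoise((1, 4, 8, 8), lambda x, s: torch.zeros_like(x))
    print(latents.shape)
```

Supporting this in llama.cpp would mean reproducing the mask logic and the sampling loop on top of ggml, plus loading and running the SDXL VAE for decoding, which is why it isn't a small patch.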