VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
MIT License
2.83k stars 219 forks

Can I replace phi3 llm? #46

Open win10ogod opened 3 weeks ago

win10ogod commented 3 weeks ago

Can I replace the Phi-3 LLM? For example, could I use Qwen2 or Llama 3.2 instead?

able2608 commented 3 weeks ago

I believe you'll need to retrain the model from scratch. The paper seems to imply that they initialized the backbone's parameters from Phi-3 and then trained it further on their own data, with the VAE and text encoding frozen. Furthermore, the attention mechanism of a text-only LLM might need to be modified to work with image tokens. BTW, this is the figure illustrating their approach; you might want to check it out: [image]
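To make the attention point concrete, here is a minimal sketch, not OmniGen's actual code. A text-only LLM uses a strictly causal mask. As I read the paper, OmniGen keeps causal attention across the sequence but lets tokens inside the same image attend to each other in both directions. The function name `build_mask` and the `(kind, length)` segment layout are hypothetical, chosen only for illustration.

```python
def build_mask(segments):
    """Return an n x n boolean mask; mask[i][j] is True if token i may attend to token j.

    `segments` is a hypothetical layout: a list of (kind, length) pairs,
    e.g. [("text", 2), ("image", 3)], where kind is "text" or "image".
    """
    # Record, for every token position, which segment it belongs to and its kind.
    seg_of = []
    kinds = []
    for idx, (kind, length) in enumerate(segments):
        seg_of.extend([idx] * length)
        kinds.extend([kind] * length)

    n = len(seg_of)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Standard causal rule: attend to yourself and to earlier tokens.
            causal = j <= i
            # Extra rule: tokens of the same image see each other in both directions.
            same_image = kinds[i] == "image" and seg_of[i] == seg_of[j]
            mask[i][j] = causal or same_image
    return mask


# Two text tokens, one three-token image, then one more text token.
mask = build_mask([("text", 2), ("image", 3), ("text", 1)])
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

A stock Qwen2 or Llama checkpoint is trained with only the causal rule, which is one reason a swapped-in backbone would need further training rather than working as a drop-in replacement.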

staoxiao commented 3 weeks ago

As @able2608 mentioned, you can replace this LLM, but it requires retraining.

felixfuu commented 3 weeks ago

@staoxiao How long does each training stage take (on 104 A800 GPUs)?