LargeWorldModel / LWM


Great work! Any plan to train a smaller version, e.g. around 3B? #34

Open · StarCycle opened this issue 6 months ago

StarCycle commented 6 months ago

Hello,

This is really great work that contributes a lot to the community!

Do you have any plans to train a smaller version of the large world model (e.g., 1–3B), perhaps based on a smaller model like Phi-2? It would be much easier to run and would need far less compute.

StarCycle commented 6 months ago

If other researchers have such a plan, please reply and maybe we can work together!

wilson1yan commented 5 months ago

Thanks for your interest. We don't have plans to train a smaller model at the moment.

befman123 commented 3 months ago

@StarCycle This is an amazing project, but I'm just going to try to load it in 8-bit (I don't even know if it will work). I have a 4070 Ti; it never loads fp16, let alone fp32, for 7B models. If there were a way for the community to pitch in and help you do the training on TinyLlama or Phi-3, that would be awesome. I have no idea how much it would cost; I don't think it's cheap or affordable. If it's either of those, I'm jumping in.
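In case anyone wants to follow along, this is roughly the 8-bit loading I'm going to try with transformers + bitsandbytes. The checkpoint id is just my guess at which LWM variant to grab, so treat it as a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "LargeWorldModel/LWM-Text-Chat-1M"  # assumed repo id, swap in your variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # spill layers to CPU RAM if the 4070 Ti runs out of VRAM
)
```

8-bit roughly halves the fp16 footprint of a 7B model (about 14 GB down to about 7 GB of weights), which is why I'm hoping it squeezes onto a 12 GB card.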

StarCycle commented 3 months ago


Hi @befman123, I tried generating video with LWM. It needs quite a lot of GPU memory (I had to use an A100 80GB or an H100). After 3 minutes I got a 2-second video, and the quality was bad. Btw, I'm not familiar with JAX, though I hear it's quite efficient even on Nvidia GPUs.
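For a rough sense of where the memory goes, here's a back-of-envelope sketch. All the numbers are illustrative assumptions (typical 7B transformer shapes), not LWM's actual config:

```python
# Why video generation eats GPU memory: weights + a KV cache that grows
# with every generated token. Illustrative numbers only.
params = 7e9                # 7B-parameter model
bytes_fp16 = 2
weights_gb = params * bytes_fp16 / 1e9           # ~14 GB just for the weights

tokens_per_frame = 256      # assumed VQGAN codes per frame
fps, seconds = 8, 2
video_tokens = tokens_per_frame * fps * seconds  # 4096 tokens for a 2s clip

layers, hidden = 32, 4096   # typical 7B transformer shape
# KV cache per token: 2 tensors (K and V) * layers * hidden dim * fp16 bytes
kv_gb = 2 * layers * hidden * bytes_fp16 * video_tokens / 1e9

print(f"weights ≈ {weights_gb:.0f} GB, KV cache ≈ {kv_gb:.1f} GB")
```

Weights dominate for a short clip, but the cache and activations keep growing with sequence length, which is how you end up needing an 80 GB card once prompts and clips get longer.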

I think maybe we can wait for Meta to open-source Chameleon, which is similar to LWM (text and image generation with an LLM + VQGAN encoder/decoder, but without video generation). ByteDance also open-sourced VAR, whose smallest version has only 310M parameters. The best news: both are written in PyTorch.
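To make the comparison concrete, here's a toy sketch of the shared idea behind LWM and Chameleon: image codes and text tokens live in one vocabulary, so generating an image is just ordinary next-token prediction followed by a VQGAN decode. The model and decoder below are random stand-ins, not any real API:

```python
import torch

torch.manual_seed(0)
TEXT_VOCAB, IMAGE_VOCAB = 32000, 8192
VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # image codes occupy the ids after the text range

def sample_next(seq):
    """Stand-in for the autoregressive LLM; here it just picks a random image code."""
    return torch.randint(TEXT_VOCAB, VOCAB, (1,))

def vqgan_decode(codes, grid=16):
    """Stand-in for the VQGAN decoder: a 16x16 grid of codes -> 'pixels'."""
    return (codes - TEXT_VOCAB).reshape(grid, grid).float() / IMAGE_VOCAB

prompt = torch.randint(0, TEXT_VOCAB, (12,))  # tokenized text prompt
seq = prompt
for _ in range(16 * 16):                      # one image's worth of codes
    seq = torch.cat([seq, sample_next(seq)])

image = vqgan_decode(seq[-256:])
print(image.shape)  # torch.Size([16, 16])
```

Video generation (what LWM adds) is the same loop, just with many frames' worth of codes, which is where the sequence length and memory blow up.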

Perhaps you can start by finetuning these PyTorch models? If you like, we can set up a Discord server first and see how many people are interested!
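If you go the finetuning route, LoRA via peft keeps the memory footprint within reach of a consumer GPU. A minimal sketch on Phi-2 (since it came up above); the target module names are my assumption for the transformers Phi implementation, and any other small causal LM would work the same way:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed attention module names for Phi
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of the 2.7B weights train
```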