Alpha-VLLM / Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
https://arxiv.org/abs/2408.02657

Trainable Parameters and their precisions #17

Closed: SxJyJay closed this 3 months ago

SxJyJay commented 3 months ago

Thanks for your great work!

During training, I found that all parameters are trainable and kept in fp32 precision, since the ChameleonXLLMXForConditionalGeneration class doesn't define a get_trainable_params method. Do all parameters need to be trained during the 3-stage FP-SFT, and is fp32 precision necessary for all of them?

The relevant code can be found at https://github.com/Alpha-VLLM/Lumina-mGPT/blob/104abe453ec1acca5863698629c4db2111b0b3fc/xllmx/solvers/finetune/finetune.py#L286-L294
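For reference, one quick way to verify this is a small helper that groups parameters by `requires_grad` and dtype; the sketch below is not from the repo and assumes `model` is an already-constructed `ChameleonXLLMXForConditionalGeneration`:

```python
import torch

def summarize_params(model: torch.nn.Module) -> None:
    """Group parameters by (requires_grad, dtype) and report their total sizes."""
    stats = {}
    for _, p in model.named_parameters():
        key = (p.requires_grad, str(p.dtype))
        stats[key] = stats.get(key, 0) + p.numel()
    for (trainable, dtype), n in sorted(stats.items(), key=str):
        print(f"trainable={trainable}, dtype={dtype}, params={n / 1e6:.1f}M")
```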

ChrisLiu6 commented 3 months ago
1. For a parameter, as long as it is trainable (`requires_grad=True`), an fp32 master copy is necessary because parameter updates have to be conducted in full precision. If the parameter is frozen, we can simply keep its 16-bit version.
2. In our experiments we keep all parameters trainable throughout the whole SFT process, but you may try different settings by adding a `get_trainable_params` method (see the sketch below).
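For illustration, here is a minimal sketch of such a method. The selection rule (tuning only the LM head and the last two transformer layers) is just an example, not our training recipe, and the return format should be matched to whatever the linked `finetune.py` expects:

```python
from typing import Dict

import torch

def get_trainable_params(self) -> Dict[str, torch.nn.Parameter]:
    """Hypothetical hook to be added to ChameleonXLLMXForConditionalGeneration.

    Returns only the parameters that should stay trainable; everything else
    would be frozen by the finetune solver and can remain in 16-bit precision.
    """
    trainable = {}
    for name, param in self.named_parameters():
        # Example policy (an assumption, not the released training setup):
        # tune the LM head plus the final two transformer layers.
        if "lm_head" in name or ".layers.30." in name or ".layers.31." in name:
            trainable[name] = param
    return trainable
```

Parameters not returned here would keep `requires_grad=False`, so no fp32 master copy or optimizer state would be needed for them.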