ByungKwanLee / MoAI

[ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance of numerous zero-shot vision language tasks.

question about mixed precision in training #19

Open cassiaaaaaa opened 5 months ago

cassiaaaaaa commented 5 months ago

Dear author, I noticed that in your "accel/ddp_accel.yaml", `mixed_precision` is set to "bf16". Did you use bf16 in training?

ByungKwanLee commented 5 months ago

Yes! But comparing training with and without it, there are no critical differences!
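
For reference, here is a minimal sketch of how bf16 mixed precision is typically enabled with Hugging Face Accelerate; the `Accelerator(mixed_precision="bf16")` call mirrors a `mixed_precision: bf16` entry in the yaml. This is an illustrative setup, not the repo's actual training script:

    import torch
    from accelerate import Accelerator

    # Assumed equivalent of mixed_precision: bf16 in accel/ddp_accel.yaml
    accelerator = Accelerator(mixed_precision="bf16")

    model = torch.nn.Linear(16, 16)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    model, optimizer = accelerator.prepare(model, optimizer)

    x = torch.randn(4, 16, device=accelerator.device)
    with accelerator.autocast():       # forward pass runs under bf16 autocast
        loss = model(x).pow(2).mean()
    accelerator.backward(loss)         # handles precision-related details
    optimizer.step()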

cassiaaaaaa commented 5 months ago

> Yes! But comparing training with and without it, there are no critical differences!

Thanks a lot!

cassiaaaaaa commented 5 months ago

Hello, Lee. Sorry to ask again. Your code uses three different data types in total (bf16, fp16, fp32), is that right?

The CV models use fp32, the MoAI weights are bf16, and some image features are cast to fp16, as I noticed in these snippets:

    # in moai/arch_moai.py
    verb_embeds = self.get_input_embeddings()(processor(batched_verb_prompt, padding='longest', return_tensors="pt").input_ids)
    with torch.inference_mode(): self.vit.vision_tower.eval(); map_embeds = self.vision_proj(self.vit(self.image_processor(torch.stack(batched_panoptic_map).to(torch.float16)).to(device)))
    aux_embeds = torch.cat([verb_embeds, map_embeds], dim=1)

    # in moai/arch/modeling_internlm2.py
    soft_img_weight = F.softmax(self.moai_GA_img(h_im), dim=1, dtype=torch.bfloat16)
    soft_lang_weight = F.softmax(self.moai_GA_lang(h_lang), dim=1, dtype=torch.bfloat16)
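
One quick way to confirm which dtype each component actually holds is to count parameter dtypes per sub-module. A generic sketch with dummy modules standing in for the CV backbone and the MoAI weights (not the repo's code):

    import torch
    from collections import Counter

    def dtype_summary(module: torch.nn.Module) -> Counter:
        # Count how many parameters of each dtype the module holds
        return Counter(p.dtype for p in module.parameters())

    # Dummy stand-ins: CV backbone kept in fp32, MoAI weights in bf16
    cv_model = torch.nn.Linear(8, 8).float()
    moai_core = torch.nn.Linear(8, 8).to(torch.bfloat16)

    print(dtype_summary(cv_model))   # Counter({torch.float32: 2})
    print(dtype_summary(moai_core))  # Counter({torch.bfloat16: 2})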

The bf16 seems to be set by the accelerate config file. But when I do this, some parameters strangely turn back to bf16 even after I change them to fp32, causing errors. How did you write the code to autocast only specific parts of the model? I searched the Internet for this problem, but my code still fails...
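
A minimal sketch of the usual PyTorch pattern for keeping one sub-module in fp32 while the rest runs under bf16 autocast (an assumed illustration, not the actual MoAI code): locally disable autocast and cast the inputs of the fp32-only part back to float32 so they are not silently recast.

    import torch

    backbone = torch.nn.Linear(16, 16).cuda()          # runs under bf16 autocast
    fp32_head = torch.nn.Linear(16, 4).float().cuda()  # must stay in fp32

    x = torch.randn(2, 16, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        h = backbone(x)                                 # bf16 compute
        # Locally turn autocast off and cast inputs back to fp32,
        # so this branch is not silently recast to bf16
        with torch.autocast(device_type="cuda", enabled=False):
            out = fp32_head(h.float())

    print(out.dtype)  # torch.float32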