Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

How to configure accelerate so that the main process's GPU memory does not exceed its limit (and the other GPUs are fully used) #273

Closed: xmc-andy closed this issue 11 months ago

xmc-andy commented 11 months ago

Sorry to bother you. I'm trying to unfreeze the ViT in `OtterForConditionalGeneration` (in `otter.modeling_otter`) by setting `for param in self.vision_encoder.parameters(): param.requires_grad = True`, and by removing the `with torch.no_grad():` block around the computation of `vision_x`. With the ViT frozen, I can train on three V100s (32 GB each), but after unfreezing it I cannot train even on four V100s: the GPU used by the main process runs out of memory while the other GPUs only use about 20 GB each. What do I need to do to train the model? Do I need to change the accelerate configuration? (I originally used FSDP, and I have 48 CPUs.)
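For reference, a minimal sketch of the modification described above, assuming the vision encoder is exposed as `self.vision_encoder` on the model and that `vision_x` was previously computed inside a `torch.no_grad()` block; names follow the comment here and the exact code in `otter/modeling_otter.py` may differ:

```python
import torch


def unfreeze_vision_encoder(model):
    # Make the ViT trainable so its weights receive gradients.
    for param in model.vision_encoder.parameters():
        param.requires_grad = True


# Frozen ViT (original): features are computed without gradient tracking,
# so no activations from the vision tower are kept for the backward pass:
#
#     with torch.no_grad():
#         vision_x = model.vision_encoder(pixel_values)
#
# Unfrozen ViT (modified): the no_grad block is removed, so activations and
# gradients for the whole vision tower are stored, raising per-GPU memory use:
#
#     vision_x = model.vision_encoder(pixel_values)
```

One lever that may be relevant (not verified against this repo) is the FSDP auto-wrap policy in the accelerate configuration, since how finely the model is wrapped into FSDP units affects how parameters, gradients, and optimizer state are sharded and thus the peak memory seen on each rank.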

xmc-andy commented 11 months ago

Can you help me?