Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

How to configure accelerate so that the main process's GPU memory does not exceed its limit (and the other GPUs are fully used) #273

Closed: xmc-andy closed this issue 11 months ago

xmc-andy commented 11 months ago

Sorry to bother you. I'm trying to unfreeze the ViT in `OtterForConditionalGeneration` (in `otter.modeling_otter`) by setting `for param in self.vision_encoder.parameters(): param.requires_grad = True`, and by removing the `with torch.no_grad():` block around the computation of `vision_x`. With the ViT frozen, I can train on three V100s (32 GB each), but after unfreezing it I cannot train even on four V100s: the GPU used by the main process runs out of memory while the other GPUs only use about 20 GB each. What do I need to do to train the model? Do I need to change the accelerate configuration? (I originally used FSDP, and I have 48 CPUs.)
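For reference, a minimal sketch of the modification described above, assuming the vision encoder is exposed as `self.vision_encoder` on the model and that `vision_x` was previously computed inside a `torch.no_grad()` block; names follow the comment here and the exact code in `otter/modeling_otter.py` may differ:

```python
import torch


def unfreeze_vision_encoder(model):
    # Make the ViT trainable so its weights receive gradients.
    for param in model.vision_encoder.parameters():
        param.requires_grad = True


# Frozen ViT (original): features are computed without gradient tracking,
# so no activations from the vision tower are kept for the backward pass:
#
#     with torch.no_grad():
#         vision_x = model.vision_encoder(pixel_values)
#
# Unfrozen ViT (modified): the no_grad block is removed, so activations and
# gradients for the whole vision tower are stored, raising per-GPU memory use:
#
#     vision_x = model.vision_encoder(pixel_values)
```

One lever that may be relevant (not verified against this repo) is the FSDP auto-wrap policy in the accelerate configuration, since how finely the model is wrapped into FSDP units affects how parameters, gradients, and optimizer state are sharded and thus the peak memory seen on each rank.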

xmc-andy commented 11 months ago

Can you help me?