🦦 Otter is a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning abilities.
I set 'distributed_type' in 'accelerate_config_fsdp' to DEEPSPEED. However, during training, GPU memory consumption keeps climbing until it eventually hits an Out-Of-Memory (OOM) error. What could be causing this?
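For reference, here is a minimal sketch of what such an `accelerate` config might look like when switched to DeepSpeed. The field names are standard `accelerate` config keys, but the specific values (ZeRO stage, process counts, precision) are illustrative assumptions, not the reporter's actual settings:

```yaml
# Hypothetical accelerate config (values are assumptions for illustration)
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED        # switched from FSDP, as described above
deepspeed_config:
  zero_stage: 2                    # ZeRO-2: shards optimizer state and gradients
  gradient_accumulation_steps: 1
  offload_optimizer_device: none   # set to 'cpu' to trade speed for GPU memory
  offload_param_device: none
mixed_precision: bf16
num_machines: 1
num_processes: 2
```

One thing worth checking in this situation: a config file named for FSDP but pointed at DEEPSPEED can silently drop FSDP-specific sharding options that DeepSpeed does not read, so memory behavior may differ from what the original file was tuned for.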