Hi! Thanks for your contribution, great first issue!
Hi @vguizilini, could you be more specific about how much more memory is required?
@jeremyjordan can we get that memory profiler? @vguizilini mind trying again from master?
Memory usage for my original implementation (Horovod for distributed training)
Memory usage for my Pytorch-Lightning implementation (ddp)
I'm loading the same configuration and the same networks in both. I'm still learning to use Pytorch-Lightning; what should I profile next?
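For reference, this is the kind of per-GPU check I can run in both implementations to compare memory, just a rough sketch using plain `torch.cuda` calls rather than anything from the Lightning API:

```python
import torch

def log_gpu_memory(tag=""):
    """Print current and peak allocated memory for every visible GPU on this node."""
    for i in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(i) / 1024 ** 2
        peak = torch.cuda.max_memory_allocated(i) / 1024 ** 2
        print(f"[{tag}] GPU {i}: allocated={allocated:.0f} MiB, peak={peak:.0f} MiB")

# For example, call it once after building the model and once after a few batches:
# log_gpu_memory("after model init")
# log_gpu_memory("after first training step")
```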
@neggert or @williamFalcon any ideas why GPU memory isn't consistent across the nodes?
Following up on this issue: is there anything else I should provide to facilitate debugging?
Over the last week I have been porting my monocular depth estimation code to Pytorch-Lightning, and everything is working perfectly. However, my models seem to require more GPU memory than before, to the point where I need to significantly decrease the batch size at training time. These are the Trainer parameters I am using, and the relevant versions:
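Roughly along these lines (the values below are placeholders rather than my exact settings, and some argument names may vary between Lightning versions):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    distributed_backend="ddp",  # same backend as the comparison above
    gpus=8,                     # GPUs per node (placeholder value)
    num_nodes=2,                # placeholder value
    max_epochs=50,              # placeholder value
)
# trainer.fit(model)  # model is a LightningModule
```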
Probably because of that, I am having issues replicating my results; could you please advise on possible solutions? I will open-source the code as soon as I manage to replicate my current results.