Closed: canerozer closed this issue 4 years ago
Hello! You are totally right!
The AMP workflow that uses Apex works, but the other one, which uses native PyTorch, did not work with PyTorch 1.5, so we commented out that line until PyTorch 1.6 came out.
Now 1.6 is out, but we forgot to uncomment the line and test it...
Would you like to give it a try and make a pull request?
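For reference, the native PyTorch 1.6 AMP pattern that the TrainingAMP workflow is meant to wrap looks roughly like this. This is a minimal sketch, not Eisen's actual implementation; model, criterion, optimizer, and loader are placeholders for whatever the workflow supplies:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    optimizer.zero_grad()
    # Run the forward pass and loss computation in mixed precision.
    with torch.cuda.amp.autocast():
        outputs = model(inputs.cuda())
        loss = criterion(outputs, targets.cuda())
    # Scale the loss to avoid float16 gradient underflow, then
    # unscale and step through the GradScaler.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```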
Fausto Milletarì
On 18 Aug 2020, at 00:49, Caner Ozer wrote:
Hey there,
Thank you for open-sourcing this project. To reduce the memory cost of my batch size, I'm trying to use the eisen.utils.workflows.TrainingAMP class as the workflow instead of eisen.utils.workflows.Training. However, I measured no significant change in training or inference time, and literally no change in memory footprint.
I analyzed the training.py file, where all of these classes are implemented, and noticed that the lines corresponding to the torch.cuda.amp.autocast context manager had been commented out. After uncommenting lines 9 and 251, I instead got a RuntimeError advising me to use the fused combination of BCE and Sigmoid (BCEWithLogitsLoss) rather than placing a Sigmoid at the end of the model and using BCE as the loss function. How should we approach this problem when we want to use more than one loss term?
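(For context: the RuntimeError comes from autocast refusing to run nn.BCELoss on float16 sigmoid outputs, which is numerically unsafe. The autocast-safe pattern is to keep the model output as raw logits, use nn.BCEWithLogitsLoss for the BCE term, and apply an explicit sigmoid only for loss terms that need probabilities. A minimal sketch; soft_dice is a hypothetical second loss term standing in for whatever is combined with BCE:)

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # fused sigmoid + BCE, safe under autocast

def soft_dice(probs, targets, eps=1e-6):
    # Hypothetical second loss term, shown only to illustrate combining
    # BCE-with-logits with a probability-based loss.
    inter = (probs * targets).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + targets.sum() + eps)

def combined_loss(logits, targets):
    loss = bce(logits, targets)    # operates directly on raw logits
    probs = torch.sigmoid(logits)  # explicit sigmoid for other terms
    return loss + soft_dice(probs, targets)
```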
GPU: GV100 (32 GB)
OS: Ubuntu 20.04
Model: U-Net3D
Batch size: 4 (~31 GB memory footprint)
PyTorch: 1.6.0 + CUDA Toolkit 10.2 (conda)
Eisen: 0.1.10
Python: 3.7