NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

How to improve training performance with Apex package #1838

Open tjk9501 opened 2 months ago

tjk9501 commented 2 months ago

Hello!

I am using the Apex package to speed up training of my CNN model. I compared training with plain Adam against training with the Apex O1 optimization level, using the following code:

import torch.optim as optim
from apex import amp
optimizer = optim.Adam(model.parameters(), lr=1e-2)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

Training is visibly 3-4 times faster than with plain Adam. However, when I evaluate the trained model, its performance on the test set is worse than that of the model trained with Adam alone. Is there a way to fix this? I want to speed up training and still get good performance on the test set.
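One possible factor, stated here only as an assumption and not something verified for this model, is the limited range of FP16: very small gradient values round to zero unless the loss is scaled up before the backward pass. Apex applies such scaling when the backward call runs inside `amp.scale_loss(loss, optimizer)`. The standard-library sketch below only illustrates the numeric effect; the gradient magnitude and the scale factor are hypothetical values chosen for the example, not taken from my training run.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical values, chosen only to illustrate the effect.
tiny_grad = 1e-8         # below the smallest FP16 subnormal (about 6e-8)
loss_scale = 2.0 ** 16   # a static scale; Apex O1 picks its scale dynamically

# Stored directly in FP16, the gradient underflows to zero.
unscaled = to_fp16(tiny_grad)

# Scaled before the FP16 round trip and unscaled afterwards in full
# precision, the value survives with only a small rounding error.
scaled = to_fp16(tiny_grad * loss_scale) / loss_scale

print("without loss scaling:", unscaled)
print("with loss scaling:   ", scaled)
```

If the backward pass in a training loop is not wrapped in `amp.scale_loss`, this kind of underflow could in principle contribute to a gap in test accuracy, though other causes are equally possible.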

Thank you!