""" define loss scaler for automatic mixed precision """
# Creates a GradScaler once at the beginning of training.
scaler = torch.cuda.amp.GradScaler() ---# here
for batch_idx, (inputs, labels) in enumerate(data_loader):
optimizer.zero_grad()
with torch.cuda.amp.autocast():
# Casts operations to mixed precision
outputs = model(inputs)
loss = criterion(outputs, labels)
# Scales the loss, and calls backward() ---# here
# to create scaled gradients
scaler.scale(loss).backward()
# Unscales gradients and calls ---# here
# or skips optimizer.step()
scaler.step(self.optimizer)
# Updates the scale for next iteration ---# here
scaler.update()
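Since the comparison below pits an AMP run against a non-AMP baseline, it may help that both autocast and GradScaler accept an enabled flag. The sketch below is only an illustration (use_amp is an assumed config variable, not part of the original runs): the same loop runs in full precision when use_amp is False, so both runs share one code path.

use_amp = True  # assumed config flag; set False for the non-AMP baseline

scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for batch_idx, (inputs, labels) in enumerate(data_loader):
    optimizer.zero_grad()

    with torch.cuda.amp.autocast(enabled=use_amp):
        outputs = model(inputs)
        loss = criterion(outputs, labels)

    # With enabled=False these calls become pass-throughs, so the non-AMP
    # baseline uses the exact same training loop.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()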
Very simple (check the lines marked "here").
backtime92-new-base-syn vs backtime92-new-base-syn-amp
Comparing the effect of AMP.
With AMP, GPU memory usage on the 2080 Ti is, as expected, lower, and training time dropped by about an hour.
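A rough way to make this kind of memory/time comparison reproducible is to log peak GPU memory and wall-clock time per run. The sketch below is only an illustration; run_one_epoch is a hypothetical placeholder for the actual training loop, not part of the original experiments.

import time
import torch

def measure_run(run_one_epoch, device="cuda"):
    # `run_one_epoch` is a hypothetical callable standing in for the real
    # training loop (with or without AMP); it is not part of the original code.
    torch.cuda.reset_peak_memory_stats(device)  # clear previous peak stats
    start = time.time()

    run_one_epoch()

    elapsed = time.time() - start
    peak_mib = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    print(f"epoch time: {elapsed:.1f} s, peak GPU memory: {peak_mib:.0f} MiB")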
Goal: apply mixed precision
Example code with AMP applied
Reference: GPU memory