Automatic Mixed Precision (AMP) Test

0. Useful background

🔔 What is stored in GPU memory during training

- model weights
- forward activations saved for gradient computation
- gradients
- optimizer state

🔔 What happens with FP16 training (mixed precision):

- forward activations saved for gradient computation are in half precision
- gradients are computed in half precision but converted to full precision for the update
- optimizer states are in full precision, as all the updates are done in full precision
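The last point (updates are done in full precision) can be illustrated without a GPU: when a small update is added to a weight that is stored in FP16, it can be rounded away entirely. The sketch below is a stdlib-only emulation, assuming Python's `struct` half-precision (`"e"`) and single-precision (`"f"`) formats as stand-ins for FP16 and FP32 storage. The weight and update values and the helper names (`to_fp16`, `to_fp32`, `apply_updates`) are hypothetical, and it does not reproduce what any particular GPU kernel or AMP library does internally.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def apply_updates(weight: float, update: float, steps: int, cast) -> float:
    """Add `update` to `weight` `steps` times, rounding to the given format after every step."""
    w, u = cast(weight), cast(update)
    for _ in range(steps):
        # The stored weight only keeps what its format can represent.
        w = cast(w + u)
    return w


# Hypothetical values: a weight of 1.0 and a per-step update (learning rate x gradient) of 1e-4.
w16 = apply_updates(1.0, 1e-4, steps=100, cast=to_fp16)
w32 = apply_updates(1.0, 1e-4, steps=100, cast=to_fp32)

print("FP16 weight after 100 updates:", w16)
print("FP32 weight after 100 updates:", w32)
```

With these numbers the FP16 weight stays at 1.0, because 1e-4 is smaller than half the FP16 spacing around 1.0 (2^-10, about 0.00098), while the FP32 weight ends up close to 1.01.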

🔔 My understanding

- Weights and optimizer state are stored in full precision.
- Forward activations and gradients are stored in half precision.
- Gradients are converted to full precision only briefly, when the parameters are updated.
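The breakdown above can be turned into a rough per-component estimate. This is a sketch under stated assumptions, not a measurement: an Adam-style optimizer with two FP32 states per parameter, a hypothetical count of saved activation elements, and no accounting for the transient FP32 gradient copy, workspace buffers, or CUDA context. The function name `estimate_memory_bytes` is hypothetical.

```python
# Bytes per element for each floating-point format.
BYTES = {"fp32": 4, "fp16": 2}


def estimate_memory_bytes(n_params: int, n_activations: int, mixed: bool,
                          optimizer_states_per_param: int = 2) -> dict:
    """Rough per-component memory estimate following the notes above.

    Assumptions (hypothetical, not measured):
    - optimizer_states_per_param=2 models Adam-style optimizers
      (first and second moment per parameter), kept in FP32.
    - n_activations is the number of saved forward-activation elements
      for one training step; it is not derived from any real model.
    - The brief FP32 gradient copy, temporary buffers and the CUDA
      context are ignored.
    """
    half_or_full = "fp16" if mixed else "fp32"
    return {
        "weights": n_params * BYTES["fp32"],           # always full precision
        "activations": n_activations * BYTES[half_or_full],
        "gradients": n_params * BYTES[half_or_full],
        "optimizer_state": n_params * optimizer_states_per_param * BYTES["fp32"],
    }


# Hypothetical sizes: 1M parameters, 10M saved activation elements.
for label, mixed in (("fp32", False), ("mixed", True)):
    est = estimate_memory_bytes(1_000_000, 10_000_000, mixed=mixed)
    print(label, est, "total:", sum(est.values()))
```

Under this model only the activation and gradient terms shrink, which is why the savings grow with the amount of activation memory (and therefore with batch size).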

1. Purpose of the experiment

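With AMP, some of the values that must be kept in memory can selectively be stored as 16-bit rather than 32-bit floating point. The memory this frees allows a larger batch size, so we can expect faster training and, possibly, better model performance.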
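The "smaller values, therefore larger batch" step can be made concrete with a simplified memory model. This sketch assumes, hypothetically, that total memory is a fixed part (weights, gradients, optimizer state) plus a per-sample activation cost that scales with batch size; the budget and sizes below are made-up numbers, not measurements on roberta-small, and `max_batch_size` is a hypothetical helper.

```python
GiB = 2**30


def max_batch_size(budget_bytes: int, fixed_bytes: int,
                   act_elems_per_sample: int, act_bytes_per_elem: int) -> int:
    """Largest batch size that fits under a simplified memory model.

    Assumed model (hypothetical): total memory = fixed_bytes + batch_size * per-sample
    activation bytes, where fixed_bytes covers weights, gradients and optimizer state.
    """
    per_sample = act_elems_per_sample * act_bytes_per_elem
    return max(0, (budget_bytes - fixed_bytes) // per_sample)


# Hypothetical numbers, not measured on roberta-small.
budget = 16 * GiB
fixed = 2 * GiB
act_elems = 50_000_000

bs_fp32 = max_batch_size(budget, fixed, act_elems, act_bytes_per_elem=4)
bs_fp16 = max_batch_size(budget, fixed, act_elems, act_bytes_per_elem=2)
print(bs_fp32, bs_fp16)
```

Under these assumptions the batch that fits roughly doubles (75 vs 150). In a real run the fixed part also changes with AMP, so treat this only as an illustration of why the batch can grow.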

2. Experiment setup

Model: roberta-small

Varied parameters:

- fp16 on/off
- fp16_opt_level: O0 to O3

Example command (the O3 run):

python train.py --output_dir ./models/train_dataset/roberta-small_fp_test/O3 --fp16 True --fp16_opt_level O3 --do_train --do_eval  --per_device_train_batch_size 32 --per_device_eval_batch_size 32 --evaluation_strategy epoch --overwrite_output_dir True
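The runs described above (fp16 off, plus fp16 with each opt level) can be generated from the single command shown. A minimal sketch, assuming the same `train.py` script and flags as the O3 command; the output subdirectory name `fp32` for the run without fp16 and the function name `build_commands` are hypothetical. It only prints the command lines; launching them (for example with `subprocess.run`) is left commented out. When comparing opt levels, it may be worth checking which backend each run actually used: the Hugging Face `TrainingArguments` documentation describes `fp16_opt_level` as an Apex AMP optimization level.

```python
import shlex

# Flags copied from the command above; only --output_dir and the fp16 flags vary.
BASE = "./models/train_dataset/roberta-small_fp_test"
COMMON = [
    "--do_train", "--do_eval",
    "--per_device_train_batch_size", "32",
    "--per_device_eval_batch_size", "32",
    "--evaluation_strategy", "epoch",
    "--overwrite_output_dir", "True",
]


def build_commands() -> list:
    """Return one argv list per run: fp16 off first, then fp16 with O0..O3."""
    runs = [("fp32", [])]  # hypothetical subdirectory name for the no-fp16 run
    for level in ["O0", "O1", "O2", "O3"]:
        runs.append((level, ["--fp16", "True", "--fp16_opt_level", level]))
    return [
        ["python", "train.py", "--output_dir", f"{BASE}/{name}", *fp16_args, *COMMON]
        for name, fp16_args in runs
    ]


if __name__ == "__main__":
    for argv in build_commands():
        print(shlex.join(argv))
        # To actually launch a run: subprocess.run(argv, check=True)
```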

🔔 fp16_backend can be either apex or amp. APEX (A PyTorch Extension) appears not to be supported with recent PyTorch versions, and using it seems to require a separate installation and other extra setup. Since Hugging Face recommends amp anyway, no separate experiments were run with APEX.

3. Results

[Figure: fp16_test]

Confirmed: enabling fp16 reduced GPU memory usage.
Open questions:
- Why did performance improve just from enabling fp16, without increasing the batch size?
- Why did changing the fp16 optimization level make no difference?

boostcampaitech2 / mrc-level2-nlp-04

Automatic Mixed Precision (AMP) Test #11
