boostcamp-5th-NLP05 / level1_semantictextsimilarity-nlp-05


Apply sweep (2023/04/19) #21

Open yunjinchoidev opened 1 year ago

yunjinchoidev commented 1 year ago

[yunjinchoidev]


Writing the Sweep Configuration

Interpretation of the config below:

  1. Minimize train_loss.
  2. Search learning_rate over the range min: 0.0001 to max: 0.1 with a uniform distribution.
  3. Try the five model_name candidates below:
    • klue/roberta-small
    • klue/roberta-large
    • ys7yoo/sentence-roberta-large-kor-sts
    • jhgan/ko-sbert-sts
    • sentence-transformers/xlm-r-large-en-ko-nli-ststb
  4. Try the batch_size values [8, 16, 32, 64].
program: train.py
method: bayes
metric:
  goal: minimize
  name: train_loss
parameters:
  learning_rate:
    max: 0.1
    min: 0.0001
    distribution: uniform
  model_name:
    values:
      - klue/roberta-small
      - klue/roberta-large
      - ys7yoo/sentence-roberta-large-kor-sts
      - jhgan/ko-sbert-sts
      - sentence-transformers/xlm-r-large-en-ko-nli-ststb
    distribution: categorical
  batch_size:
    values: [8, 16, 32, 64]
    distribution: categorical
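
For these parameters to take effect, train.py has to read the sampled values at runtime. A minimal sketch, assuming the script pulls them from wandb.config (the sweep agent also passes them as --key=value command-line arguments); the actual training loop is elided:

import wandb

def main():
    # Inside a sweep run, wandb.init() fills wandb.config with the sampled values
    wandb.init()
    cfg = wandb.config
    print(cfg.model_name, cfg.learning_rate, cfg.batch_size)
    # ... build the tokenizer/model/dataloaders from cfg and train ...
    # The sweep metric must actually be logged, e.g.:
    # wandb.log({"train_loss": loss})

if __name__ == "__main__":
    main()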

Once you run the commands, the sweep kicks off automatically.
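
For reference, the usual command sequence, assuming the config above is saved as sweep.yaml (the first command prints the sweep ID that the second one consumes):

wandb sweep sweep.yaml
wandb agent <entity>/<project>/<sweep_id>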

[screenshot]

I gave it a quick try with just 100 examples from the dataset. The results come out as shown below.

[screenshot: sweep results]

yunjinchoidev commented 1 year ago

[yunjinchoidev]

Jieun

program: train.py
method: bayes
metric:
  goal: minimize
  name: train_loss
parameters:
  learning_rate:
    max: 0.1
    min: 0.0001
    distribution: uniform
  model_name:
    values:
      - jhgan/ko-sbert-sts
      - sentence-transformers/xlm-r-large-en-ko-nli-ststb
    distribution: categorical
  batch_size:
    values: [8, 16, 32, 64]
    distribution: categorical

Yunjin

program: train.py
method: bayes
metric:
  goal: minimize
  name: train_loss
parameters:
  learning_rate:
    max: 0.1
    min: 0.0001
    distribution: uniform
  model_name:
    values:
      - klue/roberta-small
      - klue/roberta-large
      - ys7yoo/sentence-roberta-large-kor-sts
    distribution: categorical
  batch_size:
    values: [8, 16, 32, 64]
    distribution: categorical

yunjinchoidev commented 1 year ago

[yunjinchoidev]

Final

Jieun

program: train.py
method: bayes
metric:
  goal: maximize
  name: val_pearson
parameters:
  learning_rate:
    max: 0.1
    min: 0.00001
    distribution: uniform
  model_name:
    values:
      - jhgan/ko-sbert-sts
      - sentence-transformers/xlm-r-large-en-ko-nli-ststb
    distribution: categorical
  batch_size:
    values: [8, 16, 32, 64]
    distribution: categorical

Yunjin

program: train.py
method: bayes
metric:
  goal: maximize
  name: val_pearson
parameters:
  learning_rate:
    max: 0.1
    min: 0.00001
    distribution: uniform
  model_name:
    values:
      - klue/roberta-small
      - klue/roberta-large
      - ys7yoo/sentence-roberta-large-kor-sts
    distribution: categorical
  batch_size:
    values: [8, 16, 32, 64]
    distribution: categorical

yunjinchoidev commented 1 year ago

[lectura7942] Some runs end with val_pearson going to NaN, so it looks like we should apply early stopping. For now, I set it to stop once val_pearson fails to improve for 3 consecutive epochs.

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

early_stop_callback = EarlyStopping(monitor="val_pearson", min_delta=0.00, patience=3, verbose=False, mode="max")
# min_delta: an improvement <= min_delta counts as no change
# patience: stop after this many checks with no improvement
# mode: "max" because we monitor a metric that should increase

trainer = pl.Trainer(accelerator='gpu', max_epochs=args.max_epoch, log_every_n_steps=1, logger=wandb_logger, callbacks=[early_stop_callback])
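
One prerequisite worth noting: EarlyStopping can only watch metrics the LightningModule actually logs, so val_pearson has to be logged during validation. A minimal sketch of that step, assuming torchmetrics is available (the class and variable names are illustrative, not the team's actual code):

import pytorch_lightning as pl
import torchmetrics

class STSModel(pl.LightningModule):  # illustrative skeleton
    def validation_step(self, batch, batch_idx):
        x, y = batch
        preds = self(x).squeeze()
        pearson = torchmetrics.functional.pearson_corrcoef(preds, y.float())
        self.log("val_pearson", pearson)  # EarlyStopping and the sweep both read this logged value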

yunjinchoidev commented 1 year ago

[yunjinchoidev]

The learning rate range was so wide that NaNs came up far too often and there were too many sweep cases to cover, so I revised the min/max range as follows.

  learning_rate:
    max: 0.0001
    min: 0.000001
    distribution: uniform
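
An alternative worth considering for a range like this, which spans two orders of magnitude: W&B sweeps also support log-uniform sampling, which spreads draws evenly across the orders of magnitude instead of concentrating them near the top of the range. A sketch with the same bounds, assuming a wandb version that supports the log_uniform_values distribution:

  learning_rate:
    max: 0.0001
    min: 0.000001
    distribution: log_uniform_values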