boostcampaitech7 / level2-competitiveds-recsys-02

level2-competitiveds-recsys-02 created by GitHub Classroom

[064] Final model selection #64

Closed SooMiiii closed 1 week ago

SooMiiii commented 1 week ago

Background

Todo

SooMiiii commented 1 week ago

Missing-value handling / log transform / scaling

| Preprocessing | lgbm | xgb | catboost | lr |
|---|---|---|---|---|
| Baseline (no missing-value handling) | 3988.38 | 3729.72 | 3840.59 | |
| Impute missing values with -999 | 3954.13 | 3713.29 | 3840.23 | 4403.96 |
| Log transform | 4029.4 | 3759.43 | 3797.49 | 4404.05 |
| Scaling | 4029.51 | 3721.43 | 3793.03 | 4534.36 |
| Log + scaling | 4029.51 | 3721.43 | 3793.03 | 4534.36 |
| Drop some features after checking per-model feature importance | 3932.31 | 3722.79 | 3763.42 | |
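The preprocessing variants compared above could be sketched roughly as follows. This is a minimal illustration, not the actual pipeline; the `preprocess` helper and its strategy names are hypothetical, and it applies each transform to all numeric columns for simplicity.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """Apply one of the compared preprocessing variants (illustrative only)."""
    df = df.copy()
    num_cols = df.select_dtypes(include="number").columns
    if strategy == "fill_-999":
        # treat missingness as its own signal for tree models
        df[num_cols] = df[num_cols].fillna(-999)
    elif strategy == "log":
        # log1p only makes sense for non-negative values
        df[num_cols] = np.log1p(df[num_cols].clip(lower=0))
    elif strategy == "scale":
        # standardize each numeric column (z-score)
        df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
    return df

toy = pd.DataFrame({"a": [1.0, None, 3.0]})
filled = preprocess(toy, "fill_-999")
```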

Interpretation of results

Dropped variables

doffice0827 commented 1 week ago

Ensemble

| Ensemble models | Setting | MAE | R2 | Leaderboard | Data | Split |
|---|---|---|---|---|---|---|
| cat, lgbm, xgb, FT | random=42 | 3922.0335 | X | 3738.0696 | real_final_df | random(test_size: 0.2) |
seoo2001 commented 1 week ago

We should widen the tuning ranges for the tree-model hyperparameters; many of the runs look underfit.

SooMiiii commented 1 week ago

Optuna

Hyperparameter tuning in progress...

SooMiiii commented 1 week ago
seoo2001 commented 1 week ago

Hand-tuned hyperparameters

df: `final_df`, target: `deposit`

```python
drop = ['deposit_by_area', 'subways_within_1km', 'park_count_500m',
        'subways_within_500m', 'Is_Outside', 'park_distance_kurtosis',
        'park_distance_skewness']

train_data, valid_data = train_test_split(train_data, test_size=0.2, random_state=42)
```

```yaml
# lgbm - 3824.114230545642
reg_params:
  colsample_bytree: 0.95
  metric: mae
  learning_rate: 0.06
  n_estimators: 300
  num_leaves: 1500
  max_depth: -1
  min_child_samples: 1
  min_child_weight: 1e-5
  random_state: 42
---
# cat - 3725.2702002218753
reg_params:
  task_type: GPU
  posterior_sampling: False
  boosting_type: 'Plain'
  depth: 15
  iterations: 4000
  l2_leaf_reg: 2
  learning_rate: 0.02
  loss_function: MAE
  model_size_reg: 0.2
  od_type: Iter
  od_wait: 10
  random_seed: 0
  bagging_temperature: 0.1
  thread_count: -1
  verbose: true
---
# xgb - 3766.771995849316
reg_params:
  device: "cuda"
  tree_method: hist
  booster: "gbtree"
  objective: "reg:squarederror"
  learning_rate: 0.03
  max_depth: 10
  min_child_weight: 1
  gamma: 0
  subsample: 1
  colsample_bytree: 0.9151100079351457
  colsample_bylevel: 1
  colsample_bynode: 1
  reg_alpha: 0.1
  reg_lambda: 0.1
  scale_pos_weight: 1
  base_score: 0.5
  random_state: 0
  n_estimators: 2000
  verbosity: 2
```

Mean ensemble of the 3 tree models: 3712.5171330693547

With FT-Transformer added, mean ensemble: 3684.2490004511615; stacking ensemble: 3674.0620304845593
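The mean-vs-stacking comparison above can be sketched as follows. The predictions here are synthetic stand-ins for the four models' validation outputs, and `Ridge` is one reasonable choice of meta-model; the actual stacking setup may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge

# hypothetical validation targets and per-model predictions (cat, lgbm, xgb, FT)
rng = np.random.default_rng(42)
y_valid = rng.uniform(10000, 50000, size=100)
preds = {m: y_valid + rng.normal(0, 3000, size=100)
         for m in ["cat", "lgbm", "xgb", "ft"]}

# mean ensemble: simple average of the model predictions
stacked = np.column_stack(list(preds.values()))
mean_pred = stacked.mean(axis=1)

# stacking ensemble: a meta-model learns weights on the validation predictions
meta = Ridge(alpha=1.0).fit(stacked, y_valid)
stack_pred = meta.predict(stacked)
```

In practice the meta-model should be fit on out-of-fold predictions rather than the same validation split it is evaluated on, to avoid leakage.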

SooMiiii commented 1 week ago

Prediction visualization

FT tends to predict high, CatBoost tends to predict low, and LGBM and XGB give similar results.

Ideas:

- Weight models by per-date mean
- Weight models by per-date volatility
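The per-date volatility weighting idea could look something like this sketch: models whose predictions are more stable within a date get a larger (inverse-volatility) weight. The column names and the two-model toy frame are assumptions, not the actual dataset.

```python
import numpy as np
import pandas as pd

# hypothetical per-date predictions from two models
df = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "pred_cat": [100.0, 110.0, 200.0, 220.0],
    "pred_ft": [120.0, 130.0, 180.0, 260.0],
})

# per-date standard deviation of each model's predictions
vol = df.groupby("date")[["pred_cat", "pred_ft"]].transform("std")

# inverse-volatility weights, normalized to sum to 1 per row
w = 1.0 / (vol + 1e-9)
w = w.div(w.sum(axis=1), axis=0)

df["ensemble"] = (df[["pred_cat", "pred_ft"]].values * w.values).sum(axis=1)
```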

SooMiiii commented 1 week ago

Weighted ensemble after hyperparameter tuning