ftnext / atmaCup10-paintings-likes

https://www.guruguru.science/competitions/16/
MIT License

Implement a LightGBM training script (one whose runs can be reviewed later) based on the lecture content [CV 1.1836] [LB 1.2537] #3

Closed ftnext closed 3 years ago

ftnext commented 3 years ago

https://www.guruguru.science/competitions/16/discussions/185c7dc6-5e3a-49c6-9c30-41bf007cc694/

ftnext commented 3 years ago

params=None

$ python training.py data/preprocessed/v1/ data/datasets/train.csv submissions/

Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[100]   valid_0's l2: 1.43506
Fold 0 RMSLE: 1.1979
Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[97]    valid_0's l2: 1.37973
Fold 1 RMSLE: 1.1746
Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[97]    valid_0's l2: 1.3866
Fold 2 RMSLE: 1.1775
Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[85]    valid_0's l2: 1.44348
Fold 3 RMSLE: 1.2015
Training until validation scores don't improve for 100 rounds
Did not meet early stopping. Best iteration is:
[100]   valid_0's l2: 1.41409
Fold 4 RMSLE: 1.1892
--------------------------------------------------
FINISHED | Whole RMSLE: 1.1882
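
The fold scores in the log above can be cross-checked by hand. The following is a minimal sketch under two assumptions that the log itself does not state: that the target was log1p-transformed before training (so the square root of LightGBM's `l2` is the fold RMSLE), and that the five folds are roughly equal in size (so the overall score is the root of the mean fold `l2`).

```python
import math

# valid_0's l2 at the best iteration of each fold, copied from the log above
fold_l2 = [1.43506, 1.37973, 1.3866, 1.44348, 1.41409]

# Assumption: l2 is the mean squared error on the log1p-transformed target,
# so sqrt(l2) is the fold RMSLE
fold_rmsle = [math.sqrt(v) for v in fold_l2]

# Assumption: folds are (nearly) equal in size, so the out-of-fold MSE
# is the plain mean of the fold MSEs
whole_rmsle = math.sqrt(sum(fold_l2) / len(fold_l2))

for i, score in enumerate(fold_rmsle):
    print(f"Fold {i} RMSLE: {score:.4f}")
print(f"Whole RMSLE: {whole_rmsle:.4f}")
```

Because the logged `l2` values are themselves rounded, the recomputed scores should agree with the logged RMSLE values only to within about 1e-4.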

20210308-005643_data_preprocessed_v1

dating_year_early   1915.0
StringLength__description   1861.0
dating_year_late    1806.0
StringLength__more_title    1803.0
CE__principal_maker 1354.0
StringLength__long_title    1194.0
StringLength__title 1061.0
StringLength__sub_title 874.0
CE__acquisition_method  517.0
CE__title   334.0
acquisition_method=transfer 188.0
acquisition_method=bequest  100.0
dating_period   89.0
acquisition_method=unknown  88.0
principal_maker=Samuel Bourne   85.0
principal_maker=Philip Henry Delamotte  68.0
acquisition_method=loan 67.0
acquisition_method=gift 60.0
principal_maker=Woodbury & Page 60.0
principal_maker=Raphaël Sadeler (I) 59.0
principal_maker=Théodore van Lelyveld   58.0
principal_maker=Charles Rochussen   55.0
principal_maker=Jan Luyken  52.0
principal_maker=Hendrick Goltzius   50.0
principal_maker=Jan van de Velde (II)   42.0
principal_maker=Jacob van der Schley    41.0
principal_maker=Atelier Kurkdjian   41.0
principal_maker=Jacques Lalaing 39.0
principal_maker=Jan Harmensz. Muller    37.0
principal_maker=Bernard Picart  31.0
principal_maker=Paulus Pontius  30.0
principal_maker=Cornelis Galle (I)  26.0
principal_maker=Henry W. Taunt  25.0
principal_maker=Romeyn de Hooghe    24.0
principal_maker=Jacob Olie jr.  19.0
principal_maker=Eberhard Cornelis Rahms 19.0
principal_maker=Giorgio Sommer  18.0
principal_maker=Johann Sadeler (I)  17.0
principal_maker=anoniem (Monumentenzorg)    17.0
principal_maker=Daniël Veelwaard (I)    17.0
principal_maker=Bonfils 17.0
principal_maker=Hermanus Jan Hendrik van Rijkelijkhuysen    14.0
principal_maker=Adolphe Burdet  14.0
principal_maker=Jan Frederik Christiaan Reckleben   13.0
principal_maker=Aegidius Sadeler    11.0
principal_maker=Juan Laurent    11.0
principal_maker=Theodoor Koning 10.0
principal_maker=Roelant Roghman 6.0
principal_maker=Richard Tepe    5.0
principal_maker=Fratelli Alinari    5.0

ftnext commented 3 years ago

params=lgbm_params [CV 1.1836] [LB 1.2537]

Modified training.py and ran it:

$ python training.py data/preprocessed/v1/ data/datasets/train.csv submissions/

[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500]   valid_0's rmse: 1.19982
Early stopping, best iteration is:
[568]   valid_0's rmse: 1.1983
Fold 0 RMSLE: 1.1983
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[324]   valid_0's rmse: 1.17177
Fold 1 RMSLE: 1.1718
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500]   valid_0's rmse: 1.19045
Early stopping, best iteration is:
[461]   valid_0's rmse: 1.18878
Fold 2 RMSLE: 1.1888
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[378]   valid_0's rmse: 1.19028
Fold 3 RMSLE: 1.1903
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500]   valid_0's rmse: 1.16945
Early stopping, best iteration is:
[492]   valid_0's rmse: 1.16845
Fold 4 RMSLE: 1.1684
--------------------------------------------------
FINISHED | Whole RMSLE: 1.1836
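
The repeated `[Warning]` line in this log suggests that `lgbm_params` sets `max_depth` without setting `num_leaves`. Below is a rough sketch of that condition, inferred from the message text: the warning appears when `num_leaves` is left at its default of 31 and `2^max_depth` exceeds it. The helper `would_warn_num_leaves` and the sample dictionaries are hypothetical; they are not taken from training.py or from LightGBM's API.

```python
DEFAULT_NUM_LEAVES = 31  # LightGBM's documented default for num_leaves


def would_warn_num_leaves(params: dict) -> bool:
    """Approximate the check behind the 'Accuracy may be bad' warning."""
    max_depth = params.get("max_depth", -1)  # -1 means no depth limit
    num_leaves = params.get("num_leaves", DEFAULT_NUM_LEAVES)
    if max_depth <= 0:
        return False
    # Warn when num_leaves is still the default but the depth limit
    # would allow more leaves than that
    return num_leaves == DEFAULT_NUM_LEAVES and 2 ** max_depth > num_leaves


# Hypothetical parameter sets, for illustration only
depth_only = {"max_depth": 5}
depth_and_leaves = {"max_depth": 5, "num_leaves": 24}

print(would_warn_num_leaves(depth_only))        # 2**5 = 32 exceeds 31
print(would_warn_num_leaves(depth_and_leaves))  # num_leaves set explicitly
```

Setting `num_leaves` explicitly to a value no larger than `2^max_depth` is one way to silence the message; whether that also helps the score would have to be checked by CV.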

20210308-005853_data_preprocessed_v1

StringLength__sub_title 115919.8970608539
CE__principal_maker 78480.97932646889
dating_year_late    57125.873609007336
StringLength__more_title    48863.97797768813
CE__acquisition_method  46840.115645824
dating_year_early   41359.81569700455
StringLength__description   41205.97698495025
acquisition_method=transfer 28292.064049318433
StringLength__long_title    19234.030936465228
StringLength__title 18824.523672675597
dating_period   17687.047488784418
CE__title   8682.455806170823
acquisition_method=bequest  7628.177280128002
acquisition_method=purchase 2390.332360152155
acquisition_method=loan 2006.157952874899
acquisition_method=unknown  1756.5118776634336
acquisition_method=gift 1599.86732339859
principal_maker=Samuel Bourne   1444.4636361300945
principal_maker=Jacob Olie jr.  1073.2613588571548
principal_maker=anonymous   935.7564471065998
principal_maker=Philip Henry Delamotte  816.7143234610558
principal_maker=Raphaël Sadeler (I) 723.7326199412346
principal_maker=Bernard Picart  682.2669063210487
principal_maker=Woodbury & Page 666.6505894064903
principal_maker=Jacques Lalaing 652.1705242991447
principal_maker=Atelier Kurkdjian   573.5614114403725
principal_maker=Théodore van Lelyveld   553.4593561291695
principal_maker=Hendrick Goltzius   504.37551349401474
principal_maker=Richard Tepe    377.38423067331314
principal_maker=Jan Luyken  363.742679476738
principal_maker=Adolphe Burdet  360.61634880304337
principal_maker=Jacob van der Schley    346.937460064888
principal_maker=Jan Harmensz. Muller    337.7850676178932
principal_maker=Charles Rochussen   288.1016381382942
principal_maker=Hermanus Jan Hendrik van Rijkelijkhuysen    268.6380909085274
principal_maker=Romeyn de Hooghe    222.4745261669159
principal_maker=Jan van de Velde (II)   221.83499509096146
principal_maker=Giorgio Sommer  221.3071709871292
principal_maker=Johann Sadeler (I)  205.87370175123215
principal_maker=Cornelis Galle (I)  197.35874596238136
principal_maker=Jacob de Gheyn (II) 196.14085227251053
principal_maker=Aegidius Sadeler    193.35006448626518
principal_maker=Paulus Pontius  192.17733943462372
principal_maker=Waldemar Titzenthaler   190.73536175489426
principal_maker=anoniem (Monumentenzorg)    163.71789973974228
principal_maker=Daniël Veelwaard (I)    144.55989742279053
principal_maker=Bonfils 133.13036489486694
principal_maker=Johann Heinrich Maria Hubert Rennefeld  112.56654340028763
principal_maker=Jan Frederik Christiaan Reckleben   110.72550246119499
principal_maker=Roelant Roghman 105.06099009513855

ftnext commented 3 years ago

Set the categorical-variable threshold to 20, matching the lecture (v1.1), and specified lgbm_params.

CV improved. More features seem to be better (apparently the choice should be left to the model).
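
To illustrate what such a threshold controls, here is a minimal sketch of one-hot encoding that only creates a column for categories appearing at least `min_count` times, using the `column=value` naming seen in the importance lists. The function name `one_hot_with_threshold`, the `>=` comparison, and the toy data are assumptions; the actual preprocessing code is not shown in this issue.

```python
from collections import Counter


def one_hot_with_threshold(values, column, min_count=20):
    """One-hot encode `values`, keeping only categories seen at least min_count times."""
    counts = Counter(values)
    kept = sorted(cat for cat, n in counts.items() if n >= min_count)
    # One dict per row; rows with a rare category get zeros in every column
    return [{f"{column}={cat}": int(v == cat) for cat in kept} for v in values]


# Toy data: a lower threshold keeps more categories, hence more feature columns
makers = ["anonymous"] * 30 + ["Jan Luyken"] * 25 + ["Richard Tepe"] * 5
strict = one_hot_with_threshold(makers, "principal_maker", min_count=30)
loose = one_hot_with_threshold(makers, "principal_maker", min_count=20)

print(sorted(strict[0]))  # column names kept with min_count=30
print(sorted(loose[0]))   # column names kept with min_count=20
```

With the lower threshold the model receives more columns and can decide for itself which ones are worth splitting on.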

For the v1 data, I had chosen the setting as follows after looking at facet:

$ python training.py data/preprocessed/v1.1/ data/datasets/train.csv submissions/
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500]   valid_0's rmse: 1.19089
Early stopping, best iteration is:
[521]   valid_0's rmse: 1.18976
Fold 0 RMSLE: 1.1898
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500]   valid_0's rmse: 1.17152
Early stopping, best iteration is:
[409]   valid_0's rmse: 1.16842
Fold 1 RMSLE: 1.1684
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500]   valid_0's rmse: 1.17761
Early stopping, best iteration is:
[428]   valid_0's rmse: 1.17497
Fold 2 RMSLE: 1.1750
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[379]   valid_0's rmse: 1.19238
Fold 3 RMSLE: 1.1924
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[384]   valid_0's rmse: 1.16635
Fold 4 RMSLE: 1.1664
--------------------------------------------------
FINISHED | Whole RMSLE: 1.1784

20210308-092436_data_preprocessed_v1 1

StringLength__sub_title 209930.162448043
dating_year_late    42569.24030684505
CE__principal_maker 41079.74357781559
dating_year_early   40140.34276960185
StringLength__description   39910.90862703486
StringLength__more_title    34398.36773852137
StringLength__title 18186.9582782457
acquisition_method=transfer 16716.9609423019
StringLength__long_title    15554.007925360755
CE__acquisition_method  14547.523025380448
dating_period   8276.197126097977
CE__title   8072.621644140054
principal_maker=Rembrandt van Rijn  6190.913442015648
acquisition_method=bequest  5191.203318551183
acquisition_method=purchase 4008.6148973125964
principal_maker=Herman Salzwedel    3705.6628245711327
acquisition_method=loan 2866.6571811437607
principal_maker=Jean Baptiste Vanmour   1783.4732909202576
principal_maker=George Hendrik Breitner 1737.422084748745
acquisition_method= 1643.8557840585709
acquisition_method=unknown  1628.8673606552184
acquisition_method=gift 1510.0922226384282
principal_maker=Jacob Olie jr.  1312.0474618077278
principal_maker=Samuel Bourne   1299.6625580787659
acquisition_method=nationalization 1795 1037.8393313884735
principal_maker=Raphaël Sadeler (I) 731.2425500154495
principal_maker=Philip Henry Delamotte  707.3953838348389
principal_maker=Atelier Kurkdjian   668.155992269516
principal_maker=anonymous   643.7922441363335
principal_maker=Woodbury & Page 619.2181321382523
principal_maker=Jacques Lalaing 595.9418876171112
principal_maker=Adolphe Burdet  579.9195005297661
principal_maker=Théodore van Lelyveld   578.0495392084122
principal_maker=Hermanus Jan Hendrik van Rijkelijkhuysen    517.6817879080772
principal_maker=Johann Sadeler (I)  444.7787608206272
principal_maker=Hendrick Goltzius   438.8445661664009
principal_maker=Jan Luyken  409.46383661031723
principal_maker=Bernard Picart  395.66525959968567
principal_maker=anoniem (Monumentenzorg)    376.42573696374893
principal_maker=James Higson    375.6434725522995
principal_maker=Cornelis Galle (I)  338.6184870004654
principal_maker=O. Hisgen & Co. 331.26985543966293
principal_maker=Ohannes Kurkdjian   319.7526163458824
principal_maker=Paulus Pontius  311.48908030986786
principal_maker=Charles Rochussen   306.03656232357025
principal_maker=Jacob van der Schley    305.7295599579811
principal_maker=Waldemar Titzenthaler   293.17414116859436
principal_maker=Jan Harmensz. Muller    292.3039834499359
principal_maker=Jacob de Gheyn (II) 227.67760467529297
principal_maker=Schelte Adamsz. Bolswert    204.3831462264061