ftnext closed 3 years ago
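The maximum count in train is shown in parentheses.
So the idea is to try adding these as features even when they contain many NaNs (because LightGBM is not strongly affected by them)?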
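By default LightGBM treats NaN as a missing value and handles it natively, so a sparse column can be passed in without imputation. Before adding such a column, it can still help to see how sparse it really is. The following is a minimal sketch using only the standard library; the `missing_ratio` helper and the toy values are hypothetical and are not taken from `preprocess.py`.

```python
import math


def missing_ratio(values):
    """Return the share of entries that are None or NaN (0.0 for an empty list)."""
    if not values:
        return 0.0
    missing = sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


# Hypothetical toy columns; the real column contents are not shown in this thread.
columns = {
    "size_h": [10.0, float("nan"), 12.5, float("nan")],
    "dating_year_early": [1650.0, 1700.0, None, 1800.0],
}

# Report the sparsest columns first.
ratios = {name: missing_ratio(vals) for name, vals in columns.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```

A high ratio is not by itself a reason to drop the column here; the point is only to know which features are mostly empty when reading the importance listing later.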
$ python -i preprocess.py data/datasets/ data/preprocessed/v1.5
# train and test are both 33MB
$ python training.py data/preprocessed/v1.5/ data/datasets/train.csv submissions/
train: (12026, 1066)
test: (12008, 1066)
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500] valid_0's rmse: 1.07444
Early stopping, best iteration is:
[412] valid_0's rmse: 1.07278
Fold 0 RMSLE: 1.0728
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500] valid_0's rmse: 1.07319
Early stopping, best iteration is:
[485] valid_0's rmse: 1.07228
Fold 1 RMSLE: 1.0723
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[356] valid_0's rmse: 1.04132
Fold 2 RMSLE: 1.0413
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[378] valid_0's rmse: 1.07961
Fold 3 RMSLE: 1.0796
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500] valid_0's rmse: 1.0647
Early stopping, best iteration is:
[430] valid_0's rmse: 1.05937
Fold 4 RMSLE: 1.0594
--------------------------------------------------
FINISHED | Whole RMSLE: 1.0652
features count: 1066
title__lang=__label__en 118844.63220578432
StringLength__sub_title 65556.89275246859
size_h 55616.923470860114
size_w 39223.87466659397
more_title__lang=__label__en 31559.823882102966
title__lang=__label__nl 25121.263835430145
CE__principal_maker 12894.914893032517
dating_year_early 10398.156597647816
dating_year_late 8812.289369143546
description_tfidf_2 7871.548617511988
StringLength__more_title 7238.329542626743
acquisition_date=1994-01-01T00:00:00 7200.647782564163
StringLength__title 6007.2990922778845
description_tfidf_9 5623.561638459563
StringLength__description 5138.471152223647
description_tfidf_10 4998.281978905201
CE__acquisition_method 4884.856794159859
more_title__lang=__label__nl 4643.109343677759
description_tfidf_16 4450.151097655296
description_tfidf_0 4010.74718981795
StringLength__long_title 3995.5137103907764
description_tfidf_44 3874.518201753497
description_tfidf_47 3676.9679602347314
description_tfidf_5 3655.5469564199448
description_tfidf_1 3590.7993990182877
description_tfidf_4 3535.1186010837555
description_tfidf_27 3467.5155571103096
description_tfidf_22 3411.4941940903664
description_tfidf_31 3322.390802204609
acquisition_method=transfer 3311.1518894433975
description_tfidf_6 3203.190434006974
description_tfidf_7 2941.157238088548
description_tfidf_3 2887.5754666924477
description_tfidf_49 2862.502710789442
dating_period=19 2818.2294959425926
description_tfidf_13 2796.575230151415
description_tfidf_45 2761.0347990207374
description_tfidf_21 2533.8030881285667
description_tfidf_14 2500.7471777647734
description_tfidf_48 2496.626448661089
more_title__lang= 2495.551609516144
description_tfidf_28 2490.9852796792984
description_tfidf_42 2461.801316257566
description_tfidf_46 2451.1162937805057
description_tfidf_40 2385.0348535478115
description_tfidf_39 2353.476946234703
description_tfidf_18 2307.789521291852
CE__title 2296.7285529058427
description_tfidf_35 2289.0134964175522
description_tfidf_29 2273.5260899960995
more_title is being dropped (→ the plan is to keep it instead of dropping it)
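As a sanity check on the v1.5 log above, the "Whole RMSLE" can be reproduced from the per-fold scores. This sketch assumes that the whole score is computed over the pooled out-of-fold predictions and that the 12026 training rows are split into five nearly equal folds; neither is confirmed by the thread, since `training.py` is not shown. Under those assumptions the pooled value is the square root of the size-weighted mean of the squared fold scores.

```python
import math

# Per-fold RMSLE values copied from the v1.5 log (already rounded to 4 decimals).
fold_rmsle = [1.0728, 1.0723, 1.0413, 1.0796, 1.0594]

# Assumption: 12026 rows split into 5 folds whose sizes differ by at most one row.
n_rows = 12026
n_folds = len(fold_rmsle)
fold_sizes = [
    n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
    for i in range(n_folds)
]

# Each fold's sum of squared errors is size * score**2; pool them, then take the root.
sse = sum(size * score ** 2 for size, score in zip(fold_sizes, fold_rmsle))
whole_rmsle = math.sqrt(sse / n_rows)

print(f"{whole_rmsle:.4f}")  # 1.0652
```

The result matches the logged value of 1.0652, which is consistent with the assumptions above. Note that this is not the same as the plain average of the fold scores, although with near-equal folds and similar scores the two are close.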
$ python preprocess.py data/datasets/ data/preprocessed/v1.5.1
train: (12026, 1082)
test: (12008, 1082)
$ python training.py data/preprocessed/v1.5.1/ data/datasets/train.csv submissions/
train: (12026, 1082)
test: (12008, 1082)
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500] valid_0's rmse: 1.0618
Early stopping, best iteration is:
[425] valid_0's rmse: 1.05996
Fold 0 RMSLE: 1.0600
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[279] valid_0's rmse: 1.0677
Fold 1 RMSLE: 1.0677
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500] valid_0's rmse: 1.03321
Early stopping, best iteration is:
[435] valid_0's rmse: 1.03184
Fold 2 RMSLE: 1.0318
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
Early stopping, best iteration is:
[299] valid_0's rmse: 1.08273
Fold 3 RMSLE: 1.0827
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
Training until validation scores don't improve for 100 rounds
[500] valid_0's rmse: 1.04863
Early stopping, best iteration is:
[411] valid_0's rmse: 1.0474
Fold 4 RMSLE: 1.0474
--------------------------------------------------
FINISHED | Whole RMSLE: 1.0581
features count: 1082
title__lang=__label__en 168383.14408916235
size_h 59977.0106382519
StringLength__sub_title 53420.976950764656
size_w 27903.394222528674
CE__acquisition_date 26692.429003566504
more_title__lang=__label__en 11337.206842660904
dating_year_late 8089.388304844499
StringLength__more_title 7929.445962419733
CE__principal_maker 7928.032715566456
dating_year_early 7566.1707446575165
description_tfidf_10 5573.114014357328
CE__dating_period 5323.992191135883
CE__acquisition_credit_line 5248.245001330972
title__lang=__label__nl 4954.860521554947
CE__description 4911.8659618496895
description_tfidf_46 4445.551183119416
description_tfidf_9 4378.325389226899
CE__acquisition_method 4309.816353216767
description_tfidf_16 4306.426808230579
CE__principal_or_first_maker 4194.883913826197
StringLength__title 4188.884796886705
more_title__lang=__label__nl 3963.5859320759773
StringLength__description 3750.246826261282
StringLength__long_title 3702.8299085581675
description_tfidf_22 3499.176881402731
description_tfidf_0 3309.843187302351
CE__sub_title 3288.9826562441885
description_tfidf_28 3207.533474355936
CE__dating_sorting_date 3158.271821387112
description_tfidf_2 3097.139425635338
description_tfidf_5 2938.435487974435
CE__dating_year_late 2934.583312444389
acquisition_date=1994-01-01T00:00:00 2784.607858657837
long_title__lang=__label__en 2761.530075713992
description_tfidf_1 2679.9649018645287
description_tfidf_4 2570.2622108235955
CE__dating_presenting_date 2485.637671297416
description_tfidf_21 2467.6984004974365
description_tfidf_47 2467.3162631988525
description_tfidf_31 2459.160049557686
description_tfidf_6 2445.1297653466463
description_tfidf_49 2411.843475818634
description_tfidf_13 2392.2614911198616
description_tfidf_48 2352.342723816633
description_tfidf_3 2347.935442060232
description_tfidf_29 2314.170175552368
description_tfidf_14 2214.7736707031727
description_tfidf_18 2208.7245542109013
StringLength__principal_maker 2205.4734529554844
description_tfidf_41 2180.937787041068
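Both runs repeat the LightGBM warning about `num_leaves` and `2^max_depth`. It indicates that `max_depth` is set while `num_leaves` is left at its default of 31. A minimal sketch of one way to set `num_leaves` explicitly is shown below. The helper `with_explicit_num_leaves`, the `leaf_fraction` heuristic, and the example parameter values are all assumptions for illustration; the actual parameters in `training.py` are not shown in this thread.

```python
def with_explicit_num_leaves(params, leaf_fraction=0.5):
    """Return a copy of params with num_leaves derived from max_depth.

    A tree of depth d has at most 2**d leaves, so num_leaves is set to a
    fraction of that upper bound. leaf_fraction is a hypothetical heuristic,
    not a LightGBM parameter.
    """
    out = dict(params)
    max_depth = out.get("max_depth", -1)
    # Leave params alone if num_leaves is already set or depth is unlimited (<= 0).
    if "num_leaves" in out or max_depth <= 0:
        return out
    full_tree = 2 ** max_depth
    out["num_leaves"] = max(2, int(full_tree * leaf_fraction))
    return out


# Hypothetical parameter values; only the parameter names are real LightGBM keys.
params = {
    "objective": "regression",
    "metric": "rmse",
    "max_depth": 7,
    "learning_rate": 0.05,
}

tuned = with_explicit_num_leaves(params)
print(tuned["num_leaves"])  # 64
```

Passing a dictionary like `tuned` to training should avoid the warning, since `num_leaves` is then set explicitly and stays below `2^max_depth`. Whether it improves the score is a separate question that would need another cross-validation run.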
List candidate features, leaving out those that look unlikely to help at a glance, and implement the ones that are needed.