aksnzhy / xlearn

High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and CLI interfaces.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0

Negative AUC with large number of test samples and FFM #49

Open alexklibisz opened 6 years ago

alexklibisz commented 6 years ago

Hi, I'm trying to fit the FFM model with 5.9 million training samples and 1.5 million test samples.

If I assign more than about 100K samples for testing, I see negative AUC values.

For example, with 100K testing samples it's normal:

[------------] Number of Feature: 359966
[------------] Number of Field: 14
[------------] Time cost for reading problem: 7.71 (sec)
[ ACTION     ] Initialize model ...
[------------] Model size: 464.13 MB
[------------] Time cost for model initial: 0.69 (sec)
[ ACTION     ] Start to train ...
[------------] Epoch      Train log_loss       Test log_loss            Test AUC     Time cost (sec)
[   20%      ]     1            0.634412            0.676192            0.604832               24.80
[   40%      ]     2            0.621513            0.672281            0.612237               24.77
[   60%      ]     3            0.607376            0.664812            0.622405               24.58
[   80%      ]     4            0.591081            0.660661            0.630569               23.11
[  100%      ]     5            0.579556            0.657870            0.635688               24.40
[ ACTION     ] Finish training and start to save model ...
[------------] Model file: model.out
[------------] Time cost for saving model: 1.56 (sec)
[ ACTION     ] Clear the xLearn environment ...
[------------] Total time cost: 132.76 (sec)

But with 200K samples, I see negative AUC values:

[------------] Number of Feature: 359966
[------------] Number of Field: 14
[------------] Time cost for reading problem: 10.12 (sec)
[ ACTION     ] Initialize model ...
[------------] Model size: 464.13 MB
[------------] Time cost for model initial: 0.71 (sec)
[ ACTION     ] Start to train ...
[------------] Epoch      Train log_loss       Test log_loss            Test AUC     Time cost (sec)
[   20%      ]     1            0.634381            0.670924           -3.245600               23.65
[   40%      ]     2            0.621243            0.665583           -3.163935               27.52
[   60%      ]     3            0.606576            0.658818           -3.054138               25.22
[   80%      ]     4            0.590273            0.655266           -2.969608               28.78
[  100%      ]     5            0.578989            0.653103           -2.918939               28.29
[ ACTION     ] Finish training and start to save model ...
[------------] Model file: model.out
[------------] Time cost for saving model: 2.12 (sec)
[ ACTION     ] Clear the xLearn environment ...
[------------] Total time cost: 147.99 (sec)

And if I add all 1.5 million test samples, the values are much more negative:

[------------] Number of Feature: 359966
[------------] Number of Field: 14
[------------] Time cost for reading problem: 21.44 (sec)
[ ACTION     ] Initialize model ...
[------------] Model size: 464.13 MB
[------------] Time cost for model initial: 0.75 (sec)
[ ACTION     ] Start to train ...
[------------] Epoch      Train log_loss       Test log_loss            Test AUC     Time cost (sec)
[    1%      ]     1            0.634459            0.669086         -489.920593               31.13
[    2%      ]     2            0.621340            0.663582         -477.539825               25.53
[    3%      ]     3            0.606978            0.656649         -462.629272               23.70
[    4%      ]     4            0.590852            0.652189         -451.603271               22.88
[    5%      ]     5            0.579603            0.650198         -444.887390               21.49

What's interesting is that the Test log_loss continues to decrease. Maybe it's a type overflow error in the metric calculation?
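The overflow hypothesis fits the logs: log loss is typically a per-sample average kept in floating point, while AUC is usually derived from counts of (positive, negative) pairs, which grow with the product of the class sizes. The sketch below is only an illustration under stated assumptions, not xLearn's actual code. `wrap_int32` and `auc_from_pair_counts` are hypothetical helpers, and the 20% positive rate is made up because the thread does not give the class balance.

```python
def wrap_int32(value: int) -> int:
    """Return what a signed 32-bit two's-complement integer would hold."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def auc_from_pair_counts(correct_pairs: int, num_pos: int, num_neg: int,
                         use_int32: bool) -> float:
    """AUC = correctly ordered (positive, negative) pairs / all such pairs.

    With use_int32=True both counts are first squeezed into 32 bits,
    mimicking a hypothetical implementation that stores them in `int`.
    """
    total_pairs = num_pos * num_neg
    if use_int32:
        correct_pairs = wrap_int32(correct_pairs)
        total_pairs = wrap_int32(total_pairs)
    return correct_pairs / total_pairs


if __name__ == "__main__":
    # Assumed: 20% positives and a true AUC of 0.6 (both made up).
    for num_samples in (100_000, 200_000):
        num_pos = num_samples // 5
        num_neg = num_samples - num_pos
        correct = num_pos * num_neg * 6 // 10
        print(num_samples,
              auc_from_pair_counts(correct, num_pos, num_neg, use_int32=False),
              auc_from_pair_counts(correct, num_pos, num_neg, use_int32=True))
```

Under these assumptions the pair counts still fit in 32 bits at 100K samples but not at 200K, where the wrapped numerator turns negative while the exact computation is unaffected.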

alexklibisz commented 6 years ago

I've since also tried with metric acc and it works fine.

aksnzhy commented 6 years ago

It seems that the AUC metric is not stable. @xswang Can you fix it? @alexklibisz Sorry about that. For now you can use another metric, and we will fix this as soon as possible.
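Until a fix lands, the reported AUC can be cross-checked outside the library from the test labels and the model's predicted scores. The following is a minimal sketch, assuming labels are encoded as 0/1 and that predictions have already been loaded into a list. `auc_score` is a hypothetical helper, not part of xLearn. It uses the rank-sum (Mann-Whitney) formulation with averaged ranks for ties, and Python integers do not overflow.

```python
def auc_score(labels, scores):
    """Rank-based AUC. `labels` are 0/1, `scores` are predicted values."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    num_pos = sum(1 for _, y in pairs if y == 1)
    num_neg = n - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores occupying 1-based ranks i+1 .. j.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    # Subtract the smallest possible rank sum, then normalise by the pair count.
    return (rank_sum_pos - num_pos * (num_pos + 1) / 2.0) / (num_pos * num_neg)


if __name__ == "__main__":
    print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value outside the range 0 to 1 from any AUC implementation points to an arithmetic problem rather than a property of the model.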

xswang commented 6 years ago

@alexklibisz Thanks for your question, I will fix it.

xswang commented 6 years ago

@alexklibisz Can you send me your 5.9 million-sample data set, please? My email: 2012wxs@gmail.com. Thank you.

alexklibisz commented 6 years ago

Ok, I emailed you with the Dropbox link. Training file has 5.9M rows. Validation file has 1.47M rows.

Thanks!

xswang commented 6 years ago

Got it, thank you.

alexklibisz commented 6 years ago

@xswang From your email it was an issue with types and was fixed by updating a variable to long long. Out of curiosity, how many samples can you compute with this type?

xswang commented 6 years ago

@alexklibisz The maximum value of the long long data type is 9223372036854775807, so the maximum number of samples is 9223372036854775807.
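How far the wider type reaches in practice depends on what the variable stores, which the thread does not say. If it is a plain sample counter, the limit is the value quoted above. If it instead holds a count of (positive, negative) pairs, the limit applies to the product of the class sizes. The sketch below works through that second case as an assumption only. `max_balanced_even_samples` is a hypothetical helper, and a perfectly balanced test set is assumed for simplicity.

```python
import math

INT32_MAX = 2**31 - 1  # largest signed 32-bit value
INT64_MAX = 2**63 - 1  # largest signed 64-bit value (typical long long)


def max_balanced_even_samples(limit: int) -> int:
    """Largest even n such that (n/2) * (n/2) pairs still fit within `limit`."""
    half = math.isqrt(limit)  # largest h with h * h <= limit
    return 2 * half


if __name__ == "__main__":
    print("32-bit pair counter:", max_balanced_even_samples(INT32_MAX), "samples")
    print("64-bit pair counter:", max_balanced_even_samples(INT64_MAX), "samples")
```

Under this assumption a 32-bit pair counter runs out at roughly the test-set size where the negative values first appeared, while a 64-bit one leaves room for billions of samples.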

aksnzhy commented 6 years ago

@alexklibisz Hi Alex, we have fixed this bug now.