aksnzhy / xlearn

High-performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and CLI interfaces.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0
3.08k stars, 518 forks

Abnormal results when training on Criteo CTR data #158

Open xxllp opened 5 years ago

xxllp commented 5 years ago

[------------] Time cost for reading problem: 676.08 (sec)
[ ACTION ] Initialize model ...
[------------] Model size: 2.29 KB
[------------] Time cost for model initial: 0.00 (sec)
[ ACTION ] Start to train ...
[ ACTION ] Cross-validation: 1/5:
[------------] Epoch   Train log_loss   Test log_loss   Test AUC   Time cost (sec)
[ 10% ]        1       -nan             -nan            0.500000   57.52
[ 20% ]        2       -nan             -nan            0.500000   57.53
[ 30% ]        3       -nan             -nan            0.500000   57.39

Why are all the results nan? Is it because some features have empty values that need to be filled in?

aksnzhy commented 5 years ago

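@xxllp Could you show roughly what your data looks like? Does the problem still occur if you train on a small subset? If it does, you can send me the data.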

xxllp commented 5 years ago

It's just the official dataset; the raw data is very large. I suspect it might be because of the blanks (spaces) in it.

xxllp commented 5 years ago

I tried a small subset and it fails too. The data is in CSV format.

aksnzhy commented 5 years ago

@xxllp The CSV format will cause problems here because it cannot represent sparse data. You need to convert the data to libsvm or libffm format.

xxllp commented 5 years ago

OK, got it. Is there any example code for converting the format? The one I found myself doesn't look quite right.

ucasiggcas commented 4 years ago

Did you ever find the conversion code? Is there an API for it? Many thanks.
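
For anyone landing here with the same question, below is a minimal sketch of a CSV to libffm conversion. It is not part of xLearn and not the maintainer's code. The function name `csv_to_libffm` and the `num_numeric` parameter are hypothetical, and the column layout is an assumption: no header row, comma-separated, the label first, then the numeric columns, then the categorical columns. Empty fields are simply left out of the output line, which is what a sparse format allows, and each categorical value is written as a one-hot feature with value 1.

```python
import csv
import io


def csv_to_libffm(src, dst, num_numeric=13):
    """Write one libffm line (`label field:index:value ...`) per CSV row.

    Assumed layout (not confirmed by the thread): label, then
    `num_numeric` numeric columns, then categorical columns, no header.
    Returns the mapping from (field, category) to feature index.
    """
    index_of = {}  # (field, category or None) -> global feature index
    for row in csv.reader(src):
        if not row:
            continue  # skip blank lines
        label, cols = row[0].strip(), row[1:]
        tokens = []
        for field, raw in enumerate(cols):
            raw = raw.strip()
            if raw == "":
                continue  # missing value: omit the feature entirely
            if field < num_numeric:
                key, value = (field, None), float(raw)  # one index per numeric column
            else:
                key, value = (field, raw), 1.0          # one index per category value
            idx = index_of.setdefault(key, len(index_of))
            tokens.append(f"{field}:{idx}:{value:g}")
        dst.write(" ".join([label] + tokens) + "\n")
    return index_of


if __name__ == "__main__":
    # Tiny in-memory demo: 2 numeric columns followed by 1 categorical column.
    demo = io.StringIO("1,5,,a\n0,,2,b\n")
    out = io.StringIO()
    csv_to_libffm(demo, out, num_numeric=2)
    print(out.getvalue(), end="")
```

For a real file, pass open file handles instead of `io.StringIO` objects. For libsvm format, dropping the `field:` prefix from each token would give `index:value` pairs. Numeric columns with very large raw counts may be worth scaling or log-transforming before training, since unscaled values can make optimization unstable.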