aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0
3.09k stars 519 forks

About sample normalization #258

Closed scriptboy1990 closed 5 years ago

scriptboy1990 commented 5 years ago

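I've recently been implementing the FFM prediction step from the model txt file, and I have a question. The default training process normalizes each sample, right? For example, given a libffm-format line: 1 0:12:0.5 1:34:1 2:56:1 3:78:1, are the values in the sparse matrix after normalization 0.5 / sqrt(0.5^2 + 1^2 + 1^2) and 1 / sqrt(0.5^2 + 1^2 + 1^2)? Then at prediction time, the first-order term multiplies each weight directly by the value above, the second-order cross term takes vi * vj and multiplies it by the above values of the two features, and summing these gives the logit.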

etveritas commented 5 years ago

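For the FFM prediction, the linear part is as you described above. For the nonlinear part, the two raw feature values are multiplied together and then multiplied by the norm without the square root; in your example that is 0.5*1*(1/(0.5^2+1^2+1^2)).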
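The rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not xLearn's actual code: the names `parse_libffm_line` and `ffm_logit`, the `bias` argument, and the dictionary layout of `w` and `v` are hypothetical and do not reflect the layout of the model txt file. It also assumes the norm is summed over every feature in the row; the formulas in this thread list three squared terms while the sample line has four features, so for that line the sketch uses 0.5^2 + 1^2 + 1^2 + 1^2.

```python
import math


def parse_libffm_line(line):
    """Split 'label field:index:value ...' into (label, [(field, index, value), ...])."""
    tokens = line.split()
    label = float(tokens[0])
    feats = []
    for tok in tokens[1:]:
        field, index, value = tok.split(":")
        feats.append((int(field), int(index), float(value)))
    return label, feats


def ffm_logit(feats, w, v, bias=0.0):
    """Compute an FFM logit with instance-wise normalization.

    feats: list of (field, index, raw_value)
    w:     hypothetical dict, feature index -> linear weight
    v:     hypothetical dict, (feature index, other field) -> latent vector
    bias:  assumed global bias term (not discussed in the thread)
    """
    # Norm WITHOUT the square root: 1 / sum of squared raw values in the row.
    norm = 1.0 / sum(x * x for _, _, x in feats)
    sqrt_norm = math.sqrt(norm)

    logit = bias
    # Linear part: weight times the value scaled by sqrt(norm).
    for _, i, x in feats:
        logit += w.get(i, 0.0) * x * sqrt_norm

    # Pairwise part: <v[i, field_j], v[j, field_i]> * x_i * x_j * norm.
    for a in range(len(feats)):
        fa, ia, xa = feats[a]
        for b in range(a + 1, len(feats)):
            fb, ib, xb = feats[b]
            dot = sum(p * q for p, q in zip(v[(ia, fb)], v[(ib, fa)]))
            logit += dot * xa * xb * norm
    return logit
```

Scaling each value by sqrt(norm) once and then multiplying two scaled values is equivalent to multiplying the raw values by the un-rooted norm, which is why both descriptions in this thread agree on the pairwise term.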

scriptboy1990 commented 5 years ago

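Very good. The results I get from the txt file with a vectorized implementation now match the official results. There's no way around it, haha; we need to use FFM for the recall (candidate retrieval) stage.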