aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0
3.09k stars 519 forks

A few questions about fm model #272

Open wszhtc opened 5 years ago

wszhtc commented 5 years ago

I am training an FM model on a large dataset, but the result (AUC) is not good. Could you please help me with the following questions?

  1. Do I need to discretize numerical (continuous) features in the dataset?
  2. I think one-hot encoding of categorical features is necessary, right?
  3. Do I need to normalize the features?
  4. Are statistical features useful in an FM model? E.g., how many times an item has been clicked by users.

Thanks!

etveritas commented 5 years ago

These can all help, but I think you should consider them in the context of your business: construct more features, drop some features, or tune the hyperparameters. If needed, you could also consider other business metrics, e.g. comparing the predictions with what actually happened.
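To make the discretization and one-hot questions above concrete, here is a minimal sketch in plain Python. It is not xLearn code: the helper names (`encode`, `build_feature_index`, `to_libsvm`), the field names, and the bin edges are all hypothetical, chosen only for illustration. It buckets a numeric count feature (such as item clicks), one-hot encodes every (field, value) pair, and writes each row as a libsvm-style line, which is one of the input formats an FM model can be trained on.

```python
import bisect


def encode(row, bin_edges):
    """Yield (field, token) pairs for one row.

    Numeric fields listed in bin_edges are replaced by their bucket id;
    every other field is treated as categorical and kept as-is.
    """
    for name, value in row.items():
        if name in bin_edges:
            yield name, bisect.bisect_right(bin_edges[name], value)
        else:
            yield name, value


def build_feature_index(rows, bin_edges):
    """Assign one column index to each distinct (field, token) pair."""
    index = {}
    for row in rows:
        for pair in encode(row, bin_edges):
            index.setdefault(pair, len(index))
    return index


def to_libsvm(label, row, index, bin_edges):
    """Format one row as 'label col:1 col:1 ...' (one-hot, sorted columns).

    Pairs not present in the index (unseen at fit time) are skipped.
    """
    cols = sorted(index[p] for p in encode(row, bin_edges) if p in index)
    return " ".join([str(label)] + [f"{c}:1" for c in cols])


# Hypothetical toy data: one categorical field and one click-count statistic.
rows = [{"item": "a", "clicks": 3}, {"item": "b", "clicks": 120}]
bin_edges = {"clicks": [10, 100]}  # assumed buckets: <10, 10-99, >=100

index = build_feature_index(rows, bin_edges)
print(to_libsvm(1, rows[0], index, bin_edges))  # 1 0:1 1:1
print(to_libsvm(0, rows[1], index, bin_edges))  # 0 2:1 3:1
```

After this encoding every feature value is 0 or 1, so per-feature scaling no longer matters for the encoded columns. Whether bucketing actually beats feeding in the raw (normalized) number depends on the data, so it is worth comparing both on a validation set.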