geffy / tffm

TensorFlow implementation of an arbitrary order Factorization Machine
MIT License
780 stars 176 forks

Question about data format #38

Closed BlaBlaPer closed 6 years ago

BlaBlaPer commented 6 years ago

Hi! I just want to ask: do we need to transform every feature column in the dataset to a 0/1 representation? I know we need to transform the categorical variables, but what about the numerical variables (like price)? Do we also need to transform them?

Besides, when I transformed all variables to 0/1 representations, I got 550+ columns, and I have 100,000 rows. When I train the model, I always get this error: "NaN or Inf in w[2]. : Tensor had NaN values". But I am pretty sure the data contains no values other than 0/1. How does this happen? However, when I use only 90,000 rows of my dataset, the problem disappears. I really don't know why and I really need your help!!!

Thanks a lot!!! Weisi

geffy commented 6 years ago

Hi @BlaBlaPer ! About features -- no, transformation of numeric features is not required. Of course you can do it as a kind of feature engineering, but generally it isn't needed.
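To make the point about features concrete, here is a minimal sketch that one-hot encodes a categorical column and appends a numeric column such as price as a plain real value. The helper name `build_features` and the toy data are hypothetical, used only for illustration; they are not part of tffm.

```python
import numpy as np


def build_features(categories, prices):
    """One-hot encode a categorical column and append a numeric column unchanged.

    Hypothetical helper for illustration; not part of tffm.
    """
    vocab = sorted(set(categories))
    index = {value: i for i, value in enumerate(vocab)}

    # One column per distinct category, a single 1.0 per row.
    one_hot = np.zeros((len(categories), len(vocab)), dtype=np.float32)
    one_hot[np.arange(len(categories)), [index[c] for c in categories]] = 1.0

    # The numeric feature stays a single real-valued column.
    numeric = np.asarray(prices, dtype=np.float32).reshape(-1, 1)
    return np.hstack([one_hot, numeric]), vocab


X, vocab = build_features(["red", "blue", "red"], [9.5, 12.0, 3.25])
print(vocab)
print(X)
```

The resulting matrix has one column per category plus one column for the price, rather than a block of 0/1 columns for the numeric variable.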

About the error -- I can't help you without the data and a full stack trace. But I can say that this kind of error appears when the learning process diverges (e.g. the learning rate is too big). Another possibility: you have non-zero regularization and a row with all zero elements.
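Before changing the learning rate, it may be worth ruling out the data-side causes mentioned above. The sketch below counts non-finite entries and lists rows that contain only zeros; the function name `diagnose` and the toy matrix are assumptions made for illustration, and this is not a tffm utility.

```python
import numpy as np


def diagnose(X):
    """Report non-finite entries and all-zero rows in a dense feature matrix.

    Hypothetical helper for illustration; not part of tffm.
    """
    X = np.asarray(X, dtype=np.float64)
    return {
        # Count of NaN or +/-Inf entries anywhere in the matrix.
        "non_finite_values": int(np.count_nonzero(~np.isfinite(X))),
        # Indices of rows in which every element is exactly zero.
        "all_zero_rows": np.flatnonzero(~X.any(axis=1)).tolist(),
    }


X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],     # all-zero row
    [0.0, 1.0, np.nan],  # contains a NaN
])
report = diagnose(X)
print(report)
```

If both checks come back clean, divergence during training (for example a learning rate that is too large) remains the more likely explanation.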

BlaBlaPer commented 6 years ago

Hi @geffy !! Thanks a lot!!! The problem occurred when I initially trained my model with the dense input method, but after I converted my dataset to sparse, it disappeared! Thanks so much for your advice!! Weisi
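For readers who hit the same error, a rough sketch of the dense-to-sparse conversion is shown below, using SciPy's CSR format. The toy matrix is made up. To my understanding the tffm README describes an `input_type='sparse'` model option that expects a `scipy.sparse.csr_matrix`, but please verify that against the version you have installed; the model call itself is left out here.

```python
import numpy as np
from scipy import sparse

# Toy one-hot matrix standing in for the real 100,000 x 550+ dataset.
X_dense = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
], dtype=np.float32)

# CSR stores only the non-zero entries, which suits one-hot encoded data.
X_sparse = sparse.csr_matrix(X_dense)

# Fraction of entries that are non-zero.
density = X_sparse.nnz / (X_sparse.shape[0] * X_sparse.shape[1])
print(X_sparse.shape, X_sparse.nnz, density)
```

With mostly 0/1 columns the density is typically low, so the sparse representation also reduces memory use compared with the dense array.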