apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

impossible to use side item_data in recommender if one value is missing #3411

Open azzelena opened 3 years ago

azzelena commented 3 years ago

When I pass user_data or item_data (side information for the user/item) to turicreate.recommender.ranking_factorization_recommender.create and some of the values are NaN, I get an error.

Often, some users or items have only one value in the side data. For example:

person_id product_id feature1 feature2
1000015 301939 1 394
100017 319586 NaN 394
1000183 17146 1 NaN
1000183 236082 1 349
1000189 316462 NaN 394
1000212 309060 NaN 394
1000212 369545 1 361
1000212 369560 1 NaN
1000212 6673 1 394
1000399 301992 NaN 846
1000418 214536 0 394
1000418 285443 1 394
1000418 331645 1 1262
1000418 258486 0 1262
1000418 321245 NaN 1262
1000418 331667 1 NaN
1000418 258496 0 NaN
1000418 303494 1 NaN
1000418 331632 NaN 1262
1000418 258488 0 NaN

Is there any way to use only one feature when the other one is NaN?

Thanks.

TobyRoseman commented 3 years ago

I'm not really understanding the table you've shared. How does your side information contain both a person id and a product id? I would expect two separate SFrames: one with the side information for persons and one with the side information for products.

Anyway, I don't know of a way to do what you want: NaN values are not allowed in side information. However, you don't need to provide side information for every item/user in the observation data, so you can call dropna() on your side information and use the result.
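A minimal sketch of the row-dropping idea, assuming the two features describe products and live in their own table keyed by product_id (the issue does not confirm this). The helpers `is_missing` and `drop_incomplete_rows` are hypothetical and written in plain Python so the logic can be run without Turi Create; the values are taken from the table in the question.

```python
import math

def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_incomplete_rows(rows, feature_columns):
    """Keep only the rows in which every feature column has a value."""
    return [
        row for row in rows
        if not any(is_missing(row[col]) for col in feature_columns)
    ]

nan = float("nan")

# Assumed item side information: one row per product.
item_rows = [
    {"product_id": 301939, "feature1": 1.0, "feature2": 394.0},
    {"product_id": 319586, "feature1": nan, "feature2": 394.0},
    {"product_id": 17146, "feature1": 1.0, "feature2": nan},
    {"product_id": 236082, "feature1": 1.0, "feature2": 349.0},
]

complete_rows = drop_incomplete_rows(item_rows, ["feature1", "feature2"])
kept_ids = [row["product_id"] for row in complete_rows]
print(kept_ids)  # [301939, 236082]
```

With the side information held in an SFrame, the same filtering should be a single `item_data.dropna()` call, and the result can be passed as the `item_data` argument of `create`. Products 319586 and 17146 would then simply have no side information.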

Of course, dropping those rows also discards the non-NaN values they contain. One possible solution is to impute the missing values instead, i.e. replace each NaN with a constant such as the column's mean or mode.
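The imputation idea could look roughly like the sketch below. It assumes, again, that the features are product-level, that feature1 is a 0/1 flag (so the mode is a natural fill value), and that feature2 is numeric (so the mean is meaningful); none of this is confirmed by the issue. The helper `impute_column` is hypothetical and uses only the standard library.

```python
import math
from statistics import mean, mode

def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def impute_column(rows, column, strategy):
    """Return copies of rows with missing values in `column` replaced
    by strategy(observed values), e.g. the column mean or mode."""
    observed = [row[column] for row in rows if not is_missing(row[column])]
    fill_value = strategy(observed)
    return [
        {**row, column: fill_value} if is_missing(row[column]) else dict(row)
        for row in rows
    ]

nan = float("nan")

# Assumed item side information: one row per product.
item_rows = [
    {"product_id": 301939, "feature1": 1.0, "feature2": 394.0},
    {"product_id": 319586, "feature1": nan, "feature2": 394.0},
    {"product_id": 17146, "feature1": 1.0, "feature2": nan},
    {"product_id": 214536, "feature1": 0.0, "feature2": 394.0},
    {"product_id": 236082, "feature1": 1.0, "feature2": 349.0},
]

imputed = impute_column(item_rows, "feature1", mode)  # binary flag: use the mode
imputed = impute_column(imputed, "feature2", mean)    # numeric: use the mean

print(imputed[1]["feature1"])  # 1.0
print(imputed[2]["feature2"])  # 382.75
```

On an SFrame, something like `sf.fillna('feature2', sf['feature2'].mean())` should achieve the same for a numeric column. If feature2 is really a category code rather than a quantity, the mode (or a dedicated "unknown" value) is probably a better fill than the mean.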