NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0
1.07k stars 142 forks

How to use Transformers4Rec with pandas #748

Closed ralgond closed 10 months ago

ralgond commented 10 months ago

How to use Transformers4Rec with pandas

NVTabular is hard to learn, so I want to skip it and load and transform the data for Transformers4Rec with pandas instead. Could you please show me how to achieve this?

For example, I am developing a news recommender, which is a session-based recommender. The training data looks like this:

| user_id | item_id-count | item_id-list | category-list |
| -- | -- | -- | -- |
| 0 | 2 | [2343, 16105] | [26, 281] |
| 1 | 2 | [6678, 29121] | [133, 418] |
| 2 | 2 | [3222, 17493] | [43, 297] |
| 3 | 2 | [3222, 5196] | [43, 99] |
| 4 | 2 | [3789, 4359] | [66, 67] |
| ... | ... | ... | ... |
| 199995 | 7 | [7592, 27484, 28222, 28702, 28515, 33114, 34315] | [136, 409, 412, 412, 412, 437, 442] |
| 199996 | 13 | [1379, 7591, 13521, 13555, 13579, 13515, 13509... | [7, 136, 250, 250, 250, 250, 250, 409, 428, 43... |
| 199997 | 2 | [22678, 22649] | [354, 354] |
| 199998 | 20 | [23642, 24255, 23898, 24254, 24173, 24128, 254... | [375, 375, 375, 375, 375, 375, 389, 389, 395, ... |
| 199999 | 11 | [472, 4245, 5215, 7591, 13521, 16531, 16619, 1... | [4, 67, 99, 136, 250, 281, 281, 281, 281, 297,... |

200000 rows × 4 columns

rnyak commented 10 months ago

@ralgond NVTabular is easy to use. You can take our existing notebooks and just plug your data into them.

If you want to use pandas, sure, go for it. But then you need to create the schema file yourself, manually and correctly, so that you can train a TF4Rec model. Please note that NVTabular generates the schema file automatically.
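For the pandas route, here is a minimal sketch of the data-preparation half only, assuming the raw data is a click log with one row per interaction. The `interactions` frame, its values, and the `timestamp` column are made up for illustration; only the output column names follow the table in the question. It does not produce the schema file, which would still have to be written by hand.

```python
import pandas as pd

# Hypothetical raw click log: one row per (user, item) interaction.
# The values and the "timestamp" column are assumptions for illustration.
interactions = pd.DataFrame({
    "user_id":   [0, 0, 1, 1, 1],
    "item_id":   [2343, 16105, 6678, 29121, 6700],
    "category":  [26, 281, 133, 418, 133],
    "timestamp": [10, 20, 5, 15, 25],
})

# Sort so that each list preserves the order of interactions,
# then collapse every user's rows into one row of list columns.
sessions = (
    interactions.sort_values(["user_id", "timestamp"])
    .groupby("user_id")
    .agg(**{
        "item_id-count": ("item_id", "count"),
        "item_id-list": ("item_id", list),
        "category-list": ("category", list),
    })
    .reset_index()
)

print(sessions)
```

This yields one row per user with a count column and two list columns, the same shape as the table above. Sorting before grouping matters because sequential models depend on the order of items within each list.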

ralgond commented 10 months ago

OK, thank you. I will try to learn NVTabular.