Open gabrielspmoreira opened 3 years ago
@gabrielspmoreira I saw that this PR was merged: https://github.com/NVIDIA/NVTabular/pull/911#event-4956796978
would that be a temporary solution for this FEA?
Thanks for the pointer, Sara. #911 might be a temporary solution for such use cases. We need to test it with the PyTorch data loader, as the test provided uses the TF data loader.
Hi, is this still not supported? I tried to run Transformers4Rec with multi-hot columns and got errors when trying to apply the workflow. When removing the multi-hot columns everything works fine.
Is your feature request related to a problem? Please describe.
Currently NVTabular only supports 1-dimensional list columns, which is enough to support lists of categorical values (e.g. multi-hot features) or numerical values (e.g. pre-trained embedding features).
For session-based or sequential recommendation, simple (non-list) features become 1D list features that represent the sequence of user interactions (e.g. item ids, product category, product price). Likewise, 1D list features (e.g. multi-hot or embeddings) should become 2D list features, which NVTabular currently does not support.
Describe the solution you'd like
NVTabular should be able to process, save, and load multi-dimensional list (sparse) columns, in order to support multi-hot and embedding features for session-based / sequential recommendation. The parquet format does support storing such multi-dimensional list columns, so it is not a limitation here.
Describe alternatives you've considered
In some cases, to use pre-trained embeddings with NVTabular (as in the SIGIR eCom 2021 Data Challenge, which provides product description, product image, and search query embeddings), I flattened the 2D features (session interactions x embedding dim) into a 1D vector, saved it to parquet with NVTabular, and reshaped it back to 2D on the model side. But because product description/image vectors are not available for all products, I have to fill null vectors with zero vectors of the same size, so that when the 2D vectors are reconstructed on the model side their positions stay consistent with the other interaction features.
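
To make the workaround above concrete, here is a minimal sketch in plain Python. It is only an illustration of the flatten / zero-fill / reshape idea, not the code I actually ran: the helper names (`flatten_session`, `unflatten_session`), the tiny `EMBEDDING_DIM`, and the sample session are all hypothetical, and no NVTabular API is involved.

```python
from typing import List, Optional, Sequence

EMBEDDING_DIM = 3  # hypothetical size; real pre-trained embeddings are much wider


def flatten_session(
    embeddings: Sequence[Optional[Sequence[float]]], dim: int
) -> List[float]:
    """Flatten a (num_interactions x dim) session into a single 1D list.

    Interactions without an embedding (None) are replaced by a zero vector of
    length `dim`, so every interaction occupies exactly `dim` slots.
    """
    flat: List[float] = []
    for emb in embeddings:
        if emb is None:
            flat.extend([0.0] * dim)
        else:
            if len(emb) != dim:
                raise ValueError(f"expected {dim} values, got {len(emb)}")
            flat.extend(emb)
    return flat


def unflatten_session(flat: Sequence[float], dim: int) -> List[List[float]]:
    """Rebuild the 2D (num_interactions x dim) layout from the flat 1D list."""
    if len(flat) % dim != 0:
        raise ValueError("flat length is not a multiple of the embedding dim")
    return [list(flat[i : i + dim]) for i in range(0, len(flat), dim)]


# A session with three interactions; the second product has no embedding.
session = [[0.1, 0.2, 0.3], None, [0.7, 0.8, 0.9]]

flat = flatten_session(session, EMBEDDING_DIM)      # what gets stored as a 1D list column
restored = unflatten_session(flat, EMBEDDING_DIM)   # what the model side rebuilds
```

Because the missing embedding is padded rather than dropped, row `i` of `restored` still lines up with interaction `i` of the other sequence features. The downside is that the model cannot tell a padded zero vector from a genuine one without extra information, which is part of why native 2D list support would be preferable.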