HKUDS / MMSSL

[WWW'2023] "MMSSL: Multi-Modal Self-Supervised Learning for Recommendation"
https://arxiv.org/abs/2302.10632

Raw dataset processing details #12

Closed: infusion-zero-edit closed this issue 1 year ago

infusion-zero-edit commented 1 year ago

Can you give details on how you preprocess the raw data into the V/T/A features stored in the *.npy files? Only the textual features are mentioned in your paper.

weiwei1206 commented 1 year ago

The details are as described in the paper; part of the data comes from the original competition. We have separately processed two multimodal recommendation datasets, which will be released with the visual posters/pictures, the original textual information, and the preprocessed interaction and feature data in the current pipeline format, ready for direct use. The textual information is inherent to the datasets themselves, while the visual information was crawled from web pages. The feature data comes in two versions: one produced by regular feature extractors and a ChatGPT-based version. The new data containing the original modal information will be released with our future work. Please stay tuned for updates.
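
For reference, here is a minimal sketch of how crawled item images are commonly turned into an item-aligned `.npy` feature matrix with a pretrained backbone. The backbone choice (ResNet-50), the directory layout, and the file names below are assumptions for illustration, not the exact MMSSL extraction pipeline:

```python
# Sketch: extract per-item visual features and save as image_feat.npy.
# Assumptions (hypothetical): one image per item in raw_images/, named <item_id>.jpg,
# with contiguous item ids starting at 0; ResNet-50 as the feature extractor.
import os
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained backbone with the classification head removed -> 2048-d features.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image_dir = "raw_images"  # hypothetical path to the crawled posters/pictures
num_items = len(os.listdir(image_dir))
feats = np.zeros((num_items, 2048), dtype=np.float32)

with torch.no_grad():
    for item_id in range(num_items):
        img = Image.open(os.path.join(image_dir, f"{item_id}.jpg")).convert("RGB")
        x = preprocess(img).unsqueeze(0).to(device)
        feats[item_id] = backbone(x).squeeze(0).cpu().numpy()

np.save("image_feat.npy", feats)  # row i holds the visual feature of item i
```

Textual and acoustic features can be produced the same way (encode each item's raw modality with a pretrained encoder, then stack the vectors into one item-indexed matrix per modality), so that row indices in all the `.npy` files line up with the item ids in the interaction data.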