RUCAIBox / RecSysDatasets

This is a repository of public data sources for Recommender Systems (RS).
https://recbole.io/

Datasets For Recommender Systems

All of the datasets collected here can be converted to the atomic files defined by RecBole, a unified, comprehensive, and efficient recommendation library.

Once a dataset is converted to atomic files, you can use RecBole to test the performance of different recommender models on it. For more information about RecBole, see the RecBole website (https://recbole.io/).

Usage

To use a dataset with RecBole, you first need to convert the original data to atomic files, the data format defined by RecBole.

We provide two ways to convert these datasets into atomic files:

  1. Download the raw dataset and process it with the conversion tools provided in this repository. See the conversion tools for details.

  2. Download the processed atomic files directly from Baidu Wangpan (password: e272) or Google Drive.
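To give a feel for what an atomic file looks like, the sketch below turns a tiny in-memory ratings table into the text of a `.inter` file: tab-separated, with a header in which each field is written as `name:type`. The sample rows, the `RAW_CSV` and `INTER_HEADER` constants, and the `to_inter` helper are hypothetical and are not part of the conversion tools in this repository; treat it as an illustration of the format rather than a replacement for those tools.

```python
import csv
import io

# Hypothetical raw data in "user,item,rating,timestamp" form (not a real dataset).
RAW_CSV = """user,item,rating,timestamp
u1,i1,4.0,881250949
u1,i2,3.5,881250950
u2,i1,5.0,891717742
"""

# Atomic-file header: every field is declared as "<name>:<type>".
INTER_HEADER = ["user_id:token", "item_id:token", "rating:float", "timestamp:float"]


def to_inter(raw_csv: str) -> str:
    """Return the text of a tab-separated .inter atomic file."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(INTER_HEADER)
    for row in reader:
        # Copy the raw columns into the order declared in the header.
        writer.writerow([row["user"], row["item"], row["rating"], row["timestamp"]])
    return out.getvalue()


inter_text = to_inter(RAW_CSV)
print(inter_text)
```

Writing `inter_text` to a file such as `mydata/mydata.inter` (a hypothetical name) would give a file in the layout RecBole expects for interaction data; user and item features go into analogous `.user` and `.item` files.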

Dataset links and brief introductions

Shopping

Advertising

Check-in

Movies

Music

Books

Games

Anime

Pictures

Jokes

Exercises

Websites

Adult

News

Food

Beverages

Clothes

Dataset statistics

General Datasets

| SN | Dataset | #User | #Item | #Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MovieLens | - | - | - | - | Rating | | | | |
| 2 | Anime | 73,515 | 11,200 | 7,813,737 | 99.05% | Rating [-1, 1-10] | | | | |
| 3 | Epinions | 116,260 | 41,269 | 188,478 | 99.99% | Rating [1-5] | | | | |
| 4 | Yelp (5 versions) | - | - | - | - | Rating [1-5] | | | | |
| 5 | Netflix | 480,189 | 17,770 | 100,480,507 | 98.82% | Rating [1-5] | | | | |
| 6 | Book-Crossing | 105,284 | 340,557 | 1,149,780 | 99.99% | Rating [0-10] | | | | |
| 7 | Jester | 73,421 | 101 | 4,136,360 | 44.22% | Rating [-10, 10] | | | | |
| 8 | Douban | 738,701 | 28 | 2,125,056 | 89.73% | Rating [0,5] | | | | |
| 9 | Yahoo Music | 1,948,882 | 98,211 | 11,557,943 | 99.99% | Rating [0, 100] | | | | |
| 10 | KDD2010 | - | - | - | - | Rating | | | | |
| 11 | Amazon (2014 & 2018) | - | - | - | - | Rating [0,5] | | | | |
| 12 | Pinterest | 55,187 | 9,911 | 1,445,622 | 99.74% | - | | | | |
| 13 | Gowalla | 107,092 | 1,280,969 | 6,442,892 | 99.99% | Check-in | | | | |
| 14 | Last.FM | 1,892 | 17,632 | 92,834 | 99.72% | Click | | | | |
| 15 | DIGINETICA | 204,789 | 184,047 | 993,483 | 99.99% | Click | | | | |
| 16 | Steam | 2,567,538 | 32,135 | 7,793,069 | 99.99% | Buy | | | | |
| 17 | Ta Feng | 32,266 | 23,812 | 817,741 | 99.89% | Click | | | | |
| 18 | Foursquare | - | - | - | - | Check-in | | | | |
| 19 | Tmall | 963,923 | 2,353,207 | 44,528,127 | 99.99% | Click/Buy | | | | |
| 20 | YOOCHOOSE | 9,249,729 | 52,739 | 34,154,697 | 99.99% | Click/Buy | | | | |
| 21 | Retailrocket | 1,407,580 | 247,085 | 2,756,101 | 99.99% | View/Addtocart/Transaction | | | | |
| 22 | LFM-1b | 120,322 | 3,123,496 | 1,088,161,692 | 99.71% | Click | | | | |
| 23 | MIND | - | - | - | - | Click | | | | |
| 24 | BeerAdvocate | 33,388 | 66,055 | 1,586,614 | 99.9281% | Rating [0,5] | | | | |
| 25 | Behance | 63,497 | 178,788 | 1,000,000 | 99.9912% | Likes | | | | |
| 26 | DianPing | 542,706 | 243,247 | 4,422,473 | 99.9967% | Rating [0,5] | | | | |
| 27 | EndoMondo | 1,104 | 253,020 | 253,020 | 99.9094% | Workout Logs | | | | |
| 28 | Food | 226,570 | 231,637 | 1,132,367 | 99.9978% | Rating [0,5] | | | | |
| 29 | GoodReads | 876,145 | 2,360,650 | 228,648,342 | 99.9889% | Rating [0,5] | | | | |
| 30 | KGRec | - | - | - | - | Click | | | | |
| 31 | ModCloth | 47,958 | 1,378 | 82,790 | 99.8747% | Rating [0,5] | | | | |
| 32 | RateBeer | 29,265 | 110,369 | 2,924,163 | 99.9095% | Overall Rating [0,20] | | | | |
| 33 | RentTheRunway | 105,571 | 5,850 | 192,544 | 99.9688% | Rating [0,10] | | | | |
| 34 | Twitch | 15,524,309 | 6,161,666 | 474,676,929 | 99.9995% | Click | | | | |
| 35 | Amazon_M2 | 3,606,349 | 1,410,675 | 15,306,183 | - | Click | | | | |
| 36 | Music4All-Onion | 119,140 | 109,269 | 252,984,396 | - | Click | | | | |
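The Sparsity column is consistent with the usual definition, sparsity = 1 - #Interaction / (#User × #Item), that is, the share of user-item pairs with no recorded interaction. The source does not spell the formula out, so the sketch below is an assumption checked against the Jester row; the `sparsity` function name is illustrative.

```python
def sparsity(n_users: int, n_items: int, n_interactions: int) -> float:
    """Fraction of the user-item matrix that has no recorded interaction."""
    return 1.0 - n_interactions / (n_users * n_items)


# Jester row from the table: 73,421 users, 101 items, 4,136,360 interactions.
jester = sparsity(73_421, 101, 4_136_360)
print(f"{jester * 100:.2f}%")  # prints 44.22%
```

The same function can fill in the Sparsity cell for rows where the user, item, and interaction counts are all known.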

CTR Datasets

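| SN | Dataset | #User | #Item | #Interaction | Sparsity | Interaction Type | TimeStamp | User Context | Item Context | Interaction Context |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Criteo | - | - | 45,850,617 | - | Click | | | | |
| 2 | Avazu | - | - | 40,428,967 | - | Click [0, 1] | | | | |
| 3 | iPinYou | 19,731,660 | 163 | 24,637,657 | 99.23% | View/Click | | | | |
| 4 | Phishing websites | - | - | 11,055 | - | | | | | |
| 5 | Adult | - | - | 32,561 | - | income>=50k [0, 1] | | | | |
| 6 | Alibaba-iFashion | 3,569,112 | 4,463,302 | 191,394,393 | 99.9988% | Click | | | | |
| 7 | AliEC | 491,647 | 240,130 | 1,366,056 | 99.9988% | Click | | | | |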

Knowledge-aware Datasets

These knowledge-aware recommender datasets are based on KB4Rec, which associates items from recommender systems with entities in Freebase. Note that the Amazon-book dataset is the version released in 2014.

Raw datasets information

| SN | Dataset | #Items | #Linked-Items | #Users | #Interactions |
|---|---|---|---|---|---|
| 1 | MovieLens | 27,278 | 25,503 | 138,493 | 20,000,263 |
| 2 | Amazon-book | 2,370,605 | 108,515 | 8,026,324 | 22,507,155 |
| 3 | LFM-1b (tracks) | 31,634,450 | 1,254,923 | 120,322 | 319,951,294 |

After 5-core filtering (for LFM-1b, tracks listened to fewer than 10 times are also filtered out)

| SN | Dataset | #Items | #Linked-Items | #Users | #Interactions |
|---|---|---|---|---|---|
| 1 | MovieLens | 18,345 | 18,057 | 138,493 | 19,984,024 |
| 2 | Amazon-book | 367,982 | 34,476 | 603,668 | 8,898,041 |
| 3 | LFM-1b (tracks) | 615,823 | 337,349 | 79,133 | 15,765,756 |
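For readers unfamiliar with the term, k-core filtering is commonly understood as repeatedly dropping users and items that have fewer than k interactions until every remaining user and item has at least k. The sketch below shows that common reading on (user, item) pairs; the `k_core` function and the toy data are hypothetical, and the exact procedure used to produce the table above (for example, whether it was applied iteratively) is not stated in the source.

```python
from collections import Counter


def k_core(interactions, k=5):
    """Iteratively drop users and items with fewer than k interactions."""
    kept = list(interactions)
    while True:
        user_counts = Counter(user for user, _ in kept)
        item_counts = Counter(item for _, item in kept)
        filtered = [
            (user, item)
            for user, item in kept
            if user_counts[user] >= k and item_counts[item] >= k
        ]
        if len(filtered) == len(kept):
            # Nothing was removed in this pass, so the k-core is stable.
            return filtered
        kept = filtered


# Toy example with k=2: item "z" is dropped first, which in turn
# leaves user "c" with a single interaction, so "c" is dropped too.
toy = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("c", "z")]
core = k_core(toy, k=2)
print(core)
```

The toy run shows why a single pass is not always enough: removing a rare item can push a user below the threshold, so the loop continues until a pass removes nothing.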