Silleellie / VBPR-Reproducibility

Repository which includes a reproducibility experiment and two full end-to-end experiments on the VBPR architecture described in the original 2016 paper by Prof. Julian McAuley

VBPR Reproducibility: comparison and end-to-end experiments with ClayRS Can See


Repository which includes everything related to the paper Reproducibility Analysis of Recommender Systems relying on Visual Features: traps, pitfalls, and countermeasures

The following are the experiments that can be reproduced using this repository:

- Experiment 1 (exp1): comparison of the VBPR implementation between ClayRS Can See and Cornac;
- Experiment 2 (exp2): end-to-end experiment in which ClayRS Can See includes images as side information, extracting visual features with bvlc_reference_caffenet under two different pre-processing configurations;
- Experiment 3 (exp3): end-to-end experiment in which ClayRS Can See extracts visual features with the state-of-the-art vgg19 and resnet50 models.

Check the 'Experiment pipeline' section for an overview of the operations carried out by the three different experiments

All the experiments provided in this repository are compliant with the proposed checklist:

| Stage | Check | Value |
| --- | --- | --- |
| Dataset Collection | ✅ Link to a downloadable version of the dataset collection | Tradesy raw feedback,<br>Image features binary file,<br>Tradesy Images from the DVBPR dataset |
| | ✅ Any pre-filtering process performed on data | $\forall$ experiment, duplicate interactions are removed and users with fewer than five interactions are not considered, script.<br>For Experiment 2 and Experiment 3, images from the Tradesy Images DVBPR dataset were removed in order to re-create the VBPR dataset (since the original dataset is not accessible), script |
| | ✅ Relevant dataset statistics | $\forall$ experiment, lines 18-27 of terminal output |
| | ✅ Preprocessing operations performed on side information | Experiment 1: no preprocessing performed, the visual features provided by the original authors were used;<br>Experiment 2: lines 23-24, 42-47 of yaml report, lines 71-73, 83-86 of script;<br>Experiment 3: lines 21-34, 50-63 of yaml report, lines 64-67, 74-77 of script |
| | ✅ Pre-trained models adopted to represent side information | bvlc_reference_caffenet,<br>resnet50,<br>vgg19 |
| Data Splitting | ✅ Protocol used for data partitioning and random seed to reproduce random splits | Holdout $\forall$ user with a test set size of one instance and random seed set to 42, script |
| | ⬜ Link to a downloadable version of the training/test/validation sets | Train and test sets are not provided, but they can easily be reproduced by running the main data pipeline with the random state set to 42 |
| Recommendation | ✅ Name and version of the framework containing the recommendation algorithm | ClayRS Can See (modified version of ClayRS v0.4),<br>Cornac v1.14.2 |
| | ✅ Source code of the recommendation algorithm and setting of parameters | Source code of the recommendation algorithm:<br>ClayRS Can See VBPR,<br>Cornac VBPR<br><br>Parameter settings:<br>ClayRS Can See: lines 61-70 of script,<br>Cornac: lines 102-121 of script |
| | ⬜ Method to select the best hyperparameters | No hyperparameter tuning was carried out |
| | ✅ Any random seed necessary to reproduce random processes | All random processes were set to random seed 42 |
| Candidate Item Filtering | ✅ Set of target items to generate a ranking | All items of the system were taken into account |
| | ✅ Strategy (TestRatings, TestItems, TrainingItems, AllItems, One-Plus-Random) | AllItems |
| Evaluation | ✅ Name and version of the framework used to compute metrics | Cornac framework for evaluating Cornac models,<br>custom AUC implementation to evaluate the ClayRS model, lines 64-118 of script |
| | ✅ List of metrics adopted and cutoff for recommendation lists | The only metric used was AUC, and all ranked items were taken into account to compute it |
| | ⬜ Normalization strategy adopted | No normalization strategy was applied for the chosen metric (AUC) |
| | ✅ Averaging strategy adopted (e.g. micro or macro-average) | System results were generated by macro-averaging the per-user results, line 115 of script (see the sketch after this checklist) |
| | ✅ List of results in a standard format (per fold and overall) | Experiment 1 AUC results path: reports/exp1,<br>Experiment 2 AUC results path: reports/exp2,<br>Experiment 3 AUC results path: reports/exp3 |
| Statistical testing | ✅ Data on which the test is performed | Experiment 1: AUC results between ClayRS and Cornac for each epoch, located at reports/exp1;<br>Experiment 2: AUC results between the caffe and caffe_center_crop trained recommenders for each epoch, located at reports/exp2;<br>Experiment 3: AUC results between the vgg19 and resnet50 trained recommenders for each epoch, located at reports/exp3 |
| | ✅ Type of test and p-value | A t-test was used:<br>Experiment 1 p-value results path: reports/ttest_results/exp1,<br>Experiment 2 p-value results path: reports/ttest_results/exp2,<br>Experiment 3 p-value results path: reports/ttest_results/exp3 |
| | ⬜ Corrections for multiple comparisons | No correction was applied |
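
As referenced in the checklist, system-level results are obtained by macro-averaging per-user AUC scores, and the two systems of each experiment are compared with a t-test on those scores. The sketch below illustrates this evaluation step; it is not the repository's compute_auc.py or ttest.py code, the scores are synthetic, and the use of a paired t-test over per-user AUC values is an assumption made for illustration.

```python
# Illustrative sketch of the evaluation step described in the checklist:
# per-user AUC, macro-averaged system AUC, and a t-test between two systems.
# NOT the repository's actual implementation; the data below is synthetic.
import numpy as np
from scipy import stats


def auc_per_user(positive_score: float, negative_scores: np.ndarray) -> float:
    """AUC for a user with a single held-out positive item:
    fraction of non-interacted items ranked below the test item."""
    return float(np.mean(positive_score > negative_scores))


def macro_average(per_user_auc: list) -> float:
    """System-level AUC as the unweighted mean over users (macro-average)."""
    return float(np.mean(per_user_auc))


def compare_systems(auc_a: list, auc_b: list) -> float:
    """Paired t-test on per-user AUC scores of two systems (assumption); returns the p-value."""
    _, p_value = stats.ttest_rel(auc_a, auc_b)
    return float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy setting: 100 users, 500 non-interacted items each (hypothetical numbers).
    auc_sys_a = [auc_per_user(rng.normal(1.0), rng.normal(size=500)) for _ in range(100)]
    auc_sys_b = [auc_per_user(rng.normal(1.0), rng.normal(size=500)) for _ in range(100)]
    print(f"System A macro-AUC: {macro_average(auc_sys_a):.4f}")
    print(f"System B macro-AUC: {macro_average(auc_sys_b):.4f}")
    print(f"t-test p-value: {compare_systems(auc_sys_a, auc_sys_b):.4f}")
```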

How to Use

Simply execute pip install -r requirements.txt in a freshly created virtual environment.

The source code has been tested and the results have been produced with Python 3.9 and CUDA V11.6. Please note that CUDA must be installed to run the experiments.

To perform the exp1 experiment, which is the comparison of the VBPR implementation between ClayRS and Cornac, run via command line:

python pipeline.py -epo 5 10 20 50 -exp exp1

In this way, raw data will first be downloaded and processed, and then the actual experiment will be run using the default parameters.

To perform the exp2 experiment, which is the end-to-end experiment in which ClayRS Can See is used to include images as side information (extracting visual features with bvlc_reference_caffenet under two different pre-processing configurations), run via command line:

python pipeline.py -epo 10 20 -exp exp2

To perform the exp3 experiment, which is the end-to-end experiment in which ClayRS Can See is tested with state-of-the-art models (vgg19 and resnet50) for extracting features from images, run via command line:

python pipeline.py -epo 10 20 -exp exp3
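
If you want to reproduce all three experiments back to back, a small driver along these lines could be used. This is a hypothetical convenience snippet, not part of the repository; it simply reuses the epoch values from the commands above.

```python
# Hypothetical driver (not part of the repository): runs the three experiments
# in sequence with the same epoch values used in the commands above.
import subprocess
import sys

EXPERIMENTS = {
    "exp1": ["5", "10", "20", "50"],
    "exp2": ["10", "20"],
    "exp3": ["10", "20"],
}

for exp_name, epochs in EXPERIMENTS.items():
    cmd = [sys.executable, "pipeline.py", "-epo", *epochs, "-exp", exp_name]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop immediately if an experiment fails
```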

You can inspect all the parameters that can be set by simply running python pipeline.py -h. The following is what you would obtain:

$ python pipeline.py -h

usage: pipeline.py [-h] [-epo 5 [5 ...]] [-bs 128] [-gd 20] [-td 20] [-lr 0.005] [-seed 42] [-nt_ca 4] [-exp exp1]

Main script to reproduce the VBPR experiment

optional arguments:
  -h, --help            show this help message and exit
  -epo 5 [5 ...], --epochs 5 [5 ...]
                        Number of epochs for which the VBPR network will be trained
  -bs 128, --batch_size 128
                        Batch size that will be used for the torch dataloaders during training
  -gd 20, --gamma_dim 20
                        Dimension of the gamma parameter of the VBPR network
  -td 20, --theta_dim 20
                        Dimension of the theta parameter of the VBPR network
  -lr 0.005, --learning_rate 0.005
                        Learning rate for the VBPR network
  -seed 42, --random_seed 42
                        random seed
  -nt_ca 4, --num_threads_ca 4
                        Number of threads that will be used in ClayRS during Content Analyzer serialization phase
  -exp exp1, --experiment exp1
                        exp1 to perform the comparison experiment with Cornac,
                        exp2 to perform end to end experiment using caffe via ClayRS can see,
                        exp3 to perform end to end experiment using vgg19 and resnet50 via Clayrs can see
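
For reference, the interface above corresponds to an argparse configuration roughly like the following. This is a sketch reconstructed from the help text only, not the parser actually defined in pipeline.py; the defaults are inferred from the metavars shown above and may differ from the real ones.

```python
# Sketch of an argument parser matching the help text above (illustrative only).
import argparse

parser = argparse.ArgumentParser(description="Main script to reproduce the VBPR experiment")
parser.add_argument("-epo", "--epochs", nargs="+", type=int, default=[5],
                    help="Number of epochs for which the VBPR network will be trained")
parser.add_argument("-bs", "--batch_size", type=int, default=128,
                    help="Batch size that will be used for the torch dataloaders during training")
parser.add_argument("-gd", "--gamma_dim", type=int, default=20,
                    help="Dimension of the gamma parameter of the VBPR network")
parser.add_argument("-td", "--theta_dim", type=int, default=20,
                    help="Dimension of the theta parameter of the VBPR network")
parser.add_argument("-lr", "--learning_rate", type=float, default=0.005,
                    help="Learning rate for the VBPR network")
parser.add_argument("-seed", "--random_seed", type=int, default=42,
                    help="Random seed")
parser.add_argument("-nt_ca", "--num_threads_ca", type=int, default=4,
                    help="Number of threads used during the ClayRS Content Analyzer serialization phase")
parser.add_argument("-exp", "--experiment", choices=["exp1", "exp2", "exp3"], default="exp1",
                    help="Which experiment to run (exp1, exp2 or exp3)")

args = parser.parse_args()
```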

Experiment pipeline

The following is a description of the operations carried out by the pipeline depending on the experiment type (exp1, exp2, exp3) set via the -exp parameter.

-exp exp1

Data:

- Download the raw sources (Tradesy feedback and the image features binary file provided by the original authors)
- Remove duplicate interactions and users with fewer than five interactions, and build the interactions csv
- Split the interactions into train and test sets, holding out one interaction per user with random seed 42 (a sketch of this split is shown after this section)

Experiment and evaluation:

- Train the VBPR model with both ClayRS Can See and Cornac for each number of epochs passed via -epo
- Compute the per-user and system-wise AUC for both implementations
- Run the t-test between the ClayRS and Cornac AUC results for each epoch

-exp exp2

Data:

- Same as exp1, plus the download of the Tradesy images from the DVBPR dataset and the re-creation of the VBPR dataset described in the pre-filtering row of the checklist
- Extract visual features from the images with bvlc_reference_caffenet under two different pre-processing configurations (caffe and caffe_center_crop)

Experiment and evaluation:

- Train the VBPR model with ClayRS Can See on both feature sets for each number of epochs passed via -epo
- Compute the per-user and system-wise AUC for both configurations
- Run the t-test between the caffe and caffe_center_crop AUC results for each epoch

-exp exp3

Data:

- Same as exp2, but visual features are extracted with vgg19 and resnet50

Experiment and evaluation:

- Train the VBPR model with ClayRS Can See on both feature sets for each number of epochs passed via -epo
- Compute the per-user and system-wise AUC for both models
- Run the t-test between the vgg19 and resnet50 AUC results for each epoch
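
As mentioned in the data steps above and in the checklist, the train/test split holds out one interaction per user with random seed 42. The following is a minimal pandas sketch of such a split; the DataFrame layout (user_id/item_id columns) is hypothetical and does not necessarily match src/data/train_test_split.py.

```python
# Minimal sketch of a leave-one-out-per-user holdout split with a fixed seed.
# Column names are assumed for illustration; not the repository's actual code.
import pandas as pd


def holdout_one_per_user(interactions: pd.DataFrame, seed: int = 42):
    """Hold out exactly one interaction per user as the test set."""
    test = (
        interactions
        .groupby("user_id", group_keys=False)
        .sample(n=1, random_state=seed)   # one random interaction per user
    )
    train = interactions.drop(test.index)  # everything else goes to the train set
    return train, test


if __name__ == "__main__":
    toy = pd.DataFrame({
        "user_id": ["u1", "u1", "u1", "u2", "u2"],
        "item_id": ["i1", "i2", "i3", "i1", "i4"],
    })
    train_set, test_set = holdout_one_per_user(toy)
    print(train_set, test_set, sep="\n\n")
```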

Project Organization

├── 📁 data                          <- Directory containing all data generated/used by both experiments
│   ├── 📁 interim                       <- Intermediate data that has been transformed
│   ├── 📁 processed                     <- The final, canonical data sets used for training
│   └── 📁 raw                           <- The original, immutable data dump
│
├── 📁 models                        <- Trained and serialized models at different epochs for the three experiments
│   ├── 📁 exp1                          <- Models output by experiment 1
│   │   ├── 📁 vbpr_clayrs                   <- ClayRS models output by experiment 1
│   │   └── 📁 vbpr_cornac                   <- Cornac models output by experiment 1
│   │
│   ├── 📁 exp2                          <- Models output by experiment 2
│   └── 📁 exp3                          <- Models output by experiment 3
│
├── 📁 reports                       <- Metrics and reports generated by the three experiments
│   ├── 📁 exp1                          <- System-wise and per-user AUC results output by experiment 1
│   │   ├── 📁 vbpr_clayrs                   <- ClayRS AUC results output by experiment 1
│   │   └── 📁 vbpr_cornac                   <- Cornac AUC results output by experiment 1
│   │
│   ├── 📁 exp2                          <- System-wise and per-user AUC results output by experiment 2
│   ├── 📁 exp3                          <- System-wise and per-user AUC results output by experiment 3
│   ├── 📁 ttest_results                 <- Results of the t-test statistic for each epoch for all three experiments
│   │   ├── 📁 exp1                          <- t-test results output by experiment 1
│   │   ├── 📁 exp2                          <- t-test results output by experiment 2
│   │   └── 📁 exp3                          <- t-test results output by experiment 3
│   │
│   ├── 📁 yaml_clayrs                   <- Reports generated by the Report class in ClayRS to document all techniques and parameters used in the experiments
│   │   ├── 📁 exp1_rs_report                <- Reports generated for each Recommender System configuration in experiment 1
│   │   ├── 📁 exp2_rs_report                <- Reports generated for each Recommender System configuration in experiment 2
│   │   ├── 📁 exp3_rs_report                <- Reports generated for each Recommender System configuration in experiment 3
│   │   ├── 📄 exp1_ca_report.yml            <- Report generated for the Content Analyzer module in experiment 1
│   │   ├── 📄 exp2_ca_report.yml            <- Report generated for the Content Analyzer module in experiment 2
│   │   └── 📄 exp3_ca_report.yml            <- Report generated for the Content Analyzer module in experiment 3
│   │
│   ├── 📄 exp1_terminal_output.txt      <- Terminal output that generated the committed results for experiment 1
│   ├── 📄 exp2_terminal_output.txt      <- Terminal output that generated the committed results for experiment 2
│   └── 📄 exp3_terminal_output.txt      <- Terminal output that generated the committed results for experiment 3
│
├── 📁 src                           <- Source code of the project
│   ├── 📁 data                          <- Scripts to download and generate data
│   │   ├── 📄 create_interaction_csv.py
│   │   ├── 📄 create_tradesy_images_dataset.py
│   │   ├── 📄 dl_raw_sources.py
│   │   ├── 📄 extract_features_from_source.py
│   │   └── 📄 train_test_split.py
│   │
│   ├── 📁 evaluation                <- Scripts to evaluate models and compute ttest
│   │   ├── 📄 compute_auc.py
│   │   └── 📄 ttest.py
│   │
│   ├── 📁 model                     <- Scripts to train models
│   │   ├── 📄 exp1_clayrs_experiment.py
│   │   ├── 📄 exp1_cornac_experiment.py
│   │   ├── 📄 exp2_caffe.py
│   │   ├── 📄 exp3_vgg19_resnet.py
│   │   ├── 📄 clayrs_experiment.py
│   │   └── 📄 cornac_experiment.py
│   │
│   ├── 📄 __init__.py                   <- Makes src a Python module
│   └── 📄 utils.py                      <- Utility functions for the project
│
├── 📄 LICENSE                       <- MIT License
├── 📄 pipeline.py                   <- Script that can be used to reproduce or customize the experiment pipeline
├── 📄 README.md                     <- The top-level README for developers using this project
└── 📄 requirements.txt              <- The requirements file for reproducing the analysis environment (src package)

Project based on the cookiecutter data science project template. #cookiecutterdatascience