Silleellie / VBPR-Reproducibility

Repository which includes a reproducibility experiment and two full end-to-end experiments on the VBPR architecture described in the original 2016 paper by Prof. Julian McAuley

VBPR Reproducibility: comparison and end-to-end experiments with ClayRS Can See


Repository which includes everything related to the paper Reproducibility Analysis of Recommender Systems relying on Visual Features: traps, pitfalls, and countermeasures

The following are the experiments that can be reproduced using this repository:

- Experiment 1 (exp1): comparison of the VBPR implementation between ClayRS Can See and Cornac;
- Experiment 2 (exp2): end-to-end experiment in which ClayRS Can See includes images as side information, extracting visual features with bvlc_reference_caffenet under two different pre-processing configurations;
- Experiment 3 (exp3): end-to-end experiment in which ClayRS Can See extracts visual features with the state-of-the-art vgg19 and resnet50 models.

Check the 'Experiment pipeline' section for an overview of the operations carried out by the three different experiments

All the experiments provided in this repository are compliant with the proposed checklist:

| Stage | Check | Value |
| --- | --- | --- |
| Dataset Collection | ✅ Link to a downloadable version of the dataset collection | Tradesy raw feedback,<br>Image features binary file,<br>Tradesy Images from the DVBPR dataset |
| | ✅ Any pre-filtering process performed on data | $\forall$ experiment, duplicate interactions are removed and users with fewer than five interactions are not considered, script.<br>For Experiment 2 and Experiment 3, images from the Tradesy Images DVBPR dataset were removed in order to re-create the VBPR dataset (since the original dataset is not accessible), script |
| | ✅ Relevant dataset statistics | $\forall$ experiment, lines 18-27 of terminal output |
| | ✅ Preprocessing operations performed on side information | Experiment 1: no preprocessing performed, the visual features provided by the original authors were used;<br>Experiment 2: lines 23-24, 42-47 of yaml report, lines 71-73, 83-86 of script;<br>Experiment 3: lines 21-34, 50-63 of yaml report, lines 64-67, 74-77 of script |
| | ✅ Pre-trained models adopted to represent side information | bvlc_reference_caffenet,<br>resnet50,<br>vgg19 |
| Data Splitting | ✅ Protocol used for data partitioning and random seed to reproduce random splits | Holdout $\forall$ user with a test set size of one instance and random seed set to 42, script |
| | ⬜ Link to a downloadable version of the training/test/validation sets | Train and test sets are not provided, but they can easily be reproduced by running the main data pipeline with the random state set to 42 |
| Recommendation | ✅ Name and version of the framework containing the recommendation algorithm | ClayRS Can See (modified version of ClayRS v0.4),<br>Cornac v1.14.2 |
| | ✅ Source code of the recommendation algorithm and setting of parameters | Source code of the recommendation algorithm:<br>ClayRS Can See VBPR,<br>Cornac VBPR<br><br>Parameter settings:<br>ClayRS Can See: lines 61-70 of script,<br>Cornac: lines 102-121 of script |
| | ⬜ Method to select the best hyperparameters | No hyperparameter tuning was carried out |
| | ✅ Any random seed necessary to reproduce random processes | All random processes were set to random seed 42 |
| Candidate Item Filtering | ✅ Set of target items to generate a ranking | All items of the system were taken into account |
| | ✅ Strategy (TestRatings, TestItems, TrainingItems, AllItems, One-Plus-Random) | AllItems |
| Evaluation | ✅ Name and version of the framework used to compute metrics | Cornac framework for evaluating Cornac models,<br>custom AUC implementation to evaluate the ClayRS model, lines 64-118 of script |
| | ✅ List of metrics adopted and cutoff for recommendation lists | The only metric used was AUC, and all ranked items were taken into account to compute it |
| | ⬜ Normalization strategy adopted | No normalization strategy was applied for the chosen metric (AUC) |
| | ✅ Averaging strategy adopted (e.g. micro or macro-average) | System results were generated by macro-averaging the per-user results, line 115 of script (see the sketch after this checklist) |
| | ✅ List of results in a standard format (per fold and overall) | Experiment 1 AUC results path: reports/exp1,<br>Experiment 2 AUC results path: reports/exp2,<br>Experiment 3 AUC results path: reports/exp3 |
| Statistical testing | ✅ Data on which the test is performed | Experiment 1: AUC results between ClayRS and Cornac for each epoch, located at reports/exp1;<br>Experiment 2: AUC results between the caffe and caffe_center_crop trained recommenders for each epoch, located at reports/exp2;<br>Experiment 3: AUC results between the vgg19 and resnet50 trained recommenders for each epoch, located at reports/exp3 |
| | ✅ Type of test and p-value | A t-test was used:<br>Experiment 1 p-value results path: reports/ttest_results/exp1,<br>Experiment 2 p-value results path: reports/ttest_results/exp2,<br>Experiment 3 p-value results path: reports/ttest_results/exp3 |
| | ⬜ Corrections for multiple comparisons | No correction was applied |
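
As referenced in the checklist, system-level results are obtained by macro-averaging per-user AUC scores, and the two systems of each experiment are compared with a t-test on those scores. The sketch below illustrates this evaluation step; it is not the repository's compute_auc.py or ttest.py code, the scores are synthetic, and the use of a paired t-test over per-user AUC values is an assumption made for illustration.

```python
# Illustrative sketch of the evaluation step described in the checklist:
# per-user AUC, macro-averaged system AUC, and a t-test between two systems.
# NOT the repository's actual implementation; the data below is synthetic.
import numpy as np
from scipy import stats


def auc_per_user(positive_score: float, negative_scores: np.ndarray) -> float:
    """AUC for a user with a single held-out positive item:
    fraction of non-interacted items ranked below the test item."""
    return float(np.mean(positive_score > negative_scores))


def macro_average(per_user_auc: list) -> float:
    """System-level AUC as the unweighted mean over users (macro-average)."""
    return float(np.mean(per_user_auc))


def compare_systems(auc_a: list, auc_b: list) -> float:
    """Paired t-test on per-user AUC scores of two systems (assumption); returns the p-value."""
    _, p_value = stats.ttest_rel(auc_a, auc_b)
    return float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy setting: 100 users, 500 non-interacted items each (hypothetical numbers).
    auc_sys_a = [auc_per_user(rng.normal(1.0), rng.normal(size=500)) for _ in range(100)]
    auc_sys_b = [auc_per_user(rng.normal(1.0), rng.normal(size=500)) for _ in range(100)]
    print(f"System A macro-AUC: {macro_average(auc_sys_a):.4f}")
    print(f"System B macro-AUC: {macro_average(auc_sys_b):.4f}")
    print(f"t-test p-value: {compare_systems(auc_sys_a, auc_sys_b):.4f}")
```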

How to Use

Simply execute pip install -r requirements.txt in a freshly created virtual environment.

The source code has been tested and the results have been produced with Python 3.9 and CUDA V11.6. Please note that CUDA must be installed to run the experiments.

To perform the exp1 experiment, which is the comparison of the VBPR implementation between ClayRS and Cornac, run via command line:

python pipeline.py -epo 5 10 20 50 -exp exp1

In this way, raw data will first be downloaded and processed, and then the actual experiment will be run using the default parameters.

To perform the exp2 experiment, which is the end-to-end experiment in which ClayRS Can See is used to include images as side information (extracting visual features with bvlc_reference_caffenet under two different pre-processing configurations), run via command line:

python pipeline.py -epo 10 20 -exp exp2

To perform the exp3 experiment, which is the end-to-end experiment in which ClayRS Can See is tested with state-of-the-art models (vgg19 and resnet50) for extracting features from images, run via command line:

python pipeline.py -epo 10 20 -exp exp3
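
If you want to reproduce all three experiments back to back, a small driver along these lines could be used. This is a hypothetical convenience snippet, not part of the repository; it simply reuses the epoch values from the commands above.

```python
# Hypothetical driver (not part of the repository): runs the three experiments
# in sequence with the same epoch values used in the commands above.
import subprocess
import sys

EXPERIMENTS = {
    "exp1": ["5", "10", "20", "50"],
    "exp2": ["10", "20"],
    "exp3": ["10", "20"],
}

for exp_name, epochs in EXPERIMENTS.items():
    cmd = [sys.executable, "pipeline.py", "-epo", *epochs, "-exp", exp_name]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop immediately if an experiment fails
```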

You can inspect all the parameters that can be set by simply running python pipeline.py -h. The following is what you would obtain:

$ python pipeline.py -h

usage: pipeline.py [-h] [-epo 5 [5 ...]] [-bs 128] [-gd 20] [-td 20] [-lr 0.005] [-seed 42] [-nt_ca 4] [-exp exp1]

Main script to reproduce the VBPR experiment

optional arguments:
  -h, --help            show this help message and exit
  -epo 5 [5 ...], --epochs 5 [5 ...]
                        Number of epochs for which the VBPR network will be trained
  -bs 128, --batch_size 128
                        Batch size that will be used for the torch dataloaders during training
  -gd 20, --gamma_dim 20
                        Dimension of the gamma parameter of the VBPR network
  -td 20, --theta_dim 20
                        Dimension of the theta parameter of the VBPR network
  -lr 0.005, --learning_rate 0.005
                        Learning rate for the VBPR network
  -seed 42, --random_seed 42
                        random seed
  -nt_ca 4, --num_threads_ca 4
                        Number of threads that will be used in ClayRS during Content Analyzer serialization phase
  -exp exp1, --experiment exp1
                        exp1 to perform the comparison experiment with Cornac,
                        exp2 to perform end to end experiment using caffe via ClayRS can see,
                        exp3 to perform end to end experiment using vgg19 and resnet50 via Clayrs can see
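
For reference, the interface above corresponds to an argparse configuration roughly like the following. This is a sketch reconstructed from the help text only, not the parser actually defined in pipeline.py; the defaults are inferred from the metavars shown above and may differ from the real ones.

```python
# Sketch of an argument parser matching the help text above (illustrative only).
import argparse

parser = argparse.ArgumentParser(description="Main script to reproduce the VBPR experiment")
parser.add_argument("-epo", "--epochs", nargs="+", type=int, default=[5],
                    help="Number of epochs for which the VBPR network will be trained")
parser.add_argument("-bs", "--batch_size", type=int, default=128,
                    help="Batch size that will be used for the torch dataloaders during training")
parser.add_argument("-gd", "--gamma_dim", type=int, default=20,
                    help="Dimension of the gamma parameter of the VBPR network")
parser.add_argument("-td", "--theta_dim", type=int, default=20,
                    help="Dimension of the theta parameter of the VBPR network")
parser.add_argument("-lr", "--learning_rate", type=float, default=0.005,
                    help="Learning rate for the VBPR network")
parser.add_argument("-seed", "--random_seed", type=int, default=42,
                    help="Random seed")
parser.add_argument("-nt_ca", "--num_threads_ca", type=int, default=4,
                    help="Number of threads used during the ClayRS Content Analyzer serialization phase")
parser.add_argument("-exp", "--experiment", choices=["exp1", "exp2", "exp3"], default="exp1",
                    help="Which experiment to run (exp1, exp2 or exp3)")

args = parser.parse_args()
```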

Experiment pipeline

The following is a description of the operations carried out by the pipeline depending on the experiment type (exp1, exp2, exp3) set via the -exp parameter.

-exp exp1

Data:

- Download the raw sources (Tradesy feedback and the image features binary file provided by the original authors)
- Remove duplicate interactions and users with fewer than five interactions, and build the interactions csv
- Split the interactions into train and test sets, holding out one interaction per user with random seed 42 (a sketch of this split is shown after this section)

Experiment and evaluation:

- Train the VBPR model with both ClayRS Can See and Cornac for each number of epochs passed via -epo
- Compute the per-user and system-wise AUC for both implementations
- Run the t-test between the ClayRS and Cornac AUC results for each epoch

-exp exp2

Data:

- Same as exp1, plus the download of the Tradesy images from the DVBPR dataset and the re-creation of the VBPR dataset described in the pre-filtering row of the checklist
- Extract visual features from the images with bvlc_reference_caffenet under two different pre-processing configurations (caffe and caffe_center_crop)

Experiment and evaluation:

- Train the VBPR model with ClayRS Can See on both feature sets for each number of epochs passed via -epo
- Compute the per-user and system-wise AUC for both configurations
- Run the t-test between the caffe and caffe_center_crop AUC results for each epoch

-exp exp3

Data:

- Same as exp2, but visual features are extracted with vgg19 and resnet50

Experiment and evaluation:

- Train the VBPR model with ClayRS Can See on both feature sets for each number of epochs passed via -epo
- Compute the per-user and system-wise AUC for both models
- Run the t-test between the vgg19 and resnet50 AUC results for each epoch
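
As mentioned in the data steps above and in the checklist, the train/test split holds out one interaction per user with random seed 42. The following is a minimal pandas sketch of such a split; the DataFrame layout (user_id/item_id columns) is hypothetical and does not necessarily match src/data/train_test_split.py.

```python
# Minimal sketch of a leave-one-out-per-user holdout split with a fixed seed.
# Column names are assumed for illustration; not the repository's actual code.
import pandas as pd


def holdout_one_per_user(interactions: pd.DataFrame, seed: int = 42):
    """Hold out exactly one interaction per user as the test set."""
    test = (
        interactions
        .groupby("user_id", group_keys=False)
        .sample(n=1, random_state=seed)   # one random interaction per user
    )
    train = interactions.drop(test.index)  # everything else goes to the train set
    return train, test


if __name__ == "__main__":
    toy = pd.DataFrame({
        "user_id": ["u1", "u1", "u1", "u2", "u2"],
        "item_id": ["i1", "i2", "i3", "i1", "i4"],
    })
    train_set, test_set = holdout_one_per_user(toy)
    print(train_set, test_set, sep="\n\n")
```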

Project Organization

├── 📁 data                          <- Directory containing all data generated/used by both experiments
│   ├── 📁 interim                       <- Intermediate data that has been transformed
│   ├── 📁 processed                     <- The final, canonical data sets used for training
│   └── 📁 raw                           <- The original, immutable data dump
│
├── 📁 models                        <- Trained and serialized models at different epochs for the three experiments
│   ├── 📁 exp1                          <- Models output by experiment 1
│   │   ├── 📁 vbpr_clayrs                   <- ClayRS models output by experiment 1
│   │   └── 📁 vbpr_cornac                   <- Cornac models output by experiment 1
│   │
│   ├── 📁 exp2                          <- Models output by experiment 2
│   └── 📁 exp3                          <- Models output by experiment 3
│
├── 📁 reports                       <- Metrics and reports generated by the three experiments
│   ├── 📁 exp1                          <- System-wise and per-user AUC results output by experiment 1
│   │   ├── 📁 vbpr_clayrs                   <- ClayRS AUC results output by experiment 1
│   │   └── 📁 vbpr_cornac                   <- Cornac AUC results output by experiment 1
│   │
│   ├── 📁 exp2                          <- System-wise and per-user AUC results output by experiment 2
│   ├── 📁 exp3                          <- System-wise and per-user AUC results output by experiment 3
│   ├── 📁 ttest_results                 <- Results of the t-test statistic for each epoch for all three experiments
│   │   ├── 📁 exp1                          <- t-test results output by experiment 1
│   │   ├── 📁 exp2                          <- t-test results output by experiment 2
│   │   └── 📁 exp3                          <- t-test results output by experiment 3
│   │
│   ├── 📁 yaml_clayrs                   <- Reports generated by the Report class in ClayRS to document all techniques and parameters used in the experiments
│   │   ├── 📁 exp1_rs_report                <- Reports generated for each Recommender System configuration in experiment 1
│   │   ├── 📁 exp2_rs_report                <- Reports generated for each Recommender System configuration in experiment 2
│   │   ├── 📁 exp3_rs_report                <- Reports generated for each Recommender System configuration in experiment 3
│   │   ├── 📄 exp1_ca_report.yml            <- Report generated for the Content Analyzer module in experiment 1
│   │   ├── 📄 exp2_ca_report.yml            <- Report generated for the Content Analyzer module in experiment 2
│   │   └── 📄 exp3_ca_report.yml            <- Report generated for the Content Analyzer module in experiment 3
│   │
│   ├── 📄 exp1_terminal_output.txt      <- Terminal output that generated the committed results for experiment 1
│   ├── 📄 exp2_terminal_output.txt      <- Terminal output that generated the committed results for experiment 2
│   └── 📄 exp3_terminal_output.txt      <- Terminal output that generated the committed results for experiment 3
│
├── 📁 src                           <- Source code of the project
│   ├── 📁 data                          <- Scripts to download and generate data
│   │   ├── 📄 create_interaction_csv.py
│   │   ├── 📄 create_tradesy_images_dataset.py
│   │   ├── 📄 dl_raw_sources.py
│   │   ├── 📄 extract_features_from_source.py
│   │   └── 📄 train_test_split.py
│   │
│   ├── 📁 evaluation                <- Scripts to evaluate models and compute ttest
│   │   ├── 📄 compute_auc.py
│   │   └── 📄 ttest.py
│   │
│   ├── 📁 model                     <- Scripts to train models
│   │   ├── 📄 exp1_clayrs_experiment.py
│   │   ├── 📄 exp1_cornac_experiment.py
│   │   ├── 📄 exp2_caffe.py
│   │   ├── 📄 exp3_vgg19_resnet.py
│   │   ├── 📄 clayrs_experiment.py
│   │   └── 📄 cornac_experiment.py
│   │
│   ├── 📄 __init__.py                   <- Makes src a Python module
│   └── 📄 utils.py                      <- Utility functions for the project
│
├── 📄 LICENSE                       <- MIT License
├── 📄 pipeline.py                   <- Script that can be used to reproduce or customize the experiment pipeline
├── 📄 README.md                     <- The top-level README for developers using this project
└── 📄 requirements.txt              <- The requirements file for reproducing the analysis environment (src package)

Project based on the cookiecutter data science project template. #cookiecutterdatascience