conversational-recommendations

This repository contains the code for the NeurIPS 2018 paper "Towards Deep Conversational Recommendations" (https://arxiv.org/abs/1812.07617).

Requirements

Python dependencies are listed in requirements.txt and installed with pip in the Usage section below. The NLTK punkt tokenizer is also required (see the download command below).

Usage

Get the data

Get the ReDial data from https://github.com/ReDialData/website/tree/data and the Movielens data from https://grouplens.org/datasets/movielens/latest/. Note that for the paper we retrieved the Movielens dataset in September 2017; the Movielens "latest" dataset has been updated since then.

git clone https://github.com/RaymondLi0/conversational-recommendations.git
cd conversational-recommendations
pip install -r requirements.txt
python -m nltk.downloader punkt
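
Optionally, a one-line check that the punkt tokenizer downloaded above is actually available (this just queries NLTK's data path and raises a LookupError if punkt is missing):

python -c "import nltk; print(nltk.data.find('tokenizers/punkt'))"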

mkdir -p redial movielens
wget -O redial/redial_dataset.zip https://github.com/ReDialData/website/raw/data/redial_dataset.zip
wget -O movielens/ml-latest.zip http://files.grouplens.org/datasets/movielens/ml-latest.zip
# extract both archives (the Movielens archive unpacks into movielens/ml-latest/)
unzip redial/redial_dataset.zip -d redial/
unzip movielens/ml-latest.zip -d movielens/
# split ReDial data
python scripts/split-redial.py redial/
mv redial/test_data.jsonl redial/test_data
# split Movielens data
python scripts/split-movielens.py movielens/
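
As a quick sanity check, each ReDial data file is JSON lines (one conversation per line). A minimal sketch that counts and parses the renamed test split (no assumptions are made about the record fields):

import json

# "redial/test_data" is the file renamed by the mv command above.
with open("redial/test_data") as f:
    conversations = [json.loads(line) for line in f if line.strip()]

print("test conversations:", len(conversations))
print("fields of first record:", sorted(conversations[0].keys()))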

Merge the movie lists by matching the movie names from ReDial and Movielens. Note that this will create an intermediate file movies_matched.csv, which is deleted at the end of the script.

python scripts/match_movies.py --redial_movies_path=redial/movies_with_mentions.csv --ml_movies_path=movielens/ml-latest/movies.csv --destination=redial/movies_merged.csv
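
The matching itself is done by scripts/match_movies.py. Purely as an illustration of the idea (not the script's actual logic), a name-based merge can be sketched with pandas. The Movielens columns (movieId, title) are the standard ones; the ReDial column name movieName is an assumption, so check the CSV headers before relying on it:

import pandas as pd

def normalize(title):
    # Lowercase and strip whitespace so that e.g. "Toy Story (1995)"
    # matches regardless of casing/spacing. Real matching is more involved.
    return title.strip().lower()

redial = pd.read_csv("redial/movies_with_mentions.csv")       # assumed column: movieName
ml = pd.read_csv("movielens/ml-latest/movies.csv")            # standard columns: movieId, title, genres

redial["key"] = redial["movieName"].map(normalize)
ml["key"] = ml["title"].map(normalize)

merged = redial.merge(ml, on="key", how="left", indicator=True)
print("matched {} / {} ReDial movies".format((merged["_merge"] == "both").sum(), len(redial)))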

Specify the paths

In the config.py file, specify the different paths to use: the paths referenced in this README are the ReDial and Movielens data folders prepared above, the merged movie file, the GenSen model and embeddings folders described below, and the folder where trained models are saved (see the sketch below).
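
A minimal sketch of the kind of assignments to fill in; the constant names here are illustrative only, so match them to the actual names defined in config.py:

# Illustrative names only -- use the constants that config.py actually defines.
REDIAL_DATA_PATH = "redial"                        # split ReDial data
ML_DATA_PATH = "movielens"                         # split Movielens data
MOVIE_PATH = "redial/movies_merged.csv"            # output of match_movies.py
GENSEN_MODEL_PATH = "/path/to/models/GenSen"       # nli_large.model, nli_large_vocab.pkl
GENSEN_EMBEDDINGS_PATH = "/path/to/models/embeddings"
MODELS_PATH = "/path/to/models"                    # where trained models are saved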

Get GenSen pre-trained models

Get the GenSen pre-trained models from https://github.com/Maluuba/gensen. More precisely, you will need the embeddings in the /path/to/models/embeddings folder, and the following model files in the /path/to/models/GenSen folder: nli_large_vocab.pkl and nli_large.model.

cd /path/to/models
mkdir GenSen embeddings
# GenSen NLI model and vocabulary
wget -O GenSen/nli_large_vocab.pkl https://genseniclr2018.blob.core.windows.net/models/nli_large_vocab.pkl
wget -O GenSen/nli_large.model https://genseniclr2018.blob.core.windows.net/models/nli_large.model
# download GloVe embeddings and convert them with GenSen's glove2h5 scripts
cd embeddings
wget https://raw.githubusercontent.com/Maluuba/gensen/master/data/embedding/glove2h5.py
wget https://github.com/Maluuba/gensen/raw/master/data/embedding/glove2h5.sh
sh glove2h5.sh
cd /path/to/project_dir
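
Before training, it can be useful to check that the expected GenSen files are in place. A small sketch; it checks only the two model files named in this README and that the embeddings folder is non-empty, since the exact name of the file produced by glove2h5.sh is not assumed:

import os

MODELS_ROOT = "/path/to/models"  # same placeholder path as above

# Model files named explicitly in this README.
for name in ["nli_large.model", "nli_large_vocab.pkl"]:
    path = os.path.join(MODELS_ROOT, "GenSen", name)
    print(path, "OK" if os.path.isfile(path) else "MISSING")

# The embeddings folder should contain the output of glove2h5.sh.
emb_dir = os.path.join(MODELS_ROOT, "embeddings")
print(emb_dir, "contains", len(os.listdir(emb_dir)), "files")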

Train models

Generate sentences

generate_responses.py loads a trained model. It takes real dialogues from the ReDial dataset and lets the model generate responses whenever the human recommender speaks (responses are conditioned on the current dialogue history).

python generate_responses.py --model_path=/path/to/models/recommender/model_best --save_path=generations