DOsinga / deep_learning_cookbook


Training Movie Recommendation System #20

Closed: WillKoehrsen closed this issue 5 years ago

WillKoehrsen commented 6 years ago

In notebook 4.2, Build a Recommender System, the model is trained with mse as the loss function, treating the problem as regression with positive examples labeled 1 and negative examples labeled -1. Would labeling the negative examples 0 and training with a binary_crossentropy loss (treating the problem as binary classification) be a valid approach?
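
For concreteness, a minimal sketch of that classification variant (assuming the notebook's link and movie inputs and its dot-product similarity score, here called merged):

from keras.layers import Activation
from keras.models import Model

# Hypothetical classification variant: squash the similarity score
# into (0, 1) with a sigmoid and train against 0/1 labels
out = Activation('sigmoid')(merged)
clf_model = Model(inputs=[link, movie], outputs=out)
clf_model.compile(optimizer='nadam', loss='binary_crossentropy')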

Also, do we have to merge the "link" and "movie" embeddings with a dot product (the Dot layer)? Could we instead concatenate the embeddings, so the shape would be [None, 100], and then add a fully connected layer to make the prediction? Something like:


# Merge the embeddings by concatenating along the feature axis:
# each is (None, 1, 50), so the result is (None, 1, 100)
merged = Concatenate(name='merge', axis=2)([link_embedding, movie_embedding])
merged = Reshape((100,))(merged)  # flatten to (None, 100)

# Add a fully connected layer for the prediction
out = Dense(1, activation=None, name='output')(merged)
model = Model(inputs=[link, movie], outputs=out)
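
Compiling and fitting this variant the same way as the notebook's model might look like this (link_ids, movie_ids, and labels are hypothetical placeholders for the notebook's training data):

# Same regression setup as the notebook: labels are +1 / -1;
# epochs and batch size here are arbitrary
model.compile(optimizer='nadam', loss='mse')
model.fit([link_ids, movie_ids], labels, epochs=15, batch_size=4096)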

I understand that we're trying to create embeddings for movies so that "similar" movies end up closer together in terms of cosine distance. Embeddings make sense, but my question is: are there different valid ways to train the network that produces them?
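
For reference, pulling the learned vectors out and ranking movies by cosine similarity could look like this (assuming the embedding layer is named movie_embedding and a movie_to_idx mapping as in the notebook; the title is just a placeholder):

import numpy as np

# Normalize the learned vectors so the dot product equals cosine similarity
weights = model.get_layer('movie_embedding').get_weights()[0]
normalized = weights / np.linalg.norm(weights, axis=1, keepdims=True)
sims = normalized @ normalized[movie_to_idx['Star Wars']]
closest = np.argsort(sims)[-10:]  # indices of the ten most similar movies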

DOsinga commented 5 years ago

I guess we could try that too.

The intuition here is that you want to assign a vector to each movie (the embedding), and we do this by using that embedding to predict something, in this case the links. Movies with similar links then get similar vectors, and that's how the embedding should work.
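
Roughly, the model looks like this: the Dot layer with normalize=True makes the predicted score the cosine similarity of the two embeddings, so training directly pushes movies with similar link patterns toward similar vectors:

from keras.layers import Dot, Reshape
from keras.models import Model

# Cosine similarity between the link and movie vectors is fitted
# against the +1 / -1 labels
dot = Dot(name='dot_product', normalize=True, axes=2)([link_embedding, movie_embedding])
out = Reshape((1,))(dot)
model = Model(inputs=[link, movie], outputs=out)
model.compile(optimizer='nadam', loss='mse')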

Concatenating would probably create a better model for predicting the links, but it might not create as good an embedding: with the dot product, the training objective directly rewards similar movies for having similar vectors, whereas a dense layer on top of concatenated embeddings can learn arbitrary interactions, so closeness in the embedding space is no longer the thing being optimized.