DOsinga / deep_learning_cookbook


Training Movie Recommendation System #20

Closed: WillKoehrsen closed this issue 5 years ago

WillKoehrsen commented 6 years ago

In notebook 4.2, Build a Recommender System, the model is trained with mse as the loss function, treating the problem as regression with positive examples labeled 1 and negative examples labeled -1. Would labeling the negative examples 0 and training with a binary_crossentropy loss (treating the problem as binary classification) be a valid approach?
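
For concreteness, a minimal sketch of that classification variant (assuming the notebook's link and movie inputs and its dot-product similarity score, here called merged):

from keras.layers import Activation
from keras.models import Model

# Hypothetical classification variant: squash the similarity score
# into (0, 1) with a sigmoid and train against 0/1 labels
out = Activation('sigmoid')(merged)
clf_model = Model(inputs=[link, movie], outputs=out)
clf_model.compile(optimizer='nadam', loss='binary_crossentropy')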

Also, do we have to merge the "link" and "movie" embeddings with a dot product (the Dot layer)? Could we instead concatenate the embeddings, so the shape would be [None, 100], and then add a fully connected layer to make the prediction? Something like:


# Merge the embeddings by concatenating along the feature axis:
# each is (None, 1, 50), so the result is (None, 1, 100)
merged = Concatenate(name='merge', axis=2)([link_embedding, movie_embedding])
merged = Reshape((100,))(merged)  # flatten to (None, 100)

# Add a fully connected layer for the prediction
out = Dense(1, activation=None, name='output')(merged)
model = Model(inputs=[link, movie], outputs=out)
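
Compiling and fitting this variant the same way as the notebook's model might look like this (link_ids, movie_ids, and labels are hypothetical placeholders for the notebook's training data):

# Same regression setup as the notebook: labels are +1 / -1;
# epochs and batch size here are arbitrary
model.compile(optimizer='nadam', loss='mse')
model.fit([link_ids, movie_ids], labels, epochs=15, batch_size=4096)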

I understand that we're trying to create embeddings for movies so that "similar" movies end up closer together in terms of cosine distance. Embeddings make sense, but my question is: are there different valid ways to train the network that produces them?
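
For reference, pulling the learned vectors out and ranking movies by cosine similarity could look like this (assuming the embedding layer is named movie_embedding and a movie_to_idx mapping as in the notebook; the title is just a placeholder):

import numpy as np

# Normalize the learned vectors so the dot product equals cosine similarity
weights = model.get_layer('movie_embedding').get_weights()[0]
normalized = weights / np.linalg.norm(weights, axis=1, keepdims=True)
sims = normalized @ normalized[movie_to_idx['Star Wars']]
closest = np.argsort(sims)[-10:]  # indices of the ten most similar movies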

DOsinga commented 5 years ago

I guess we could try that too.

The intuition here is that you want to assign a vector to each movie (the embedding), and we do this by using that embedding to predict something, in this case the links. Movies with similar links then get similar vectors, and that's how the embedding should work.
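
Roughly, the model looks like this: the Dot layer with normalize=True makes the predicted score the cosine similarity of the two embeddings, so training directly pushes movies with similar link patterns toward similar vectors:

from keras.layers import Dot, Reshape
from keras.models import Model

# Cosine similarity between the link and movie vectors is fitted
# against the +1 / -1 labels
dot = Dot(name='dot_product', normalize=True, axes=2)([link_embedding, movie_embedding])
out = Reshape((1,))(dot)
model = Model(inputs=[link, movie], outputs=out)
model.compile(optimizer='nadam', loss='mse')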

Concatenating would probably create a better model for predicting the links, but it might not create as good an embedding: with the dot product, the training objective directly rewards similar movies for having similar vectors, whereas a dense layer on top of concatenated embeddings can learn arbitrary interactions, so closeness in the embedding space is no longer the thing being optimized.