amazon-archives / amazon-dsstne

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models
Apache License 2.0

collaborative filtering with neural networks? how is this implemented. #38

Open hadi-ds opened 8 years ago

hadi-ds commented 8 years ago

Hi fellows,

Thanks for the great work. I successfully applied this tool to a similar product recommendation task.

As for the example given, I am trying to understand how the three-layer neural network is used to tackle the MovieLens recommendation problem, which is a collaborative filtering task where the amount of time users spend watching each movie is taken as an implicit rating. A common way of addressing this problem is matrix factorization to fill in the blanks.

I have used neural networks for regression/classification tasks. However, it is not clear to me how the MovieLens problem can be addressed with neural networks. Is this based on some approach proposed in the literature?

Another question is whether DSSTNE can be applied to more generic ML tasks, such as a regression problem. Based on the example given, it seems like the existing predict API is only suitable for collaborative-filtering-type problems.

cheers,

scottlegrand commented 8 years ago

Awesome, one could switch the loss function to L2 to perform a regression. Beyond that, what would you need?

hadi-ds commented 8 years ago

What puzzles me is that, apart from the change of loss function, regression or classification has an essentially different objective compared to the collaborative filtering one given here.

To be concrete, let me give a simple example. Let's say I want to work with the iris data, where the task is to classify iris plants (three kinds: 0, 1, 2) given their petal and sepal length/width as input features. So I have data like the below for training:

sample_id    petal_l    petal_w    sepal_l    sepal_w    class
--------------------------------------------------------------
1            5.5        2.6        4.4        1.2        2
2            6.0        2.9        4.5        1.5        0
3            4.4        2.9        1.4        0.2        1
4            14         2.0        2.7        5.7        2
...

How can this data be expressed as in 'ml20m-all'? Where do the class labels go?

This is basically a different problem from the collaborative filtering one given in the example, and simply replacing the penalty function doesn't seem to be enough.

Unlike the MovieLens data, where features were sparse (each user having watched only a few movies), in this example feature values are known for all samples and the task is to predict labels (rather than missing features) for a new data set.

rybakov commented 8 years ago

For the case of iris plant classification, you should change the cost function to softmax, so it will optimize a "one vs all" classifier (in the training data only one class is correct). For example, the training data would look like:

sample_id    petal_l    petal_w    sepal_l    sepal_w    class
--------------------------------------------------------------
1            5.5        2.6        4.4        1.2        0 0 1
2            6.0        2.9        4.5        1.5        1 0 0
3            4.4        2.9        1.4        0.2        0 1 0
4            14         2.0        2.7        5.7        0 0 1
...

For collaborative filtering you should use binary cross-entropy (CrossEntropy or ScaledMarginalCrossEntropy, depending on your use case), because in this case it will optimize a multi-class classifier (in the training data one or multiple classes are correct). The example below shows what this would look like (in this case multiple classes are accepted as a target):

1            5.5        2.6        4.4        1.2        0 0 1
2            6.0        2.9        4.5        1.5        1 0 1
3            4.4        2.9        1.4        0.2        0 1 1
4            14         2.0        2.7        5.7        0 1 1
...

The example above is hypothetical (it shows the kind of data that would be processed with the described cost functions).
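As a sketch of the idea (this helper is hypothetical, not part of DSSTNE's tooling), the 0/1 target rows described above could be generated from the class labels like this:

```python
# Hypothetical helper (not a DSSTNE API): build the 0/1 target row
# described above from one or more correct class indices.
def target_row(correct_classes, num_classes):
    """Return a list with 1s at the correct class indices, 0s elsewhere."""
    row = [0] * num_classes
    for c in correct_classes:
        row[c] = 1
    return row

print(target_row([2], 3))     # softmax "one vs all" target -> [0, 0, 1]
print(target_row([1, 2], 3))  # multi-label cross-entropy target -> [0, 1, 1]
```

The single-label call matches the softmax case (exactly one 1 per row); passing several indices matches the multi-label cross-entropy case.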

benitezmatias commented 8 years ago

@hadi-ds , please let us know if the example provided solves your problem. Otherwise we'll close this issue in the next 48 hrs

hadi-ds commented 8 years ago

@benitezmatias, @rgeorgej no it didn't. In fact, DSSTNE's data converter (generateNetCDF) doesn't accept data in the format given in that answer. I was hoping to get an answer from the DSSTNE developers.

I think it would be very helpful for the data science community if some detail were provided about what DSSTNE actually does.

I dug further into how DSSTNE works and came to the conclusion that, in the form provided here, it basically constructs an autoencoder network. It provides flexibility around the design of the network through the config file, but it is not meant for general-purpose ML tasks, since autoencoders can only reconstruct the data in an unsupervised fashion (input = output); they don't perform supervised learning, i.e. regression/classification.

scottlegrand commented 8 years ago

Can you give me a few lines of data in the format you wish to import into DSSTNE? It can train any network you want, with as many input, output, and hidden layers as you wish. I can provide you with a simple C program to build a dataset from them.

The reason the product recommendation networks resemble autoencoders is that that's what worked, but that doesn't mean that DSSTNE is in any way limited to running those sorts of networks.


scottlegrand commented 8 years ago

PS everyone who has responded to you is a member of the DSSTNE team.


hadi-ds commented 8 years ago

@scottlegrand ok, to keep things simple, let's say I want to train a classifier on the iris dataset, where the input is a 4-dimensional vector and the output is a class (out of three).

import numpy as np
from sklearn.utils import shuffle
from sklearn import datasets

# Load the iris dataset and shuffle samples and labels together.
iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target
X, y = shuffle(iris_X, iris_y, random_state=1)

Here are the first 10 rows of X and y:

array([[ 5.8,  4. ,  1.2,  0.2],
       [ 5.1,  2.5,  3. ,  1.1],
       [ 6.6,  3. ,  4.4,  1.4],
       [ 5.4,  3.9,  1.3,  0.4],
       [ 7.9,  3.8,  6.4,  2. ],
       [ 6.3,  3.3,  4.7,  1.6],
       [ 6.9,  3.1,  5.1,  2.3],
       [ 5.1,  3.8,  1.9,  0.4],
       [ 4.7,  3.2,  1.6,  0.2],
       [ 6.9,  3.2,  5.7,  2.3]])

array([0, 1, 1, 0, 2, 1, 2, 0, 0, 2])

X can be put into the format accepted by DSSTNE:

0   1,5.8:2,4.:3,1.2:4,0.2
1   1,5.1:2,2.5:3,3.:4,1.1
...

and so on, where I label the features as 1, 2, 3, and 4.
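A minimal sketch of such a conversion (the function name is mine, not DSSTNE's) could look like this:

```python
# Hypothetical converter sketch: write a dense feature row in the
# sample_id <tab> index,value:index,value:... sparse text format
# shown above, with feature ids starting at 1.
def to_dsstne_line(sample_id, features):
    pairs = ":".join(f"{i},{v}" for i, v in enumerate(features, start=1))
    return f"{sample_id}\t{pairs}"

print(to_dsstne_line(0, [5.8, 4.0, 1.2, 0.2]))
# -> 0	1,5.8:2,4.0:3,1.2:4,0.2
```

Running this over each row of X would produce one line per sample in the format above.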

Now, I can convert X in this form into netCDF format. What is not clear to me is how to train a classifier using the converted data. Where do the 'y' labels go? Do they need to be converted as well?

With features X and labels y properly converted, how should the 'train ...' and 'predict ...' commands be modified in order to train the network to reconstruct y rather than the input X?

Another thing is that, for the MovieLens example provided, it seems like the timestamps are not actually being used. Instead, they are used to find out which movies are watched by a user (corresponding feature values set to 1) and which ones are not (feature values set to 0). Is there a way to have generateNetCDF keep the timestamps?

scottlegrand commented 8 years ago

1. For the purposes of recommendations, the recommendations themselves are binary 1/0 decisions. Hence we predict view/not view rather than ratings.
2. While we don't exploit the timestamp data here, there's nothing stopping you from doing so. To that end, I'm going to write you a short C++ program to build your own datasets however you wish.
3. Is there any standard format you'd like DSSTNE to import?

hadi-ds commented 8 years ago

@scottlegrand thanks for the response.

That C++ code would be great; looking forward to it. There is information in those timestamps that I don't want to waste.

Regarding the format: I am fine with the current format DSSTNE requires. Every line of my data includes only one item and its timestamp, but I wrote a wrapper that puts it in DSSTNE's format.

Finally, you mention that recs are binary decisions, but the output of predict on unseen items is actually floats. I suppose one can interpret those values as ratings for sorting recommendations.
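For illustration only (the item names and scores below are made up), ranking items by those float outputs would amount to a simple sort:

```python
# Made-up scores standing in for the float outputs of predict;
# rank items from highest to lowest predicted score.
scores = {"item_a": 0.91, "item_b": 0.12, "item_c": 0.57}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # -> ['item_a', 'item_c', 'item_b']
```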

Schallerwf commented 8 years ago

I took the post from @rybakov above and tried to run a multi-class classification on the iris dataset based on his suggestions. Here are the steps I am using to prepare and train - https://github.com/Schallerwf/dsstne/blob/master/examples/ClassificationNetworkAttempt.md

I have attached a zip issue38.zip containing my input and output files as well as my config.json. I still can't seem to get proper results.

Feature 0 only affects the output for class 0, and the same goes for features 1 and 2. Feature 3 is then ignored completely because there is no class 3. Obviously, what I want is for the prediction for each class to be based on all of the features.

Clearly I am missing something. I think a working example of this problem would be very helpful to new users looking to use DSSTNE. Thank you in advance for your time.

rybakov commented 8 years ago

I will have a look at issue38.zip and make an example folder with a parser and the commands to run (for train and test).

Schallerwf commented 7 years ago

Any update on this?