amazon-archives / amazon-dsstne

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models.
Apache License 2.0

Low ranking accuracy of the example with MovieLens20M? #24

Closed saulvargas closed 8 years ago

saulvargas commented 8 years ago

Hi,

I've been playing around today with DSSTNE with the goal of running the MovieLens20M example and comparing the NN in the example with some state-of-the-art CF algorithms that I have implemented here: https://github.com/RankSys/RankSys. From my evaluation (which is by no means exhaustive or perfect), the example provided by DSSTNE does not seem to be competitive with state-of-the-art CF algorithms.

To summarise: I downloaded the original MovieLens 20M dataset and performed a random 80%-20% partition. I transformed the training subset into the DSSTNE format, with the only difference that I did not include the timestamps from the dataset, but used 1's for all movies (is this actually very important??). I generated recommendations with my CF algorithms (popularity, user-based and matrix factorisation) and, following the steps in the example, the predictions of DSSTNE. Finally, I evaluated the performance on the test subset using precision at cutoff 10.
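For concreteness, the conversion step described above can be sketched as follows. This assumes the tab-separated layout used in the DSSTNE example (one line per user, followed by colon-separated `item,value` pairs, with the constant 1 standing in for the timestamp); the function name and exact layout are illustrative, so check what `generateNetCDF` expects before reusing this.

```python
from collections import defaultdict

def to_dsstne_format(ratings, out_path):
    """Group (user, movie) pairs into one line per user.

    Assumed layout (see the caveat above):
        user_id<TAB>movie_id,1:movie_id,1:...
    where every movie gets the constant value 1 instead of a timestamp.
    """
    by_user = defaultdict(list)
    for user, movie in ratings:
        by_user[user].append(movie)
    with open(out_path, "w") as f:
        for user, movies in sorted(by_user.items()):
            pairs = ":".join(f"{m},1" for m in movies)
            f.write(f"{user}\t{pairs}\n")
```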

These are the results; the configuration provided in your example does not seem to work very well:

    pop     0.10974162112149495
    ub      0.24097987334078072
    mf      0.25135912784469483
    dsstne  0.056956854920365056
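The numbers above use precision at cutoff 10. For reference, a minimal sketch of that metric (illustrative only, not the RankSys implementation) looks like:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear in the test set."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def mean_precision_at_k(recs_by_user, test_by_user, k=10):
    """Average precision@k over all users with test data."""
    users = list(test_by_user)
    return sum(precision_at_k(recs_by_user.get(u, []), test_by_user[u], k)
               for u in users) / len(users)
```

Note that dividing by a fixed `k` (rather than by the number of recommendations returned) penalises recommenders that return fewer than `k` items, which is one of the protocol details that can make numbers from different evaluation setups hard to compare.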

I am no expert in ANNs, so I cannot easily figure out whether I should modify the parameters in the config.json provided in the example to make it work better. Have you compared the performance of the example with similar CF algorithms? If so, could you please share some results/insights?

Cheers Saúl

scottlegrand commented 8 years ago

Hey Saul, nice work. I suspect the problem here is one of open-sourcing the software without open-sourcing the secret-sauce networks actually in use. The network supplied is not (as far as I know) one that Amazon uses for recommendations, but rather a simple demo of a network one can build with DSSTNE.

One could say similar things of TensorFlow: Google has open-sourced its very nice framework, but clearly not all of its networks nor all of its implementations of the underlying engine.

Scott


saulvargas commented 8 years ago

Hi Scott,

Thanks for your contribution.

Please let me clarify: I am not asking for the exact configuration Amazon uses in its systems. I believe it would suffice to provide one that performs well enough on a public dataset such as MovieLens 20M, using, for instance, configurations found in papers such as this one: http://dl.acm.org/citation.cfm?id=2835837.

It is awesome that Amazon releases code like this; I am just kindly requesting a little guidance on how to make the provided example work.

Best wishes Saúl

scottlegrand commented 8 years ago

So one easy first step would be to add denoising to the example network, no?

Second, if you're willing: that's a very cool paper (I saw something like it at KDD 2015), and I'd love to help you implement it in DSSTNE if you're interested in doing so. I suspect doing so would either demonstrate the flexibility of DSSTNE or help identify additional API hooks that would provide such flexibility. Interested?

Scott


scottlegrand commented 8 years ago

PS here's how to do that first step...

"Denoising" : {
    "p" : 0.3
},


rgeorgej commented 8 years ago

@saulvargas, is it possible to get us the scripts you used to generate your test and train datasets?

saulvargas commented 8 years ago

Hi @rgeorgej ,

Sure! I can prepare a simple repository with everything required to reproduce the experiment I performed, although it might take me a couple of days... I'll let you know.

Cheers Saúl

scottlegrand commented 8 years ago

Hey Saul, I'd love to make a benchmark out of this for both training speed and predictive performance. Any progress here?

saulvargas commented 8 years ago

Hi,

I've been working in my spare time on a script and some Java code to fully reproduce the experiment I performed two weeks ago. It is still a work in progress, as I have yet to include the steps for DSSTNE, but meanwhile you can take a look here: https://github.com/saulvargas/dsstne-comparison/

Basically, executing run.sh downloads the original MovieLens20M dataset, performs a random 80/20 split into training and test, generates some CF baselines with RankSys, and then evaluates the precision@10 of these baselines.
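The random 80/20 split step described above can be sketched like this (an illustration of the partitioning scheme, not the code in the linked repository; the function name and fixed seed are my own choices):

```python
import random

def split_ratings(ratings, train_frac=0.8, seed=42):
    """Randomly partition rating tuples into train/test (80/20 by default).

    A fixed seed makes the split reproducible, which matters when several
    people are trying to compare numbers on the same dataset.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Note that this splits on individual rating events; splitting on users instead (holding out whole users) is a different protocol, a distinction that comes up later in this thread.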

Cheers Saúl

saulvargas commented 8 years ago

Hi @rgeorgej

Sorry it took so long, but I now have all the code required to reproduce the experiments I conducted at https://github.com/saulvargas/dsstne-comparison/, now including training DSSTNE on my 80/20 split of the MovieLens20M data. I hope it helps. Just execute the steps in run.sh (the ub recommender might take a while to train).

Cheers Saúl

rgeorgej commented 8 years ago

Thanks @saulvargas for all the help. We will take it from here on our side and get you a decent config with good offline performance for the MovieLens data.

scottlegrand commented 8 years ago

So with a fairly simple autoencoder applied to an 80/20 split of the dataset, I get a P@10 of 8.75%. That's a far cry from your best efforts, but that's out of the gate, so let's see where we can take it.

scottlegrand commented 8 years ago

So I was splitting 80/20 on users, while you were splitting on movie views. I'll have that data for you by tomorrow. I got P@10 to 9.3% for that partitioning on the second try, though, so I suspect there's lots of headroom for improvement.

scottlegrand commented 8 years ago

With this fix: 32.7% P@10, 48.4% P@1. I will post to GitHub tonight after work, but here's the first submission, incorporating input denoising and a sparseness penalty in the hidden layer:

{
    "Version" : 0.8,
    "Name" : "MovieLens Benchmark #1",
    "Kind" : "FeedForward",

    "SparsenessPenalty" : {
        "p" : 0.5,
        "beta" : 2.0
    },

    "ShuffleIndices" : false,

    "Denoising" : {
        "p" : 0.4
    },

    "ScaledMarginalCrossEntropy" : {
        "oneTarget" : 1.0,
        "zeroTarget" : 0.0,
        "oneScale" : 30.0,
        "zeroScale" : 1.0
    },

    "Layers" : [
        { "Name" : "Input0", "Kind" : "Input", "N" : "auto", "DataSet" : "input", "Sparse" : true },
        { "Name" : "Hidden", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : [ "Input0" ], "N" : 256, "Activation" : "Sigmoid", "Sparse" : true },
        { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "Source" : [ "Hidden" ], "DataSet" : "output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true }
    ],

    "ErrorFunction" : "ScaledMarginalCrossEntropy"
}

scottlegrand commented 8 years ago

Round 2: MAP@10 of 41.1% and P@10 of 35.3%. Network supplied below:

{
    "Version" : 0.8,
    "Name" : "MovieLens Benchmark #2",
    "Kind" : "FeedForward",

    "ShuffleIndices" : false,

    "ScaledMarginalCrossEntropy" : {
        "oneTarget" : 1.0,
        "zeroTarget" : 0.0,
        "oneScale" : 1.0,
        "zeroScale" : 1.0
    },

    "Layers" : [
        { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "input", "Sparse" : true },
        { "Name" : "Hidden1", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : "Input", "N" : 1536, "Activation" : "Relu", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } },
        { "Name" : "Hidden2", "Kind" : "Hidden", "Type" : "FullyConnected", "Source" : [ "Hidden1" ], "N" : 1536, "Activation" : "Relu", "Sparse" : false, "pDropout" : 0.5, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01 } },
        { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true, "WeightInit" : { "Scheme" : "Gaussian", "Scale" : 0.01, "Bias" : -10.2 } }
    ],

    "ErrorFunction" : "ScaledMarginalCrossEntropy"
}

hadi-ds commented 8 years ago

Hi there,

I have a related question about the MovieLens example and the choice of ErrorFunction.

First, I wonder whether, during the input/output data generation stage (generateNetCDF ...), the timestamps (how long users watch a movie) are actually recorded in the gl_input.nc file and used as an implicit measure of user-movie affinity, rather than being replaced by a label '1' indicating whether the user has watched the movie or not.

If the former is the case, I think a regression-type error function such as L2 should be used in this autoencoder; 'ScaledMarginalCrossEntropy' is relevant to a classification setting (like the alternative scenario I mentioned above).
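To make the distinction concrete, here is a toy sketch of the two loss styles. The scaled-marginal form is my own reading of the parameter names in the configs above (oneTarget, zeroTarget, oneScale, zeroScale), not a copy of DSSTNE's actual implementation:

```python
import math

def l2_loss(pred, target):
    """Regression-style squared error, suited to real-valued targets
    such as implicit affinity scores."""
    return (pred - target) ** 2

def scaled_marginal_cross_entropy(pred, target, one_scale=1.0, zero_scale=1.0,
                                  one_target=1.0, zero_target=0.0):
    """Classification-style loss for binary watched/not-watched labels.

    Assumed semantics (an interpretation, not DSSTNE's code): positives only
    incur loss while the prediction is below oneTarget, negatives only while
    it is above zeroTarget, each side weighted by its own scale.
    """
    eps = 1e-12
    if target >= 0.5:  # positive (watched) example
        return -one_scale * math.log(pred + eps) if pred < one_target else 0.0
    return -zero_scale * math.log(1.0 - pred + eps) if pred > zero_target else 0.0
```

Under this reading, a prediction that already clears the margin contributes zero gradient, which is one reason the cross-entropy variant fits binary labels better than real-valued affinities.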

thanks,

saulvargas commented 8 years ago

Hi all,

Sorry it took me so long to get back to you with this.

Unfortunately, I have not been successful at obtaining decent ranking accuracy with either of the two configurations that @scottlegrand kindly provided. They perform basically as badly as the original one under my evaluation methodology. I suspect we may be applying different evaluation protocols and, therefore, the provided configurations may not be adequate for the one I am interested in.

If I find some time, I will try to learn enough about ANNs to understand how to come up with a configuration that yields high ranking accuracy for my setup. In the meantime, I think we can close this issue.

For your reference I am sharing the data I generated: https://www.dropbox.com/s/krk8mkzynn9igqv/dsstne-comparison-data.zip?dl=0

The code is already here: https://github.com/saulvargas/dsstne-comparison/

VedAustin commented 7 years ago

I was wondering if anyone has run this on the 100K dataset and evaluated its accuracy in terms of MSE. Are these benchmarks from DSSTNE publicly available?