amazon-archives / amazon-dsstne

Deep Scalable Sparse Tensor Network Engine (DSSTNE) is an Amazon-developed library for building Deep Learning (DL) machine learning (ML) models
Apache License 2.0
4.41k stars 730 forks

How predict works #15

Open buptqitian opened 8 years ago

buptqitian commented 8 years ago

110510 26743
121019 26740
121017 26739
106401 26736
104307 26734
103010 26733
71300 26732
127445 26730
120839 26729
123188 26725

This is an excerpt of features_input. I guess the first column is the movie ID and the second column is the feature index.

For the ml-20m dataset there are 138493 users, which means the input data contains 138493 examples. What is the feature vector of each input? Could you give me an example? I guess it is a vector of dimension 26744 in which each dimension stands for a movie ID: if the user liked the movie, the value is 1; otherwise it is 0.
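A minimal sketch of the encoding guessed above, in C++ (the language of Predict.cpp). It assumes features_input holds whitespace-separated "movieId index" pairs, as the excerpt suggests, and that an input is a multi-hot vector with 1.0 at the index of every movie in the user's history. `loadFeatureIndex` and `encodeUser` are hypothetical helpers, not DSSTNE functions, and the dense vector is only for readability; for data this sparse, storing just the nonzero indices is the more natural layout.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: read "movieId index" pairs, one per line,
// as the features_input excerpt appears to contain.
std::unordered_map<std::string, int> loadFeatureIndex(std::istream& in) {
    std::unordered_map<std::string, int> index;
    std::string movieId;
    int column = 0;
    while (in >> movieId >> column) {
        index[movieId] = column;
    }
    return index;
}

// Hypothetical helper: dense multi-hot vector of length `dim` with 1.0f at
// the column of every movie in the user's history and 0.0f elsewhere.
// Movie IDs that are missing from the index are ignored.
std::vector<float> encodeUser(const std::vector<std::string>& history,
                              const std::unordered_map<std::string, int>& index,
                              int dim) {
    std::vector<float> x(dim, 0.0f);
    for (const auto& id : history) {
        auto it = index.find(id);
        if (it != index.end()) {
            assert(it->second >= 0 && it->second < dim);
            x[it->second] = 1.0f;
        }
    }
    return x;
}
```

For example, with the excerpt above loaded, a user whose history contains only movie 121019 becomes a 26744-long vector whose single nonzero entry is at index 26740.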

The network has 3 layers. The input layer size N is "auto", so should N be 26744? The hidden layer size N is 128, and the output layer size N is "auto" (26744).
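The layer sizes above can be read as an autoencoder-style network, 26744 → 128 → 26744. The sketch below assumes fully connected layers with sigmoid activations; the thread does not show the network config, so the activation choice is an assumption. If the output layer does use a sigmoid, every output score lies strictly between 0 and 1. `fullyConnected` and `forward` are hypothetical names, not DSSTNE APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid: maps any real number into the open interval (0, 1).
float sigmoid(float z) { return 1.0f / (1.0f + std::exp(-z)); }

// One fully connected layer: y = sigmoid(W x + b).
// W is row-major with `out` rows and `in` columns.
std::vector<float> fullyConnected(const std::vector<float>& x,
                                  const std::vector<float>& W,
                                  const std::vector<float>& b,
                                  std::size_t in, std::size_t out) {
    assert(x.size() == in && W.size() == in * out && b.size() == out);
    std::vector<float> y(out);
    for (std::size_t r = 0; r < out; ++r) {
        float z = b[r];
        for (std::size_t c = 0; c < in; ++c) z += W[r * in + c] * x[c];
        y[r] = sigmoid(z);
    }
    return y;
}

// Input (n) -> hidden (h, e.g. 128) -> output (n, one score per movie).
std::vector<float> forward(const std::vector<float>& x,
                           const std::vector<float>& W1, const std::vector<float>& b1,
                           const std::vector<float>& W2, const std::vector<float>& b2,
                           std::size_t n, std::size_t h) {
    std::vector<float> hidden = fullyConnected(x, W1, b1, n, h);
    return fullyConnected(hidden, W2, b2, h, n);
}
```

With n = 26744 and h = 128, each weight matrix has 128 × 26744 = 3,423,232 entries.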

And once the network is trained, how does predict work?

-l layer: (default = Output) the network layer to use for predictions

I ran two experiments.

First, without -l:

    predict -b 1024 -d gl -i features_input -o features_output -k 10 -n gl.nc -f ml20m-all -s recs -r ml20m-all

Second, with -l Hidden:

    predict -b 1024 -d gl -i features_input -o features_output -l Hidden -k 10 -n gl.nc -f ml20m-all -s recs -r ml20m-all

The prediction results are identical, so it seems "-l" has no effect. I checked the source code in /src/amazon/utils/Predict.cpp and could not find a getOptionalArgValue call for "-l".

And if I set -k to 1000, it reports "Error :Optimized topk Only works for top 128".

What does predict actually use for prediction: the output layer's output (26744 dimensions)? And is each value a float between 0 and 1? Could you point me to some papers on how this example works?
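One plausible reading of the question: predict scores all 26744 outputs for a user and reports the k highest-scoring movies. Below is a CPU sketch of that selection with an optional set of indices to skip, for example movies already in the user's input; whether and how DSSTNE's filter arguments do this is not stated in this thread. `topK` is a hypothetical name, and std::partial_sort stands in for DSSTNE's GPU kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Returns up to k (index, score) pairs with the highest scores, skipping any
// index in `exclude`. Ties are broken in favour of the smaller index.
std::vector<std::pair<int, float>> topK(const std::vector<float>& scores,
                                        std::size_t k,
                                        const std::unordered_set<int>& exclude) {
    std::vector<std::pair<int, float>> candidates;
    candidates.reserve(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        int idx = static_cast<int>(i);
        if (exclude.count(idx) == 0) candidates.emplace_back(idx, scores[i]);
    }
    k = std::min(k, candidates.size());  // never ask for more than exists
    auto better = [](const std::pair<int, float>& a, const std::pair<int, float>& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    };
    // Only the first k positions end up sorted; the rest stay unordered.
    std::partial_sort(candidates.begin(),
                      candidates.begin() + static_cast<std::ptrdiff_t>(k),
                      candidates.end(), better);
    candidates.resize(k);
    return candidates;
}
```

Because it returns at most as many pairs as there are candidates, this version never pads its result with placeholder entries.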

Thank you

buptqitian commented 8 years ago

I removed this check from Predict.cpp:

    if (topK >= 128) {
        cout << "Error :Optimized topk Only works for top 128 . " << topK << " is greater" << endl;
        return 1;
    }

Then I ran predict with "-k 140".

The result looks like this:

1 1197,0.935:3793,0.896:2571,0.840:1206,0.814:551,0.810:2987,0.754:1347,0.723:1073,0.714:1274,0.714:608,0.690:1210,0.686:1220,0.683:1270,0.651:8874,0.621:5618,0.620:3578,0.617:968,0.615:1275,0.605:1339,0.603:741,0.603:353,0.598:2115,0.593:2710,0.593:6874,0.592:1527,0.585:1982,0.580:1127,0.562:527,0.559:3471,0.557:1345,0.556:3527,0.537:3114,0.528:553,0.498:5349,0.497:4886,0.496:110,0.488:1748,0.487:1,0.478:1407,0.475:1225,0.474:610,0.466:1288,0.465:1199,0.465:3703,0.449:594,0.436:1968,0.426:1265,0.421:778,0.416:6377,0.412:1148,0.394:235,0.391:1213,0.389:2797,0.386:1241,0.383:111,0.379:7360,0.372:70,0.367:2700,0.361:799,0.359:3300,0.357:750,0.357:1320,0.348:3917,0.341:1610,0.336:2355,0.323:2005,0.322:2997,0.321:1961,0.321:2455,0.317:7022,0.316:2003,0.314:3740,0.313:3671,0.310:2657,0.310:3751,0.302:4855,0.301:4865,0.298:2791,0.293:2028,0.292:16,0.288:1285,0.287:2916,0.283:6659,0.281:4262,0.278:3147,0.276:2139,0.270:5445,0.269:555,0.268:2161,0.266:5378,0.266:364,0.266:3994,0.264:1517,0.263:5782,0.263:1255,0.263:3702,0.260:1394,0.259:1617,0.255:442,0.254:1307,0.252:1974,0.250:7090,0.247:2788,0.247:5254,0.246:5219,0.246:2459,0.243:3018,0.242:745,0.241:6857,0.240:2167,0.234:1653,0.233:2160,0.233:273,0.233:720,0.230:724,0.230:7147,0.229:912,0.229:5971,0.227:40815,0.224:6,0.223:6283,0.219:2529,0.216:5502,0.215:1342,0.214:1101,0.213:2617,0.212:2513,0.211:2019,0.208:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:2,0.000:

It seems the last 12 predictions, everything after the 128th entry, are meaningless.
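To check this observation, a record in the format shown above can be parsed and its zero-score tail counted. This assumes the layout inferred from the sample: a user ID, whitespace, then colon-terminated "movieId,score" pairs. `Record`, `parseRecord`, and `trailingZeros` are hypothetical helpers, not part of DSSTNE.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Record {
    std::string user;
    std::vector<std::pair<std::string, float>> items;  // (movieId, score)
};

// Parses "user item,score:item,score:..." as seen in the output sample.
Record parseRecord(const std::string& line) {
    Record rec;
    std::istringstream in(line);
    std::string rest;
    in >> rec.user >> rest;  // user ID, then the colon-separated list
    std::istringstream list(rest);
    std::string entry;
    while (std::getline(list, entry, ':')) {
        if (entry.empty()) continue;  // skip empty fields such as from "::"
        auto comma = entry.find(',');
        assert(comma != std::string::npos);
        rec.items.emplace_back(entry.substr(0, comma),
                               std::stof(entry.substr(comma + 1)));
    }
    return rec;
}

// Counts consecutive zero-score entries at the end of the list.
std::size_t trailingZeros(const Record& rec) {
    std::size_t n = 0;
    for (auto it = rec.items.rbegin();
         it != rec.items.rend() && it->second == 0.0f; ++it) {
        ++n;
    }
    return n;
}
```

Run on the -k 140 line above, it should find 128 scored entries followed by 12 zero-score entries (movie 2, score 0.000).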

rgeorgej commented 8 years ago

Our optimized GPU top-K implementation only supports k up to 128. We need to add support for larger values, or fall back to another implementation (CUB) when k is too large for the optimized path to deliver full GPU performance.
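The suggestion above (use the optimized path when k is small, fall back to a general implementation otherwise) can be sketched on the CPU. This is not DSSTNE code: a bounded min-heap stands in for the optimized GPU kernel, a full sort stands in for CUB, and `kFastPathLimit`, `topKHeap`, `topKSort`, and `selectTopK` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Scored = std::pair<float, int>;  // (score, index)

// Hypothetical limit mirroring the 128 cap discussed in this thread.
constexpr std::size_t kFastPathLimit = 128;

// Fast path for small k: keep a min-heap holding the k best entries so far,
// O(n log k) overall.
std::vector<Scored> topKHeap(const std::vector<float>& s, std::size_t k) {
    std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored>> heap;
    for (std::size_t i = 0; i < s.size(); ++i) {
        heap.emplace(s[i], static_cast<int>(i));
        if (heap.size() > k) heap.pop();  // drop the current smallest
    }
    std::vector<Scored> out;
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    std::reverse(out.begin(), out.end());  // highest score first
    return out;
}

// General fallback: sort everything in descending order, then truncate.
std::vector<Scored> topKSort(const std::vector<float>& s, std::size_t k) {
    std::vector<Scored> all;
    all.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) all.emplace_back(s[i], static_cast<int>(i));
    std::sort(all.begin(), all.end(), std::greater<Scored>());
    all.resize(std::min(k, all.size()));
    return all;
}

// Dispatch on k instead of failing when it exceeds the fast-path limit.
std::vector<Scored> selectTopK(const std::vector<float>& s, std::size_t k) {
    std::vector<Scored> result = k <= kFastPathLimit ? topKHeap(s, k) : topKSort(s, k);
    assert(result.size() == std::min(k, s.size()));
    return result;
}
```

Both paths order entries by (score, index) descending, so they return identical results; raising the limit to 256, 512, or 1024 only changes which path runs.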

scottlegrand commented 8 years ago

Optimized topK should be able to be extended to 256, 512, and potentially 1024 before needing to fall back to CUB. I can help Rejith with this. (In reply to Rejith Joseph's comment of May 17, 2016, 2:56 PM.)
