abhijithch / RecSys.jl

Possible Bug with Blobs recommendation & Some Instability #35

Open Skylion007 opened 7 years ago

Skylion007 commented 7 years ago

So I've been messing around with the Blob method of parallelization since I have a rather large dataset, and I seem to have found a minor bug. When I call recommend on a trained ALSWR model (20 iterations, 20 factors), I get this error in the recommendation function:

ERROR: DimensionMismatch("new dimensions (1,20) must be consistent with array size 10")
 in reshape(::Array{Float64,1}, ::Tuple{Int64,Int64}) at .\array.jl:113
 in #recommend#10(::Bool, ::Int64, ::Function, ::RecSys.ALSWR{RecSys.ParBlob,RecSys.DistInputs,RecSys.DistModel}, ::Int64) at C:\Users\Skylion\.julia\v0.5\RecSys\src\als-wr.jl:142
 in recommend(::RecSys.ALSWR{RecSys.ParBlob,RecSys.DistInputs,RecSys.DistModel}, ::Int64) at C:\Users\Skylion\.julia\v0.5\RecSys\src\als-wr.jl:130
 in #recommend#32(::Array{Any,1}, ::Function, ::MovieRec, ::Int64, ::Vararg{Int64,N}) at C:\Users\Skylion\Documents\MalDump Data\maldump2\julia2\ALSAnime.jl:46
 in test_chunks(::String, ::String) at C:\Users\Skylion\Documents\MalDump Data\maldump2\julia2\ALSAnime.jl:150

On a side note, I've noticed the training of this library seems a little unstable. If I run the same parameters twice, I am likely to get two VERY different RMSEs (almost as if one doesn't converge). The odd thing is that the hyperparameters are the exact same. I wonder if there is an unsafe update of shared memory somewhere.

I will say that my matrix is actually relatively dense: 85078124 ratings in a sparse matrix of size (1319751, 4557).
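For reference, the density implied by these numbers can be checked with a few lines of arithmetic. This is a Python sketch rather than Julia, and the variable names are purely illustrative; it makes no assumption about which dimension holds users and which holds items.

```python
# Figures quoted in the comment above.
n_ratings = 85078124
n_rows, n_cols = 1319751, 4557

# Fraction of the matrix entries that hold a rating.
density = n_ratings / (n_rows * n_cols)

# Average number of ratings per row and per column.
ratings_per_row = n_ratings / n_rows
ratings_per_col = n_ratings / n_cols

print(f"density: {density:.4%}")
print(f"ratings per row: {ratings_per_row:.1f}")
print(f"ratings per column: {ratings_per_col:.1f}")
```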

I have been trying to debug this for a few months in my free time, but have yet to figure out what the issue could be. It could also be something weird, like my CSVs not being formatted properly. Does the order of the ratings or movies matter?

tanmaykm commented 7 years ago

Blobs.jl did unsafe updates of mmapped files prior to https://github.com/JuliaLang/julia/pull/14885, but that should no longer happen with 0.4.7 and later. I also remember facing issues where errors in iterations were being ignored somewhere; maybe the results of pmap need to be checked for errors. I had observed wide variations in RMSE in these scenarios. I wonder if you are hitting one of these, or maybe a new error.
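To illustrate what checking the results of a parallel map for errors could look like, here is a minimal sketch in Python rather than Julia. It is not RecSys.jl code and does not reproduce pmap's API: `capture_errors`, `checked_map`, and `update_chunk` are hypothetical names, and a thread pool stands in for the worker processes. The idea is only that a failure in any task is surfaced instead of being silently dropped from the iteration.

```python
from concurrent.futures import ThreadPoolExecutor


def capture_errors(fn):
    """Wrap fn so a failure is returned as a value instead of being raised."""
    def wrapped(item):
        try:
            return fn(item)
        except Exception as exc:  # deliberately broad: we want every failure
            return exc
    return wrapped


def checked_map(fn, items, workers=4):
    """Run fn over items in parallel and raise if any task failed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(capture_errors(fn), items))
    failures = [(i, r) for i, r in enumerate(results) if isinstance(r, Exception)]
    if failures:
        index, first = failures[0]
        raise RuntimeError(
            f"{len(failures)} of {len(results)} tasks failed; "
            f"first failure at index {index}: {first!r}"
        ) from first
    return results


def update_chunk(chunk_id):
    # Hypothetical stand-in for one per-chunk update step.
    if chunk_id == 2:
        raise ValueError("bad chunk")
    return chunk_id * 10
```

With this sketch, `checked_map(update_chunk, range(5))` raises a `RuntimeError` naming the failed chunk, whereas an unchecked map that swallowed the exception would carry on with a partially updated model, which is one way runs with identical hyperparameters could end up with very different RMSEs.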

The code filters out empty rows and columns here. (That's something we should change. Data cleaning should be an external step, and not be done in the train-recommend methods.) That may be causing the order of movies and ratings to matter, and it may also be behind the DimensionMismatch exception in recommend.
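As a toy illustration of why dropping empty columns can shift ids and change dimensions, here is a small Python sketch. It is not how RecSys.jl implements the filtering; `drop_empty_columns` and the sample data are hypothetical, and ids are 0-based for simplicity.

```python
def drop_empty_columns(ratings):
    """Compact item ids after removing items that have no ratings.

    ratings: list of (user, item, rating) tuples with 0-based ids.
    Returns the remapped ratings plus a list that maps each compact
    index back to its original item id.
    """
    rated = sorted({item for _, item, _ in ratings})
    compact_of = {orig: new for new, orig in enumerate(rated)}
    remapped = [(u, compact_of[i], r) for u, i, r in ratings]
    return remapped, rated


# Toy data: 5 items in the catalogue, but items 1 and 3 were never rated.
n_items = 5
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 4, 4.0), (1, 2, 1.0)]
remapped, original_id = drop_empty_columns(ratings)

# The filtered matrix has fewer columns than the caller expects.
print(len(original_id), "of", n_items, "columns kept")

# Compact index 1 no longer means original item 1.
print("compact index 1 is original item", original_id[1])
```

If the filtering happens inside training, any code that later indexes the model with original ids, or assumes the original matrix shape, can read the wrong entry or hit a size mismatch. Keeping an explicit mapping like `original_id`, or cleaning the data before it reaches the library, avoids that.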