Open YeziPeter opened 3 years ago
Thanks for pointing out that code, @YeziPeter. It needs more comments describing what happens at that point. I had to dig through to find an answer for that one :)
In the `JokeRec.load_data()` method, where the data gets loaded, these ratings are scaled in advance. The raw data ranges over [-10, 10], but the scaled data ranges over [-1.0, 1.0]. These scaled rating values then get used as the sample data for the clustering.
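A minimal sketch of that scaling step, assuming a plain linear rescale. The actual `JokeRec.load_data()` code may do this differently, and `scale_ratings` is a hypothetical helper used only for illustration:

```python
import numpy as np

RAW_MAX = 10.0  # raw ratings range over [-10, 10], as stated above


def scale_ratings(raw):
    """Map raw ratings in [-10, 10] linearly onto [-1.0, 1.0].

    Hypothetical helper: assumes a simple divide-by-10 rescale.
    """
    return np.asarray(raw, dtype=float) / RAW_MAX


# Made-up ratings: two users (rows) by three jokes (columns).
raw = np.array([[-10.0, 0.0, 5.0],
                [10.0, -2.5, 7.5]])
scaled = scale_ratings(raw)
print(scaled)
```

The scaled array, not the raw one, is what would be passed on as the sample data for clustering.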
The `c[item]` value is the cluster center for ratings of a particular item (an individual joke), not the number of users who've rated an item. It is scaled the same way as the rating values.
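To see why the difference is divided by 2.0, here is a small sketch. It assumes, following the description above, that a center holds one scaled value per joke. The array `c` and the function name `scaled_diff` are made up for illustration and are not taken from the notebook:

```python
import numpy as np


def scaled_diff(center, item, rating):
    """Distance between a scaled rating and the center value for one item.

    Both values lie in [-1.0, 1.0], so their absolute difference lies in
    [0.0, 2.0]; dividing by 2.0 normalizes it to [0.0, 1.0].
    """
    return abs(center[item] - rating) / 2.0


# Hypothetical cluster center: one scaled value per joke.
c = np.array([0.5, -1.0, 0.25])

same = scaled_diff(c, 0, 0.5)       # rating equals the center value
opposite = scaled_diff(c, 1, 1.0)   # rating at the opposite extreme
partial = scaled_diff(c, 2, -0.25)  # somewhere in between
print(same, opposite, partial)
```

Under these assumptions the result is 0.0 when the rating matches the center value and 1.0 when the two sit at opposite ends of the scaled range.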
Does that help?
In ray-rllib/recsys/01-Ressys, I found what may be a problem in calculating the distance between a user's ratings and the cluster centers. It is in the `step` function of the env class `JokeRec`: `scaled_diff = abs(c[item] - rating) / 2.0`. The shape of `c` (which is `centers[i]`) is 1 × 24983, which stands for the features of cluster `i`. However, `item` is randomly chosen from the cluster, and its range is [0, 99]. The remaining indices [100, 24983] in `centers[i]` are never accessed. Is `c[item] - rating` a correct way to calculate that distance?