Here's an SQL query implementing the algorithm in this post, with the `n / n_max` weighting. It could be interesting to compare the results of different algorithms and weights.
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    -- blend each game's own score with the global average, weighted by n / n_max;
    -- the * 1.0 factors force floating-point division
    (total_reviews * 1.0 / votes.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - (total_reviews * 1.0 / votes.max_votes)) * scores.avg_score AS weighted_score
FROM
    review,
    (
        SELECT AVG(score) AS avg_score
        FROM
        (
            SELECT positive_reviews * 1.0 / total_reviews AS score
            FROM review
        ) scores
    ) scores,
    (
        SELECT MAX(vote) AS max_votes
        FROM
        (
            SELECT total_reviews AS vote
            FROM review
        ) votes
    ) votes
ORDER BY weighted_score DESC
LIMIT 250;
```
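For reference, the `weighted_score` expression above is a linear blend between each game's own score and the global average, weighted by the game's share of the maximum vote count:

$$w = \frac{n}{n_{\max}}, \qquad \text{weighted\_score} = w \cdot \frac{p}{n} + (1 - w)\,\bar{s},$$

where $n$ is `total_reviews`, $p$ is `positive_reviews`, $n_{\max}$ is the largest review count in the table, and $\bar{s}$ is the average positive ratio across all games.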
Hey @rubenpieters, that's super helpful! Let me see if I can get a clone of the site up and running with this algorithm so we can discuss the differences.
I'm not sure it's necessary to completely change the algorithm; I was thinking we could just adjust the weights of the current one. But it will be interesting to see the difference anyway. Who knows, maybe this turns out to be exactly what we need.
Here's the new top 20 I'm seeing.
Title | Reviews | Positive | Negative | Score |
---|---|---|---|---|
Counter-Strike: Global Offensive | 2215644 | 1999455 | 216189 | 0.90242611177608 |
Dota 2 | 808378 | 711271 | 97107 | 0.80254662214764 |
Team Fortress 2 | 473338 | 444344 | 28994 | 0.796713524112741 |
Garry's Mod | 307098 | 293864 | 13234 | 0.785678837527148 |
Unturned | 290821 | 265235 | 25586 | 0.778327041008859 |
Terraria | 197138 | 191295 | 5843 | 0.777010777597263 |
The Elder Scrolls V: Skyrim | 233903 | 219160 | 14743 | 0.777007371870988 |
Left 4 Dead 2 | 195686 | 189509 | 6177 | 0.776701522441416 |
Rocket League | 175045 | 162039 | 13006 | 0.771366054821586 |
The Witcher® 3: Wild Hunt | 114765 | 111890 | 2875 | 0.769358018594371 |
Euro Truck Simulator 2 | 112551 | 109303 | 3248 | 0.768947976901211 |
Portal 2 | 104207 | 102849 | 1358 | 0.768890121584016 |
Borderlands 2 | 113920 | 110152 | 3768 | 0.768862730303245 |
Sid Meier's Civilization® V | 114591 | 110504 | 4087 | 0.768792004437681 |
Warframe | 148314 | 135122 | 13192 | 0.768363992091475 |
PAYDAY 2 | 318979 | 263537 | 55442 | 0.767925846045087 |
Counter-Strike | 97991 | 95813 | 2178 | 0.76784145066419 |
Life is Strange - Episode 1 | 98111 | 94660 | 3451 | 0.767279999789437 |
Undertale | 85355 | 81684 | 3671 | 0.765788185375327 |
Paladins | 139548 | 120403 | 19145 | 0.764720240155128 |
This list definitely seems to favour more classics. That's probably a good thing.
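As a sanity check on these numbers: Counter-Strike: Global Offensive appears to hold the largest review count in the dataset, so its weight is $w = 1$ and its weighted score collapses to its raw positive ratio:

$$\frac{1999455}{2215644} \approx 0.902426,$$

which matches the table exactly.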
It seems we can simplify:
```sql
(
    SELECT MAX(vote) AS max_votes
    FROM
    (
        SELECT total_reviews AS vote
        FROM review
    ) votes
) votes
```
down to:
```sql
(
    SELECT MAX(total_reviews) AS max_votes FROM review
) votes
```
...because the original fragment is effectively just renaming a column.
I wonder if there are any more optimizations we can make? I'm not very familiar with SQL.
Indeed. It's also possible to collapse both the `MAX` and `AVG` into one subquery:
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    (total_reviews * 1.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - (total_reviews * 1.0 / agg.max_votes)) * agg.avg_score AS weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 250;
```
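Purely as a readability sketch, assuming an engine with CTE support (e.g. SQLite 3.8.3+ or PostgreSQL), the same query can also be written with a `WITH` clause; the results should be identical:

```sql
-- readability refactor only: hoist the aggregates into a named CTE
WITH agg AS (
    SELECT
        AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
        MAX(total_reviews) AS max_votes
    FROM review
)
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    (total_reviews * 1.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - total_reviews * 1.0 / agg.max_votes) * agg.avg_score AS weighted_score
FROM review, agg
ORDER BY weighted_score DESC
LIMIT 250;
```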
Is it possible to move the embedded select statement into the `SELECT` portion of the root query, instead of having it in the `FROM` portion? That would make it a bit easier to split the query up in the code. Maybe I'm just being stupid 😆
I don't think that's possible, since you cannot reuse an alias created in the SELECT clause. Also, the subquery might then be executed for every row; I'm not sure whether that would be optimized away.
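For what it's worth, here is a sketch of the scalar-subquery form the question describes. It is legal SQL, but the aggregate expressions have to be spelled out in full because a `SELECT`-list alias cannot be reused, and whether the engine evaluates the uncorrelated subqueries once or per row depends on the optimizer:

```sql
-- sketch only: uncorrelated scalar subqueries in the SELECT list
SELECT
    app_name,
    (total_reviews * 1.0 / (SELECT MAX(total_reviews) FROM review))
        * (positive_reviews * 1.0 / total_reviews)
      + (1 - total_reviews * 1.0 / (SELECT MAX(total_reviews) FROM review))
        * (SELECT AVG(positive_reviews * 1.0 / total_reviews) FROM review) AS weighted_score
FROM review
ORDER BY weighted_score DESC
LIMIT 250;
```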
Anyway, here is a tweakable version of my earlier query:
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    CASE
        -- once the scaled weight exceeds 1, fall back to the raw score
        WHEN (total_reviews * 15.0 / agg.max_votes) > 1
            THEN (positive_reviews * 1.0 / total_reviews)
        ELSE (total_reviews * 15.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
            + (1 - (total_reviews * 15.0 / agg.max_votes)) * agg.avg_score
    END weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 25;
```
If you put `1.0` in place of the `15.0`s you get the old version, and you can increase the parameter to make the total number of votes matter less.
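One way to read the `15.0` parameter, taking `max_votes` ≈ 2,215,644 (CS:GO's count in the table above) as an assumption about the dataset maximum: the weight saturates at 1 once

$$n \ge \frac{n_{\max}}{15} \approx \frac{2215644}{15} \approx 147710,$$

so any game above roughly 148K total reviews is ranked purely by its raw positive ratio. Terraria (197,138 reviews) clears that threshold, and indeed its Bayesian 15 score below is $191295 / 197138 \approx 0.970361$, exactly its raw ratio.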
For Bayesian 15, here's the top 20 I'm seeing.
Title | Reviews | Positive | Negative | Score |
---|---|---|---|---|
Terraria | 197138 | 191295 | 5843 | 0.970360863963315 |
Left 4 Dead 2 | 195686 | 189509 | 6177 | 0.968434124055885 |
Garry's Mod | 307098 | 293864 | 13234 | 0.956906264449785 |
Team Fortress 2 | 473338 | 444344 | 28994 | 0.938745674338422 |
The Elder Scrolls V: Skyrim | 233903 | 219160 | 14743 | 0.936969598508784 |
The Witcher® 3: Wild Hunt | 114765 | 111890 | 2875 | 0.926589719203611 |
Rocket League | 175045 | 162039 | 13006 | 0.925699105944186 |
Euro Truck Simulator 2 | 112551 | 109303 | 3248 | 0.92043909380621 |
Portal 2 | 104207 | 102849 | 1358 | 0.919571264048282 |
Borderlands 2 | 113920 | 110152 | 3768 | 0.919160394836724 |
Sid Meier's Civilization® V | 114591 | 110504 | 4087 | 0.918099506853252 |
Unturned | 290821 | 265235 | 25586 | 0.912021484005625 |
Warframe | 148314 | 135122 | 13192 | 0.911053575522203 |
Counter-Strike | 97991 | 95813 | 2178 | 0.903841200250886 |
Counter-Strike: Global Offensive | 2215644 | 1999455 | 216189 | 0.90242611177608 |
Life is Strange - Episode 1 | 98111 | 94660 | 3451 | 0.895419437129596 |
Dota 2 | 808378 | 711271 | 97107 | 0.879874266741549 |
Undertale | 85355 | 81684 | 3671 | 0.873042220917944 |
Paladins | 139548 | 120403 | 19145 | 0.857023042614962 |
Stardew Valley | 68186 | 66250 | 1936 | 0.85667419610521 |
Interestingly, the top 20 still mostly comprises the same titles as the Bayesian 1 list, just in a slightly different order. It seems the characteristics of this algorithm are very different from Wilson's.
If you are able, I would be very interested to see Wilson's algorithm tweaked in a similar manner, to see how it changes.
If you try numbers higher than 15 (try 25, 50, 100, …) then you start seeing more and more niche titles in the top list.
The other scores are more difficult to tweak and I would have to read up on the underlying math to understand them better. I'll have to see if I can free up some time to do that.
I'm working on producing a mini site to showcase all the different algorithms and weights based on the same data set. When we have finished we'll let people vote on which algorithm (and weight) they think makes the best top list 👍
I've received some very interesting notes from a friend about the remaining three algorithms. Hopefully I've interpreted them correctly.
Either replace `rating` with this, or calculate it separately in the query, whichever is faster:
```sql
positive_reviews * 1.0 / total_reviews AS rating
```
SQL:
```sql
rating - (rating - 0.5) * POWER(2, -LOG10(total_reviews + 1))
```
Weighted SQL. The default weight is `log(10, 2)`. `POWER(2, weight)` gets very large very fast, so stay below 10.
```sql
rating - (rating - 0.5) * POWER(2, -LOG(total_reviews + 1, POWER(2, weight)))
```
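To make that concrete, here is a minimal runnable sketch, assuming the same `review` table as the earlier queries and an arbitrary trial weight of 3.0. The two-argument `LOG` is written via a change of base with `LOG10` for portability, since math-function names vary by engine:

```sql
-- sketch: weighted log-smoothing with weight = 3.0 (an arbitrary value to try)
SELECT
    app_name,
    r.rating - (r.rating - 0.5)
        * POWER(2, -(LOG10(r.total_reviews + 1) / LOG10(POWER(2, 3.0)))) AS weighted_score
FROM (
    SELECT
        app_name,
        total_reviews,
        positive_reviews * 1.0 / total_reviews AS rating
    FROM review
) r
ORDER BY weighted_score DESC
LIMIT 250;
```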
Laplace smoothing, where `a` and `b` are some numbers, preferably chosen so that `a / b = 0.5`. Make sure to add a decimal point to either `a` or `b` to avoid integer division.
For your purposes, high values, perhaps in the high hundreds, should be best. The bigger the numbers, the more heavily the rating is adjusted toward 50%. This is countered by the number of reviews a game has: `a=1` and `b=2` might work well on games with <100 ratings, but at 10K or 100K ratings they have basically no effect on the score. On the other hand, `a=500`, `b=1000` would completely smooth out any game with very few ratings, leaving it with a rating of ~50%, while giving more desirable results for games with 10K and 100K ratings.
```sql
(positive_reviews + a) / (total_reviews + b) AS weighted_rating
```
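A minimal runnable sketch of this, plugging in the `a=500`, `b=1000` values from the note (the `.0` keeps the division floating-point):

```sql
-- Laplace smoothing with a = 500.0, b = 1000, i.e. a / b = 0.5
SELECT
    app_name,
    (positive_reviews + 500.0) / (total_reviews + 1000) AS weighted_rating
FROM review
ORDER BY weighted_rating DESC
LIMIT 250;
```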
The fourth paragraph of my topmost comment here explains why I don't think this formula works for Steam, though. You could, however, try something like this instead:
```sql
SELECT (positive_reviews * 1.0 / total_reviews * LOG10(total_reviews + 1) + 0.5)
    / (LOG10(total_reviews + 1) + 1) AS weighted_rating
```
Dirichlet prior smoothing, where `c` is a constant larger than 0. (I tried values around 1000, but you can also try much higher and lower values.)
```sql
SELECT (positive_reviews + c * p) / (total_reviews + c) AS weighted_rating
FROM review, (SELECT SUM(positive_reviews) * 1.0 / SUM(total_reviews) AS p FROM review)
```
This is actually very similar to Laplace smoothing: essentially `c * p = a` and `c = b`. The only difference is that `a / b ≠ 0.5`.
As such, it probably suffers from the same issues that I believe Laplace smoothing suffers from. I would again suggest taking a logarithm of the review counts beforehand:
```sql
(rating * LOG10(total_reviews + 1) + c * p) / (LOG10(total_reviews + 1) + c)
```
Because we took a base-10 logarithm, `c` should be correspondingly smaller: `c = 3` should be equivalent to `c = 1000` in the previous formula, `c = 2` corresponds to `c = 100`, and so on.
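Putting the pieces together, here is a runnable sketch of the log-smoothed Dirichlet variant with `c = 3.0` (roughly matching `c = 1000` in the unlogged formula, per the note above):

```sql
-- log-smoothed Dirichlet prior with c = 3.0; p is the global positive ratio
SELECT
    app_name,
    (positive_reviews * 1.0 / total_reviews * LOG10(total_reviews + 1) + 3.0 * agg.p)
        / (LOG10(total_reviews + 1) + 3.0) AS weighted_rating
FROM
    review,
    (SELECT SUM(positive_reviews) * 1.0 / SUM(total_reviews) AS p FROM review) agg
ORDER BY weighted_rating DESC
LIMIT 250;
```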
P.S. Since you seem to be looking for a formula that only showcases the TOP 250 games, some formulas that would normally not be ideal for ranking might suit you much better. This is because the formula doesn't have to give good results for games with:
@rubenpieters What is the function of the `CASE` statement here? Is it just to ensure the `weighted_score` does not exceed 1? If so, could we perhaps eliminate the `CASE` statement by simply using `MIN()` to pick either the computed value or 1 (whichever is smaller)?
It's to make sure that `total_reviews * alpha / agg.max_votes` does not exceed 1. You could use `MIN` if you want.
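A sketch of that `MIN`-based variant. Note the two-argument scalar `MIN` is SQLite-specific; standard SQL calls it `LEAST`:

```sql
-- clamp the weight w = total_reviews * 15.0 / max_votes to at most 1,
-- equivalent to the CASE version above (SQLite scalar MIN; use LEAST elsewhere)
SELECT
    app_name,
    MIN(total_reviews * 15.0 / agg.max_votes, 1.0) * (positive_reviews * 1.0 / total_reviews)
        + (1 - MIN(total_reviews * 15.0 / agg.max_votes, 1.0)) * agg.avg_score AS weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 25;
```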
@rubenpieters I don't think the weightings are working correctly. If I remove the `CASE` statement, the results are exactly the same as without the weightings: the only difference is that the numbers are bigger (and some of them exceed 1). The only reason the result set differs is that the `CASE` statement conditionally sends the top 5 results through a different weighting system.
This doesn't seem to be the correct behaviour at all: shouldn't the weighting be giving different priority to vote volumes instead? It seems all it did was scale the scores up a little; apart from that, the list looks identical.
What do you mean, 'if you remove the case statement'? Can you post the query you are using?
If you pick a very high `alpha`, such as `5000.0`:
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    CASE
        WHEN (total_reviews * 5000.0 / agg.max_votes) > 1
            THEN (positive_reviews * 1.0 / total_reviews)
        ELSE (total_reviews * 5000.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
            + (1 - (total_reviews * 5000.0 / agg.max_votes)) * agg.avg_score
    END weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 25;
```
Then the results are all items with a low total number of votes but a high score (percentage of positive reviews).
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    (total_reviews * 15.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - (total_reviews * 15.0 / agg.max_votes)) * agg.avg_score AS weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 250;
```
This "weighted" query yields identical results to the unweighted query. The only difference is that the scores are scaled up; the apps selected and their order are exactly the same. This is still true if I use `5000` instead of `15`. Ergo, this is not a valid way to apply weight.
Yes, because if you remove the `CASE` statement you are just computing a variant of the `n / n_max` weighting, except the 'weight' is no longer in [0..1]. The `CASE` statement is an essential component of how the weighting works: it creates a threshold vote count at which the weight becomes 1, meaning a game's actual score is taken as its weighted score. For games with a weight below 1, the average score across all games is blended in according to the weight.
And the problem with only using `n / n_max` is that it favours popular items far too much.
Oh... haha 😅 Sorry, I guess you're right. I tried the `5000` weight with the `CASE` statement and the list does look a lot more like the Wilson list now. Please excuse me; I don't understand mathematics very well. I just wanted to make sure you hadn't made a mistake, but it seems you know what you're doing 👍
Good news: I have uploaded the algorithm test site to http://algorithms.steam250.com/. Let me know what you think! Sorry it took so long! The site was not designed to handle multiple algorithms so I had to do some extra programming. It shouldn't take too long to add the missing algorithms contributed by Torn tomorrow.
@rubenpieters All the algorithms are now available on algorithms.steam250.com. I'm pretty happy with the sampling of Wilson and Bayesian but I still need to mess around with the others. Let me know which one you think produces the best list!
Well, the 'best' list is pretty subjective. I think something that doesn't favour popular or niche titles too heavily would be a decent default choice.
Currently, games with comparatively low review volumes but very high approval rates are favoured over games with many more votes (10x or more) and a comparable approval rating. Favour should instead be afforded to games that have garnered more attention from gamers.
This requires tweaking the algorithm but I'm currently unsure of the best way to do this. The current implementation features a number of constants that probably need to be tweaked to achieve the desired balance.
Following is a list of alternative algorithms we can try.
All algorithms with various weightings are now available to preview on algorithms.steam250.com. We need to come up with the smallest possible list of options that adequately demonstrates the available algorithm and weighting combinations. Too many options will overwhelm voters so each one must bring something unique to the discussion.