Here's an SQL query implementing the algorithm in this post, with the `n / n_max` weighting. It could be interesting to compare the results of different algorithms and weights.
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    -- blend each game's own score with the global average, weighted by n / n_max;
    -- the * 1.0 factors force floating-point division
    (total_reviews * 1.0 / votes.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - (total_reviews * 1.0 / votes.max_votes)) * scores.avg_score AS weighted_score
FROM
    review,
    (
        SELECT AVG(score) AS avg_score
        FROM
        (
            SELECT positive_reviews * 1.0 / total_reviews AS score
            FROM review
        ) scores
    ) scores,
    (
        SELECT MAX(vote) AS max_votes
        FROM
        (
            SELECT total_reviews AS vote
            FROM review
        ) votes
    ) votes
ORDER BY weighted_score DESC
LIMIT 250;
```
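For reference, the `weighted_score` expression above is a linear blend between each game's own score and the global average, weighted by the game's share of the maximum vote count:

$$w = \frac{n}{n_{\max}}, \qquad \text{weighted\_score} = w \cdot \frac{p}{n} + (1 - w)\,\bar{s},$$

where $n$ is `total_reviews`, $p$ is `positive_reviews`, $n_{\max}$ is the largest review count in the table, and $\bar{s}$ is the average positive ratio across all games.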
Hey @rubenpieters, that's super helpful! Let me see if I can get a clone of the site up and running with this algorithm so we can discuss the differences.
I'm not sure it's necessary to completely change the algorithm; I was thinking we could just adjust the weights of the current one. But it will be interesting to see the difference anyway. Who knows, maybe this turns out to be exactly what we need.
Here's the new top 20 I'm seeing.
Title | Reviews | Positive | Negative | Score |
---|---|---|---|---|
Counter-Strike: Global Offensive | 2215644 | 1999455 | 216189 | 0.90242611177608 |
Dota 2 | 808378 | 711271 | 97107 | 0.80254662214764 |
Team Fortress 2 | 473338 | 444344 | 28994 | 0.796713524112741 |
Garry's Mod | 307098 | 293864 | 13234 | 0.785678837527148 |
Unturned | 290821 | 265235 | 25586 | 0.778327041008859 |
Terraria | 197138 | 191295 | 5843 | 0.777010777597263 |
The Elder Scrolls V: Skyrim | 233903 | 219160 | 14743 | 0.777007371870988 |
Left 4 Dead 2 | 195686 | 189509 | 6177 | 0.776701522441416 |
Rocket League | 175045 | 162039 | 13006 | 0.771366054821586 |
The Witcher® 3: Wild Hunt | 114765 | 111890 | 2875 | 0.769358018594371 |
Euro Truck Simulator 2 | 112551 | 109303 | 3248 | 0.768947976901211 |
Portal 2 | 104207 | 102849 | 1358 | 0.768890121584016 |
Borderlands 2 | 113920 | 110152 | 3768 | 0.768862730303245 |
Sid Meier's Civilization® V | 114591 | 110504 | 4087 | 0.768792004437681 |
Warframe | 148314 | 135122 | 13192 | 0.768363992091475 |
PAYDAY 2 | 318979 | 263537 | 55442 | 0.767925846045087 |
Counter-Strike | 97991 | 95813 | 2178 | 0.76784145066419 |
Life is Strange - Episode 1 | 98111 | 94660 | 3451 | 0.767279999789437 |
Undertale | 85355 | 81684 | 3671 | 0.765788185375327 |
Paladins | 139548 | 120403 | 19145 | 0.764720240155128 |
This list definitely seems to favour more classics. That's probably a good thing.
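As a sanity check on these numbers: Counter-Strike: Global Offensive appears to hold the largest review count in the dataset, so its weight is $w = 1$ and its weighted score collapses to its raw positive ratio:

$$\frac{1999455}{2215644} \approx 0.902426,$$

which matches the table exactly.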
It seems we can simplify:
```sql
(
    SELECT MAX(vote) AS max_votes
    FROM
    (
        SELECT total_reviews AS vote
        FROM review
    ) votes
) votes
```
down to:
```sql
(
    SELECT MAX(total_reviews) AS max_votes FROM review
) votes
```
...because the original fragment is effectively just renaming a column.
I wonder if there are any more optimizations we can make? I'm not very familiar with SQL.
Indeed. It's also possible to collapse both the `MAX` and `AVG` into one subquery:
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    (total_reviews * 1.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - (total_reviews * 1.0 / agg.max_votes)) * agg.avg_score AS weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 250;
```
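Purely as a readability sketch, assuming an engine with CTE support (e.g. SQLite 3.8.3+ or PostgreSQL), the same query can also be written with a `WITH` clause; the results should be identical:

```sql
-- readability refactor only: hoist the aggregates into a named CTE
WITH agg AS (
    SELECT
        AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
        MAX(total_reviews) AS max_votes
    FROM review
)
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    (total_reviews * 1.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - total_reviews * 1.0 / agg.max_votes) * agg.avg_score AS weighted_score
FROM review, agg
ORDER BY weighted_score DESC
LIMIT 250;
```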
Is it possible to move the embedded select statement into the `SELECT` portion of the root query, instead of having it in the `FROM` portion? That would make it a bit easier to split the query up in the code. Maybe I'm just being stupid 😆
I don't think that's possible, since you cannot reuse an alias created in the SELECT clause. Also, the subquery might then be executed for every row; I'm not sure whether that would be optimized away.
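For what it's worth, here is a sketch of the scalar-subquery form the question describes. It is legal SQL, but the aggregate expressions have to be spelled out in full because a `SELECT`-list alias cannot be reused, and whether the engine evaluates the uncorrelated subqueries once or per row depends on the optimizer:

```sql
-- sketch only: uncorrelated scalar subqueries in the SELECT list
SELECT
    app_name,
    (total_reviews * 1.0 / (SELECT MAX(total_reviews) FROM review))
        * (positive_reviews * 1.0 / total_reviews)
      + (1 - total_reviews * 1.0 / (SELECT MAX(total_reviews) FROM review))
        * (SELECT AVG(positive_reviews * 1.0 / total_reviews) FROM review) AS weighted_score
FROM review
ORDER BY weighted_score DESC
LIMIT 250;
```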
Anyway, here is a tweakable version of my earlier query:
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    CASE
        -- once the scaled weight exceeds 1, fall back to the raw score
        WHEN (total_reviews * 15.0 / agg.max_votes) > 1
            THEN (positive_reviews * 1.0 / total_reviews)
        ELSE (total_reviews * 15.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
            + (1 - (total_reviews * 15.0 / agg.max_votes)) * agg.avg_score
    END weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 25;
```
If you put `1.0` in place of the `15.0`s you get the old version, and you can increase the parameter to make the total number of votes matter less.
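One way to read the `15.0` parameter, taking `max_votes` ≈ 2,215,644 (CS:GO's count in the table above) as an assumption about the dataset maximum: the weight saturates at 1 once

$$n \ge \frac{n_{\max}}{15} \approx \frac{2215644}{15} \approx 147710,$$

so any game above roughly 148K total reviews is ranked purely by its raw positive ratio. Terraria (197,138 reviews) clears that threshold, and indeed its Bayesian 15 score below is $191295 / 197138 \approx 0.970361$, exactly its raw ratio.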
For Bayesian 15, here's the top 20 I'm seeing.
Title | Reviews | Positive | Negative | Score |
---|---|---|---|---|
Terraria | 197138 | 191295 | 5843 | 0.970360863963315 |
Left 4 Dead 2 | 195686 | 189509 | 6177 | 0.968434124055885 |
Garry's Mod | 307098 | 293864 | 13234 | 0.956906264449785 |
Team Fortress 2 | 473338 | 444344 | 28994 | 0.938745674338422 |
The Elder Scrolls V: Skyrim | 233903 | 219160 | 14743 | 0.936969598508784 |
The Witcher® 3: Wild Hunt | 114765 | 111890 | 2875 | 0.926589719203611 |
Rocket League | 175045 | 162039 | 13006 | 0.925699105944186 |
Euro Truck Simulator 2 | 112551 | 109303 | 3248 | 0.92043909380621 |
Portal 2 | 104207 | 102849 | 1358 | 0.919571264048282 |
Borderlands 2 | 113920 | 110152 | 3768 | 0.919160394836724 |
Sid Meier's Civilization® V | 114591 | 110504 | 4087 | 0.918099506853252 |
Unturned | 290821 | 265235 | 25586 | 0.912021484005625 |
Warframe | 148314 | 135122 | 13192 | 0.911053575522203 |
Counter-Strike | 97991 | 95813 | 2178 | 0.903841200250886 |
Counter-Strike: Global Offensive | 2215644 | 1999455 | 216189 | 0.90242611177608 |
Life is Strange - Episode 1 | 98111 | 94660 | 3451 | 0.895419437129596 |
Dota 2 | 808378 | 711271 | 97107 | 0.879874266741549 |
Undertale | 85355 | 81684 | 3671 | 0.873042220917944 |
Paladins | 139548 | 120403 | 19145 | 0.857023042614962 |
Stardew Valley | 68186 | 66250 | 1936 | 0.85667419610521 |
Interestingly, the top 20 still mostly comprises the same titles as the Bayesian 1 list, just in a slightly different order. It seems the characteristics of this algorithm are very different from Wilson's.
If you are able, I would be very interested to see Wilson's algorithm tweaked in a similar manner, to see how it changes.
If you try numbers higher than 15 (try 25, 50, 100, …) then you start seeing more and more niche titles in the top list.
The other scores are more difficult to tweak and I would have to read up on the underlying math to understand them better. I'll have to see if I can free up some time to do that.
I'm working on producing a mini site to showcase all the different algorithms and weights based on the same data set. When we have finished we'll let people vote on which algorithm (and weight) they think makes the best top list 👍
I've received some very interesting notes from a friend about the remaining three algorithms. Hopefully I've interpreted them correctly.
Either replace `rating` with this, or calculate it separately in the query, whichever is faster:
```sql
positive_reviews * 1.0 / total_reviews AS rating
```
SQL:
```sql
rating - (rating - 0.5) * POWER(2, -LOG10(total_reviews + 1))
```
Weighted SQL. The default weight is `log(10, 2)`. `POWER(2, weight)` gets very large very fast, so stay below 10.
```sql
rating - (rating - 0.5) * POWER(2, -LOG(total_reviews + 1, POWER(2, weight)))
```
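To make that concrete, here is a minimal runnable sketch, assuming the same `review` table as the earlier queries and an arbitrary trial weight of 3.0. The two-argument `LOG` is written via a change of base with `LOG10` for portability, since math-function names vary by engine:

```sql
-- sketch: weighted log-smoothing with weight = 3.0 (an arbitrary value to try)
SELECT
    app_name,
    r.rating - (r.rating - 0.5)
        * POWER(2, -(LOG10(r.total_reviews + 1) / LOG10(POWER(2, 3.0)))) AS weighted_score
FROM (
    SELECT
        app_name,
        total_reviews,
        positive_reviews * 1.0 / total_reviews AS rating
    FROM review
) r
ORDER BY weighted_score DESC
LIMIT 250;
```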
Laplace smoothing, where `a` and `b` are some numbers, preferably chosen so that `a / b = 0.5`. Make sure to add a decimal point to either `a` or `b` to avoid integer division.
For your purposes, high values, perhaps in the high hundreds, should be best. The bigger the numbers, the more heavily the rating is adjusted toward 50%. This is countered by the number of reviews a game has: `a=1` and `b=2` might work well on games with <100 ratings, but at 10K or 100K ratings they have basically no effect on the score. On the other hand, `a=500`, `b=1000` would completely smooth out any game with very few ratings, leaving it with a rating of ~50%, while giving more desirable results for games with 10K and 100K ratings.
```sql
(positive_reviews + a) / (total_reviews + b) AS weighted_rating
```
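A minimal runnable sketch of this, plugging in the `a=500`, `b=1000` values from the note (the `.0` keeps the division floating-point):

```sql
-- Laplace smoothing with a = 500.0, b = 1000, i.e. a / b = 0.5
SELECT
    app_name,
    (positive_reviews + 500.0) / (total_reviews + 1000) AS weighted_rating
FROM review
ORDER BY weighted_rating DESC
LIMIT 250;
```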
The fourth paragraph of my topmost comment here explains why I don't think this formula works for Steam, though. You could, however, try something like this instead:
```sql
SELECT (positive_reviews * 1.0 / total_reviews * LOG10(total_reviews + 1) + 0.5)
    / (LOG10(total_reviews + 1) + 1) AS weighted_rating
```
Dirichlet prior smoothing, where `c` is a constant larger than 0. (I tried values around 1000, but you can also try much higher and lower values.)
```sql
SELECT (positive_reviews + c * p) / (total_reviews + c) AS weighted_rating
FROM review, (SELECT SUM(positive_reviews) * 1.0 / SUM(total_reviews) AS p FROM review)
```
This is actually very similar to Laplace smoothing: essentially `c * p = a` and `c = b`. The only difference is that `a / b ≠ 0.5`.
As such, it probably suffers from the same issues that I believe Laplace smoothing suffers from. I would again suggest taking a logarithm of the review counts beforehand:
```sql
(rating * LOG10(total_reviews + 1) + c * p) / (LOG10(total_reviews + 1) + c)
```
Because we took a base-10 logarithm, `c` should be correspondingly smaller: `c = 3` should be equivalent to `c = 1000` in the previous formula, `c = 2` corresponds to `c = 100`, and so on.
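Putting the pieces together, here is a runnable sketch of the log-smoothed Dirichlet variant with `c = 3.0` (roughly matching `c = 1000` in the unlogged formula, per the note above):

```sql
-- log-smoothed Dirichlet prior with c = 3.0; p is the global positive ratio
SELECT
    app_name,
    (positive_reviews * 1.0 / total_reviews * LOG10(total_reviews + 1) + 3.0 * agg.p)
        / (LOG10(total_reviews + 1) + 3.0) AS weighted_rating
FROM
    review,
    (SELECT SUM(positive_reviews) * 1.0 / SUM(total_reviews) AS p FROM review) agg
ORDER BY weighted_rating DESC
LIMIT 250;
```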
P.S. Since you seem to be looking for a formula that only showcases the TOP 250 games, some formulas that would normally not be ideal for ranking might suit you much better. This is because the formula doesn't have to give good results for games with:
@rubenpieters What is the function of the `CASE` statement here? Is it just to ensure the `weighted_score` does not exceed 1? If so, could we perhaps eliminate the `CASE` statement by simply using `MIN()` to pick either the computed value or 1 (whichever is smaller)?
It's to make sure that `total_reviews * alpha / agg.max_votes` does not exceed 1. You could use `MIN` if you want.
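A sketch of that `MIN`-based variant. Note the two-argument scalar `MIN` is SQLite-specific; standard SQL calls it `LEAST`:

```sql
-- clamp the weight w = total_reviews * 15.0 / max_votes to at most 1,
-- equivalent to the CASE version above (SQLite scalar MIN; use LEAST elsewhere)
SELECT
    app_name,
    MIN(total_reviews * 15.0 / agg.max_votes, 1.0) * (positive_reviews * 1.0 / total_reviews)
        + (1 - MIN(total_reviews * 15.0 / agg.max_votes, 1.0)) * agg.avg_score AS weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 25;
```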
@rubenpieters I don't think the weightings are working correctly. If I remove the `CASE` statement, the results are exactly the same as without the weightings: the only difference is that the numbers are bigger (and some of them exceed 1). The only reason the result set differs is that the `CASE` statement conditionally sends the top 5 results through a different weighting system.
This doesn't seem to be the correct behaviour at all: shouldn't the weighting be giving different priority to vote volumes instead? It seems all it did was scale the scores up a little; apart from that, the list looks identical.
What do you mean, 'if you remove the case statement'? Can you post the query you are using?
If you pick a very high `alpha`, such as `5000.0`:
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    CASE
        WHEN (total_reviews * 5000.0 / agg.max_votes) > 1
            THEN (positive_reviews * 1.0 / total_reviews)
        ELSE (total_reviews * 5000.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
            + (1 - (total_reviews * 5000.0 / agg.max_votes)) * agg.avg_score
    END weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 25;
```
Then the results are all items with a low total number of votes but a high score (percentage of positive reviews).
```sql
SELECT
    app_name,
    total_reviews,
    positive_reviews,
    negative_reviews,
    (total_reviews * 15.0 / agg.max_votes) * (positive_reviews * 1.0 / total_reviews)
        + (1 - (total_reviews * 15.0 / agg.max_votes)) * agg.avg_score AS weighted_score
FROM
    review,
    (
        SELECT
            AVG(positive_reviews * 1.0 / total_reviews) AS avg_score,
            MAX(total_reviews) AS max_votes
        FROM review
    ) agg
ORDER BY weighted_score DESC
LIMIT 250;
```
This "weighted" query yields identical results to the unweighted query. The only difference is that the scores are scaled up; the apps selected and their order are exactly the same. This is still true if I use `5000` instead of `15`. Ergo, this is not a valid way to apply weight.
Yes, because if you remove the `CASE` statement you are just computing a variant of the `n / n_max` weighting, except the 'weight' is no longer in [0..1]. The `CASE` statement is an essential component of how the weighting works: it creates a threshold vote count at which the weight becomes 1, meaning a game's actual score is taken as its weighted score. For games with a weight below 1, the average score across all games is blended in according to the weight.
And the problem with only using `n / n_max` is that it favours popular items far too much.
Oh... haha 😅 Sorry, I guess you're right. I tried the `5000` weight with the `CASE` statement and the list does look a lot more like the Wilson list now. Please excuse me; I don't understand mathematics very well. I just wanted to make sure you hadn't made a mistake, but it seems you know what you're doing 👍
Good news: I have uploaded the algorithm test site to http://algorithms.steam250.com/. Let me know what you think! Sorry it took so long! The site was not designed to handle multiple algorithms so I had to do some extra programming. It shouldn't take too long to add the missing algorithms contributed by Torn tomorrow.
@rubenpieters All the algorithms are now available on algorithms.steam250.com. I'm pretty happy with the sampling of Wilson and Bayesian but I still need to mess around with the others. Let me know which one you think produces the best list!
Well, the 'best' list is pretty subjective. I think something that doesn't favour popular or niche titles too heavily would be a decent default choice.
Currently, games with comparatively low review volumes but very high approval rates are favoured over games with many more votes (10x or more) and a comparable approval rating. Favour should instead be afforded to games that have garnered more attention from gamers.
This requires tweaking the algorithm but I'm currently unsure of the best way to do this. The current implementation features a number of constants that probably need to be tweaked to achieve the desired balance.
Following is a list of alternative algorithms we can try.
All algorithms with various weightings are now available to preview on algorithms.steam250.com. We need to come up with the smallest possible list of options that adequately demonstrates the available algorithm and weighting combinations. Too many options will overwhelm voters so each one must bring something unique to the discussion.