noahtf13 closed 2 years ago
I agree with this. Posts with tons of upvotes tend to stick for way too long on Trending, and they make the Trending tab bland and uninteresting.
I've been giving this a lot of thought recently. It's a pretty interesting problem with the following constraints:
The reason the algorithm has things stick to the top like this is the scale of the difference in the upvotes, which completely overwhelms the gravity attribute. gravity is great at keeping content fresh, assuming a decent distribution of upvotes. Unfortunately, as noted, if something blows up on HN it completely overwhelms that gravity, as it has orders of magnitude more upvotes than the other content.
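To make the "overwhelms the gravity" point concrete, here is a minimal sketch. It assumes the commonly cited Hacker News-style formula (upvotes divided by age raised to a gravity exponent); the site's actual formula isn't shown in this thread, and the function name, the `+ 2` offset, the gravity value and the example numbers are all illustrative.

```python
def hn_style_score(upvotes: int, age_hours: float, gravity: float) -> float:
    """Assumed HN-style ranking: upvotes divided by age raised to `gravity`."""
    # The +2 offset is part of the commonly cited formula, not taken from this thread.
    return upvotes / (age_hours + 2) ** gravity


# A post that blew up a week ago vs. a fresh post with a handful of upvotes.
viral = hn_style_score(upvotes=3000, age_hours=7 * 24, gravity=1.1)
fresh = hn_style_score(upvotes=10, age_hours=1, gravity=1.1)

# With these numbers the week-old viral post still outranks the fresh one.
print(viral > fresh)
```

The 300x gap in raw upvotes is bigger than what a week of decay takes away at this gravity, which is why the outlier sticks.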
Here's one proposed solution: Have the Trending tab show only posts from the past 28 days arranged by number of upvotes, then randomly intersperse (say, every 3-6 posts) something that only has 1-5 upvotes. This checks the following boxes:
Let me know your thoughts and suggestions
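For what it's worth, the interspersing idea above could look something like this. It's only a sketch: the post dictionaries, the field names (`upvotes`, `age_days`) and the function name `build_trending` are made up for illustration and aren't from the actual codebase.

```python
import random


def build_trending(posts, rng, min_gap=3, max_gap=6, low_max=5, max_age_days=28):
    """Top posts by upvotes, with a random low-upvote post every 3-6 entries."""
    recent = [p for p in posts if p["age_days"] <= max_age_days]
    popular = sorted(
        (p for p in recent if p["upvotes"] > low_max),
        key=lambda p: p["upvotes"],
        reverse=True,
    )
    low = [p for p in recent if 1 <= p["upvotes"] <= low_max]
    rng.shuffle(low)  # pick the low-upvote posts in random order

    feed = []
    since_low = 0
    next_gap = rng.randint(min_gap, max_gap)
    for post in popular:
        feed.append(post)
        since_low += 1
        if since_low == next_gap and low:
            feed.append(low.pop())
            since_low = 0
            next_gap = rng.randint(min_gap, max_gap)
    return feed
```

Passing in `random.Random(seed)` keeps the feed reproducible for testing; in production you'd probably want a fresh shuffle per page load.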
@noahtf13 I like the reddit-like algorithm, but there are currently no downvotes, and we unfortunately can't use "views" as downvotes as it would require some chunky servers to parse the "upvote-rate". Food for thought though.
I've given this some more thought, and I think I have a solution:
Modify the existing algorithm (which works really well assuming there aren't huge outliers) so that upvotes count logarithmically. Essentially the first 10 upvotes have the same value as the next 100, which have the same value as the next 1000.
This means that something with 1000 upvotes isn't too much of an outlier and will behave correctly in the existing algorithm.
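Roughly what that looks like, as a sketch. It assumes an HN-style score (upvotes over age raised to a gravity exponent) with the upvote count swapped for its base-10 log; the function name, the `+ 2` offset and the example numbers are illustrative rather than the site's actual code.

```python
import math


def log_damped_score(upvotes: int, age_hours: float, gravity: float) -> float:
    """Assumed HN-style score with log10(upvotes) as the numerator."""
    # max(..., 1) avoids log10(0) for posts with no upvotes yet.
    return math.log10(max(upvotes, 1)) / (age_hours + 2) ** gravity


# Each power of ten adds one point: 10, 100 and 1000 upvotes give about 1, 2 and 3.
for upvotes in (10, 100, 1000):
    print(upvotes, math.log10(upvotes))

# A week-old post with 3000 upvotes vs. a fresh post with 10 upvotes.
viral = log_damped_score(upvotes=3000, age_hours=7 * 24, gravity=1.1)
fresh = log_damped_score(upvotes=10, age_hours=1, gravity=1.1)
print(fresh > viral)  # the fresh post now ranks higher
```

With the log in place, 3000 upvotes is only about 3.5x the weight of 10 upvotes instead of 300x, so the age term can do its job again.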
@noahtf13 so you were right, just taking the log of the upvotes is a good way to handle this, it just took me a while (and a good chunk of unusable code) to agree 😅. I've pushed an update to test on production a bit before it goes live. You can see it here https://bearblog.dev/discover/?test=true&gravity=1.1 (you can also adjust the gravity in the URL parameter, which I plan to make a feature for people who want to play around with it).
Really appreciate the thought you put into this and other parts of the site, it shows!
I've just released a new algorithm for the discovery feed. I decided to switch it up and use a Reddit-like "time from Jan 1st 2020 to published date" instead of the HN-like "time since published". This allows me to compute a score for each post on upvote, as opposed to computing a score for all articles on each discover page load. This has a similar effect while being more computationally friendly.
Score = log10(U) + S / (D * 8600)
Where,
U = Upvotes (toasts) of a post
S = Seconds since Jan 1st, 2020
D = Days modifier (currently at 7)
The D value specifies that content D days old needs to have 10 times as many upvotes as something published now in order to outrank it.
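Here's a small sketch of the formula to check that property. One assumption to flag loudly: the formula as posted writes 8600, but this sketch uses 86,400 (seconds in a day) and groups the denominator as D × 86,400, because that's the reading under which "D days old needs 10 times as many upvotes" holds exactly. The function and constant names are illustrative, not from the actual codebase.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_DAY = 86_400  # assumption: the posted formula writes 8600


def post_score(upvotes: int, published: datetime, days_modifier: int = 7) -> float:
    """log10(U) + S / (D * seconds per day), with S measured from Jan 1st, 2020."""
    seconds = (published - EPOCH).total_seconds()
    # max(..., 1) avoids log10(0) for posts with no upvotes yet.
    return math.log10(max(upvotes, 1)) + seconds / (days_modifier * SECONDS_PER_DAY)


now = datetime(2022, 6, 8, tzinfo=timezone.utc)
week_old = now - timedelta(days=7)

# A 7-day-old post needs 10x the upvotes to tie with a post published now.
print(post_score(100, week_old), post_score(10, now))
```

Because the score depends only on the upvote count and the published date, it can be stored on the post and recomputed only when an upvote comes in; the feed is then just a sort on the stored value.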
I'm going to do a longer writeup on all the stuff I've learnt about ranking algorithms over the past week, so subscribe to my blog if you're interested to read it.
Give it a spin and let me know what you think. If you still prefer the old algorithm it's available at https://bearblog.dev/discover?old=true
Amazing! And already read through RSS! 👍
See https://miro.medium.com/max/800/0*21Ezm5SbYie_a3oD.png
Even just a log of votes as the numerator of the current algorithm you list would be helpful. Posts that blow up on the discover feed tend to "stick" for a week or more, but maybe that's the intent?