Anarios / return-youtube-dislike

Chrome extension to return youtube dislikes
https://returnyoutubedislike.com/
GNU General Public License v3.0

Discussion: Algorithm #330


Greedquest commented 2 years ago

As I understand it, right now this extension generates the dislike count on a video like this:

  1. Retrieve cached dislike ratio from last time that video was scraped. Linearly extrapolate dislikes from current likes.

  2. Actual dislikes may continue to be retrieved for certain creators' videos if they opt in to sharing stats from Creator Studio.

  3. If not, and the video is not in cache (e.g. it's new) then dislikes = f( likes / views ) where f() is some scaling function tuned according to a selection of known videos.

  4. In the future, users of this extension (or others) will be able to vote in a community-maintained database

Update: apparently the extension uses the like:dislike ratio from its own users' votes and scales this up to match the total number of likes a video has (see https://github.com/Anarios/return-youtube-dislike/issues/330#issuecomment-995838302). A rough sketch of this pipeline follows.
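Here is my reading of that fallback pipeline in code - a minimal sketch, not the actual backend; all names, the `MIN_VOTES` threshold, and the `heuristic()` scaling are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

MIN_VOTES = 50  # hypothetical threshold for trusting extension votes


@dataclass
class Snapshot:          # last archived public counts for a video
    likes: int
    dislikes: int


@dataclass
class Video:
    likes: int
    views: int
    ext_likes: int = 0                       # votes cast by extension users
    ext_dislikes: int = 0
    reported_dislikes: Optional[int] = None  # creator-shared stats, if opted in
    snapshot: Optional[Snapshot] = None


def heuristic(like_view_ratio: float) -> float:
    """Stand-in for the scaling function f() tuned on known videos."""
    return 100_000 * like_view_ratio  # made-up scaling, illustration only


def estimate_dislikes(v: Video) -> int:
    if v.reported_dislikes is not None:              # 2. opted-in creators
        return v.reported_dislikes
    if v.snapshot is not None:                       # 1. archived ratio, extrapolated
        return round(v.likes * v.snapshot.dislikes / max(v.snapshot.likes, 1))
    if v.ext_likes + v.ext_dislikes >= MIN_VOTES:    # update: community votes, scaled
        return round(v.likes * v.ext_dislikes / max(v.ext_likes, 1))
    return round(heuristic(v.likes / max(v.views, 1)))  # 3. heuristic fallback
```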

I see a few potential limitations with each of these sources of data:

  1. Linearly extrapolating ignores changes to voting over time - e.g. if an old tutorial with good ratings has since been shown to introduce a security vulnerability, it may start receiving dislikes but this would not be reflected if just extrapolating historic values.
  2. For content creators who share stats: this assumes voting behaviour won't change after totals are hidden. I suspect that if dislikes aren't shown, a video will receive fewer of them, so the raw stats may not match sentiment as well as they used to - I'm not sure. Certainly disliking to inform others will be less effective, and may therefore be done less.
  3. The likes:views ratio is more a measure of engagement than sentiment. In the long discussion at https://github.com/elliotwaite/thumbnail-rating-bar-for-youtube/issues/50 we found that, e.g., a music video people play in the background has very little engagement and would appear "bad", whereas some viral but low-quality cooking video has much higher engagement (likes:views) even if its likes:dislikes ratio is pretty poor. Something like max( f(likes/views), g(likes/comments) ) helps balance out the engagement factor (see the sketch after this list).
  4. A community maintained database suffers from small sample size and bias (do you trust the opinions of users of this extension more than anyone else? Will they be representative of a typical youtube viewer?)
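To make the max() idea in point 3 concrete, a minimal sketch - f() and g() are hypothetical scalings; in practice they would be fitted on videos with known dislike counts:

```python
def engagement_corrected_estimate(likes: int, views: int, comments: int) -> float:
    """Estimate dislikes from whichever engagement signal is stronger."""
    def f(like_view: float) -> float:
        return 5_000 * like_view      # made-up scaling of likes/views

    def g(like_comment: float) -> float:
        return 80 * like_comment      # made-up scaling of likes/comments

    # Taking the max keeps low-engagement videos (e.g. background music)
    # from looking artificially "bad" on the likes/views signal alone.
    return max(f(likes / max(views, 1)), g(likes / max(comments, 1)))
```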

I think it would therefore be good to have some insight into how the backend works in theory, and a place for more suggestions and discussion. Some statistical analysis of the historic dataset to find correlations between many factors, not just likes and views, would probably help identify a good heuristic for predicting dislike counts. If there are many complex edge cases then this may require a machine-learning approach on data like video length, content, and channel - all of which may predict dislikes. Left-field options like comment sentiment analysis have also been voiced, but the simpler the better IMO.

martin-braun commented 2 years ago

Like:view ratio is definitely not accurate. We did some A/B testing with the thumbnail rating extension prior to the 13th of Dec. My initial idea would be this:

Imagine you have 1000 views on a video, and 100 of those views were made by people who installed this extension. 10 people with the extension disliked the video, so the approximation is that 10% of viewers disliked it, and it should display 100 dislikes ... or maybe not, please read further.
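In code, the arithmetic above - numbers are from the hypothetical example, and `ext_views` assumes the backend could count views by extension users, which is an assumption on my part:

```python
views = 1_000          # total public views on the video
ext_views = 100        # views made by extension users (hypothetical telemetry)
ext_dislikes = 10      # dislikes cast by extension users

dislike_rate = ext_dislikes / ext_views            # 0.10, i.e. "10% disliked it"
estimated_dislikes = round(views * dislike_rate)   # scaled to all viewers
print(estimated_dislikes)                          # -> 100
```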

Right now, the more time passes, the less accurate it is to take the archived dislikes into account, so I would weight them less and less and start to judge videos more on our own database, as proposed above. This should become more and more accurate as more people start to use this extension.

The biggest problem I see is that only people who care about giving and seeing dislikes will install this extension, so a good fraction of people who never interact, don't care, or simply are too inexperienced to install extensions will not be taken into account. Those who do install this extension will be a narrow target audience (younger people, people with a bit of technical background). I also believe that men are more likely to install browser extensions than women; the list goes on.

It must not happen that a single person visiting a video with this extension causes massive dislikes just because 99.9% of the other viewers don't use it; there has to be more to it. If only 0.1% of the viewers use this extension and they dislike, they might just not be the target audience - i.e. grandpa or grandma might like the video, but they aren't experienced enough to install this extension, or don't care.

This extension will never fully compensate; finding a good formula is the holy grail, and no formula will work the same for all videos. Ratings will never be accurate enough to make a true approximation, but we can at least express our dislikes to each other, which is better than nothing. I agree on getting more insight into the current backend - this is important for working towards the best formula possible.

sy-b commented 2 years ago

Fresh news: a Discussions tab has been opened.

@Greedquest & @martin-braun, I suggest this conversation could be moved to the Discussions section.

Anarios commented 2 years ago
  1. Views are not used in dislike estimation at all - there is no useful correlation.
  2. The "algorithm" to calculate the actual dislike number is the following: calculate the ratio from extension user votes (say, the video has 10% dislikes), then apply this ratio to the total number of likes.
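My reading of that in code - a minimal sketch; the real backend may differ in details like smoothing or minimum-vote thresholds:

```python
def estimate_dislikes(public_likes: int, ext_likes: int, ext_dislikes: int) -> int:
    """Scale the extension users' like:dislike ratio up to the public like count."""
    if ext_likes == 0:
        return 0  # no basis for scaling; actual backend behaviour here is unknown
    return round(public_likes * ext_dislikes / ext_likes)


# Example: extension users cast 90 likes and 10 dislikes (10% dislikes);
# the video has 50,000 public likes -> ~5,556 estimated dislikes.
print(estimate_dislikes(50_000, 90, 10))
```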

This is extremely vulnerable to bias from extension users (i.e. the data will be wrong if extension users' opinions differ greatly from the general opinion). But I don't see a way around it, and I don't see any alternative.

From what I see now (comparing archived dislikes to actual dislikes by extension users) - there is no strong bias, we're not that far from a "random sample" (excluding a few edge cases).

So, in the end, if there's no archived data you won't see the actual number of dislikes; you will see the number of dislikes the video would have had if all viewers shared the opinion of extension users. Far from ideal - but I didn't come up with anything better.

I'm open to any improvement suggestions, though.

Anarios commented 2 years ago

The userbase is around 1.5 million now, though, and growing - so at least the sample size is quite big. The sad part is that it's not random.

sserdda-liamE commented 2 years ago

The answer is pretty simple: Hide the dislikes until the user clicks dislike, then show that 7.5 billion other people disliked it. It's what the users really want to see -- that everybody else on the planet agrees with their hatred. So just give them what they want. This will make this the Fox News of extensions, and Fox News is quite popular.

phly95 commented 2 years ago

I think we shouldn't bother with approximations. We should just show the number of likes and dislikes from those who have the extension installed, and only fall back to algorithms if not enough people with the extension have voted on a video.

Sure, the sample may not be random, but it still serves the purposes for which the like and dislike counter exists: avoiding scams and judging the quality of a video before watching it.

If you hover over the counter, maybe it can scale the ratio to the public number of likes on the video; otherwise, it shows only the number of votes from those with the extension.

Greedquest commented 2 years ago

@Anarios Ok, I didn't realise you were already collecting extension user data to populate the stats for new videos. For old videos, do you look up the dislike and like/view counts from the last snapshot in the archive and then linearly extrapolate to today's like/view count to estimate current dislikes? I.e. the dislike ratio in the archive is applied and scaled as more views and likes come in.

Do you have a decent-sized dataset you could share so I could run some regression analysis on it to find whether there are any other good predictors of dislikes? For example, comment count anecdotally seems to encode interesting data about how "engaging" a video is - i.e. how likely someone is to interact with it, positively or negatively. This gives a likes:comments statistic which correlates better with likes:dislikes than likes:views does. A dataset with archived YouTube API metadata, and ideally extension statistics as well, would help determine how to mix these different factors into a single prediction algorithm - extension data may correlate with archive data for some videos but not others, and evaluating that seems sensible.
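For example, the kind of regression I have in mind - assuming a CSV of archived video metadata; the file and column names are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("archived_videos.csv")  # hypothetical: likes, dislikes, views, comments
df["like_view"] = df["likes"] / df["views"].clip(lower=1)
df["like_comment"] = df["likes"] / df["comments"].clip(lower=1)

# Ordinary least squares: which engagement ratios predict the dislike ratio?
X = np.column_stack([np.ones(len(df)), df["like_view"], df["like_comment"]])
y = df["dislikes"] / df["likes"].clip(lower=1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "like_view", "like_comment"], coef)))
```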

Greedquest commented 2 years ago

@phly95 "[...] only fall back to algorithms if not enough people with the extension have voted on a video." - That's part of the aim of this discussion: finding the threshold number at which extension votes are considered "enough" and their opinion is consistent and useful. Or maybe it's not a binary thing, and user votes could have a 90% weighting while some other data has a 10% weighting, giving a final statistic that is more robust and reliable (a sketch below).
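Something like this - the weights and threshold are made up for illustration:

```python
def blended_estimate(user_est: float, fallback_est: float,
                     n_votes: int, threshold: int = 100) -> float:
    """Blend the extension-user estimate with a fallback estimate.

    The weight on user votes ramps up with vote count, capping at 90%.
    """
    w = 0.9 * min(n_votes / threshold, 1.0)
    return w * user_est + (1 - w) * fallback_est
```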

Interestingly, I think bias in the userbase will be less of an issue than you might expect. Assuming users of this extension are fairly like-minded, they probably watch similar YouTube videos. Therefore, on the small subset of videos that get a significant number of votes from users of this extension, those same users are probably pretty representative of a typical viewer of that video anyway, so their opinion will generally match the ground truth and be a reliable guesstimate for the true hidden dislike value.

martin-braun commented 2 years ago

@Anarios So does this mean that the extension also tracks my likes, so it can calculate a ratio and convert it into dislikes matching the public likes? I think so, please confirm.

Regarding bias, there are two more indicators that we could take into account, though I'm not saying we should:

  1. The fraction of unique views relative to the overall YouTube community (~3b active users) vs. the extension community (~1.5m active users). I.e. does it mean anything for a possible approximation if, relatively speaking, twice or half as many people of this community decided to watch the video?
  2. The engagement of likes (the fraction of users who clicked like relative to unique views, on YouTube's overall community vs. the extension community). I.e. does it mean anything for a possible approximation if, relatively speaking, twice or half as many people of this community engaged in liking the video?

Both could be used to detect a positive or negative bias in likes or dislikes respectively. There is also the opposite case to the examples above, where a hated video is visited mainly by the extension community to give dislikes, while the overall YT community wouldn't give so many dislikes - they still don't care about disliking that much, since they haven't installed this extension.

So the question is: should we value dislikes differently in such cases? I've thought about it for quite some time, but it's really hard to be sure, so I would love it if you could engage with my scenarios, think them through, and perhaps respond. A rough sketch of the two indicators follows.
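A sketch of the two indicators - the population figures are the rough ones quoted above, and the per-video extension counts assume telemetry the backend may not actually collect:

```python
YT_USERS = 3_000_000_000    # ~3b monthly active YouTube users (rough figure)
EXT_USERS = 1_500_000       # ~1.5m extension users (rough figure)


def view_bias(yt_views: int, ext_views: int) -> float:
    """Indicator 1: >1 means the video is over-represented among extension users."""
    return (ext_views / EXT_USERS) / (yt_views / YT_USERS)


def like_engagement_bias(yt_likes: int, yt_views: int,
                         ext_likes: int, ext_views: int) -> float:
    """Indicator 2: >1 means extension users like the video unusually often."""
    return (ext_likes / ext_views) / (yt_likes / yt_views)
```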

sserdda-liamE commented 2 years ago

> their opinion will generally match the ground truth and be a reliable guesstimate for the true hidden dislike value.

Except the extension will be training them to dislike due to the scaling. People like for their opinion to "move the meter", which is why votes on 0-10 scales invariably cluster at either 0 or 10. If you're allowing one extension user's vote to move the dislike count by 100,000 when it only moves the like meter by 1, guess what's going to happen? They're going to look for the slightest reason to be triggered so they can smash that dislike button and "prove" that opinion right. You can't assume an outsized vote will track like a regular one when you've massively amplified the masturbatory reward of dislike brigading.

martin-braun commented 2 years ago

@Greedquest I agree with @sserdda-liamE's point as well. This extension's community will have a high relative fraction of people who badly want to dislike, because people who would dislike only in moderation might just accept the fact that they can't any more, and so won't go looking for this extension. There is a bias that could lead to unjustified inflation of dislikes. That's why I thought the view or engagement fractions might help adjust the value of dislikes in a reasonable manner before upscaling against the official likes.

However, I disagree with your suggestion to show dislikes only once you have disliked yourself, @sserdda-liamE. The point of this extension is to know how good a video is before watching it.

phly95 commented 2 years ago

>> their opinion will generally match the ground truth and be a reliable guesstimate for the true hidden dislike value.
>
> Except the extension will be training them to dislike due to the scaling. People like for their opinion to "move the meter", which is why votes on 0-10 scales invariably cluster at either 0 or 10. If you're allowing one extension user's vote to move the dislike count by 100,000 when it only moves the like meter by 1, guess what's going to happen? They're going to look for the slightest reason to be triggered so they can smash that dislike button and "prove" that opinion right. You can't assume an outsized vote will track like a regular one when you've massively amplified the masturbatory reward of dislike brigading.

I think if only the likes and dislikes from extension users are shown, there won't be a need for compensatory disliking. And for people disliking just because they can, the novelty will wear off and people will begin rating things based on their honest opinion.

martin-braun commented 2 years ago

@phly95 Honestly, I wouldn't be interested in this extension if I ended up seeing only dislikes (and likes) from a narrow community. YT has ~3b active users (actually even more: ~42% of the people on this planet use YT at least once per month). How do you want to account for that with a ~1.5m user base doing the (negative) rating?

sserdda-liamE commented 2 years ago

Actually, 1.5 million would be plenty for a pretty representative sample if it were in any way true and limited to one vote per user. However, dislike brigades are heavily botted, and even though I believe there are some anti-bot provisions in Google's extension install counts, they won't be anywhere near as strong as what YouTube has, so my guess is the vast majority of those extension downloads were botted.
This could be checked by looking at how many unique IPs are currently reporting stats. A low count could only prove that there aren't many users - a bot army active across a million proxies would also show a high count - but if only, say, 2,000 unique IPs report per day, that would prove there are nowhere near enough users for an accurate sample, and also that the install count is heavily botted, so none of the counts can be trusted (not that they could anyway if anyone can spin up 10k VMs and have all the dislikes counted).

phly95 commented 2 years ago

@martin-braun Maybe the user can be given a choice when installing the extension: an estimated global count, or just the numbers from those with the extension installed. Also, seeing like/dislike ratios on video thumbnails would be a nice feature.

Greedquest commented 2 years ago

@sserdda-liamE @martin-braun Referring back to an earlier comment:

> From what I see now (comparing archived dislikes to actual dislikes by extension users) - there is no strong bias, we're not that far from a "random sample" (excluding a few edge cases).

IMO if fresh extension data is highly correlated with the archived data, then it is probably a good predictor. A regression analysis could quantify exactly how significant the correlation is (a sketch below). That's probably the way forward for data from extension users, and for metrics derived from other variables like comment count, view count, comment sentiment, etc.
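Concretely, something like this - assuming paired ratios for videos that have both archived and fresh extension data; the file and column names are hypothetical:

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("paired_ratios.csv")  # columns: archived_ratio, extension_ratio
r, p = pearsonr(df["archived_ratio"], df["extension_ratio"])
print(f"r={r:.3f}, p={p:.2g}")  # high r with low p -> extension votes predict well
```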

seizir commented 2 years ago

Regardless of what the algorithm ends up being, what I would like to see as an option is being able to see the raw data, so I can make my own judgement.

It should show both the last known public count and the extension users' votes, without any extrapolation.

martin-braun commented 2 years ago

@phly95 Adding a setting to choose between calculated and actual likes/dislikes seems like a good idea. Also, there is already an extension for thumbnail rating previews that [uses this API](https://github.com/elliotwaite/thumbnail-rating-bar-for-youtube/issues/54), so there's no need to have thumbnail previews in this extension.

@seizir I think so as well: a debug mode that shows all the information this extension uses, including the YT like/dislike counts from the last time dislikes could be fetched, so we know the ratio as it was before dislikes were removed.

Ember-ruby commented 2 years ago

I think that showing algorithmically generated dislike counts when you open a video is fine, but I'd like to see the raw data when I hover over the dislike bar.

PreciousWarrior commented 2 years ago

> Linearly extrapolating ignores changes to voting over time - e.g. if an old tutorial with good ratings has since been shown to introduce a security vulnerability, it may start receiving dislikes but this would not be reflected if just extrapolating historic values.

If I understand correctly, the extension uses linear extrapolation for videos whose L:D ratio has already been recorded, and if the L:D ratio is not stored (a new video), it relies on the ratings provided by extension users.

In my opinion, once the extension matures further, the historic dataset can be dropped entirely, and the extension can depend solely on user-submitted data (and the known likes) to calculate the dislikes for a video. This fixes the issue to the same extent it was fixed before YouTube removed the dislike counter: people can see the total like/dislike ratio (accounting for recent trends, but ALSO accounting for ratings before any change in the general trend / L:D ratio).

However, in my opinion, this extension could add a toggleable feature that YouTube never had: calculating the recent trend of likes/dislikes, which will be more relevant to most people. Take the example of a Log4j tutorial. Let's say it had an L:D ratio of 100:1 before the vulnerability was discovered; afterwards, only a few people viewed it, changing the L:D ratio to 100:4. The general sentiment has massively changed, but because of the now-inaccurate historic ratings, the final ratings don't really reflect what an average developer would feel about the tutorial. If you surveyed 100 developers today, maybe 50 would dislike it - which isn't reflected by the 100:4 ratio, because there are few new views.

I propose a solution for this: the database stores not only the total likes and dislikes ever submitted by users of the extension, but also the likes and dislikes submitted in the past n days. I have no clue how the value of n should be chosen - that really is out of my scope (the result needs to be statistically significant and have a low chance of containing outliers). A sketch of what this could look like follows.
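One way the storage could look: per-day vote buckets kept for the last n days, so a recent ratio can be computed alongside the all-time one. The layout is made up; the backend's actual schema is unknown to me:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoteHistory:
    n_days: int = 30                             # window size; how to pick n is open
    days: deque = field(default_factory=deque)   # one (likes, dislikes) pair per day

    def close_day(self, likes: int, dislikes: int) -> None:
        """Append today's vote bucket and drop buckets older than n days."""
        self.days.append((likes, dislikes))
        if len(self.days) > self.n_days:
            self.days.popleft()

    def recent_dislike_ratio(self) -> Optional[float]:
        likes = sum(l for l, _ in self.days)
        dislikes = sum(d for _, d in self.days)
        total = likes + dislikes
        return dislikes / total if total else None
```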

donRumata03 commented 2 years ago

I suggest a novel approach: just use the existing data to train a giant NN based only on the video contents, and sample dislike values from it for all videos :) - a solution that filters out «bad» videos without the need for humans to watch them…

More to the point, it sounds like a pretty straightforward approach to have these two steps: train a model on videos whose dislike data is known, then use it to predict dislikes for videos without data.

@Anarios, is there a high probability that we'll get something like this in future releases?

donRumata03 commented 2 years ago

I've seen this attempted: the author said his data is unrepresentative, and that's true. But the good thing is that the plugin is able to fetch the most popular (though not random) comments. Moreover, there are the dislikes from plugin users.

https://towardsdatascience.com/predicting-the-number-of-dislikes-on-youtube-videos-part-2-model-aa981a69a8b2