Anarios / return-youtube-dislike

Chrome extension to return youtube dislikes
https://returnyoutubedislike.com/
GNU General Public License v3.0

Information regarding invalid data from the extension #1066

Closed by hostedbyjustus 3 months ago

hostedbyjustus commented 3 months ago

I'm not sure if this is the correct place to ask, but amid the controversy surrounding the YouTuber MrBeast, screenshots showing unusually high dislike counts on his newest video have been popping up.

On the other side, numbers apparently taken from the official dashboard have been shown.

My question is whether there is any bot activity surrounding the extension, as there was previously, and whether the number for that video is actually exaggerated. There should be recognisable abnormalities (RYD users being weighted too heavily).

If this is the wrong place to ask, I'm sorry. I'm genuinely curious whether this is a mistake on the data end, although I am aware of inflated numbers in the past.

video source: https://youtu.be/UPrkC1LdlLY?si=uH7Utz7ytiHKc3Mi

numbers from the MrBeast team: https://twitter.com/dramaalert/status/1819816544882270490?s=46


Sam101011 commented 3 months ago

I have a related question that may seem very stupid, but I just want to be sure: Can YouTube delete dislikes on their end?

Cuz up to this point I had the impression the extension had an average accuracy of 80-90% and seeing such an enormous discrepancy all of a sudden definitely raises an eyebrow.

jpa102 commented 3 months ago

off topic but how did you get the dislike counts looking like that?

mine is looking like this (likes and dislike counts not separated)

is there a revanced patch that places the dislikes counts inside the dislike button? (not from the official implementation from the revanced team though)


Anarios commented 3 months ago

By now 87k users with the extension disliked the video, and only 56k liked it.

A very good portion of these users were created a long time ago. If we want to believe that these dislikes are botted - someone would have to come up with this plan over a year ago, create tens of thousands of fake accounts, keep realistic-looking activity on these accounts for this year, pretending to be real people, and only today attack MrBeast.

Regarding the figure in the screenshot - I simply do not see how it is possible for the video to be performing better than channel average despite all the controversy.

It could be that YouTube is not updating dislike counts in YT studio for some reason.

An inaccuracy is definitely possible due to how controversial this video is. After all - the extension serves 20 million people on a budget of 500$ - it will never be as accurate as Google. I can imagine the estimate being wrong by 2-3x in the dislike count - simply because of how different an average member of MrBeast's audience is from an average dislike extension user. But the numbers from the screenshot are simply impossible.

Edit: As of August 5th there are 130 217 dislikes and 69 449 likes from the extension users

Edit2: August 6th - 145 317 dislikes and 75 865 likes

hostedbyjustus commented 3 months ago

Thanks for the insights. It's very interesting to see how the extension works and how the data is calculated.

Now the extension leads to situations like this, where both available data sets are unlikely to be correct but both sources are used as proof. Manipulated data is out of the question, and the whole thing is just strange.

The data from RYD users is really interesting and seems to be a bit more reliable.

Because of all of this I would really like to see a creator submission feature in this extension someday. There won't be 100% accuracy, but more accuracy would be needed in cases like this, where the extension is treated as a source. Removing the public dislike count was the worst decision by YouTube/Google, and it shows.

Laitinlok commented 3 months ago

if you read the RYD doc, you'll see the actual formula for dislikes and understand this is an approximation and not accurate, you can't demand accuracy from the extension and you can't blame it either...

anyway the formula is:

RYD Dislikes = (RYD users Likes / RYD users dislikes) * actual video Likes

also an approximation is better than nothing, but you should not blame the extension, it was never meant to be 100% accurate.

Doesn't the Central Limit Theorem state that a sample size of more than 30 is large enough for the sampling distribution to be close to a normal distribution when the ratio dislikes/(likes+dislikes) follows a Binomial distribution?

ParashRahman commented 3 months ago

if you read the RYD doc, you'll see the actual formula for dislikes and understand this is an approximation and not accurate, you can't demand accuracy from the extension and you can't blame it either...

anyway the formula is:

RYD Dislikes = (RYD users Likes / RYD users dislikes) * actual video Likes

also an approximation is better than nothing, but you should not blame the extension, it was never meant to be 100% accurate.

Doesn't the Central Limit Theorem state that a sample size of more than 30 is large enough for the sampling distribution to be close to a normal distribution when the ratio of likes/dislikes follows a Binomial distribution?

Well, yes, but only if the sample comes from the same distribution. This app needs to be downloaded and installed - it's a different audience who would do that. Many people don't care about this app. However, for that audience, it's a good estimate.

ParashRahman commented 3 months ago

if you read the RYD doc, you'll see the actual formula for dislikes and understand this is an approximation and not accurate, you can't demand accuracy from the extension and you can't blame it either...

anyway the formula is:

RYD Dislikes = (RYD users Likes / RYD users dislikes) * actual video Likes

also an approximation is better than nothing, but you should not blame the extension, it was never meant to be 100% accurate.

Think it should be:

RYD Dislikes = (RYD users dislikes / RYD users likes) * actual video Likes
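
As a rough illustration of the simplified formula in its corrected form, the sketch below scales the public like count by the dislike-to-like ratio among extension users. The function name and the public like count of 1,000,000 are made up for the example; the extension vote counts are the August 6th figures quoted earlier in this thread. This is not the extension's actual code.

```python
def simplified_ryd_estimate(ext_likes: int, ext_dislikes: int, public_likes: int) -> int:
    """Simplified estimate: public likes scaled by the extension users' dislike/like ratio."""
    if ext_likes <= 0:
        raise ValueError("need at least one like from extension users")
    return round(ext_dislikes / ext_likes * public_likes)


# Extension votes are the August 6th figures from this thread;
# the public like count is a hypothetical placeholder.
estimate = simplified_ryd_estimate(
    ext_likes=75_865, ext_dislikes=145_317, public_likes=1_000_000
)
print(estimate)
```

With roughly 1.9 extension dislikes per extension like, the estimate lands at roughly 1.9 times whatever public like count is plugged in.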

Anarios commented 3 months ago

The formula is also an oversimplification - there are weights applied, aimed at preventing bias (i.e. an attempt to counter the difference between the extension users and the users in general).

UnknownSheepTV commented 3 months ago

I don't know much about this stuff, but someone did the calculations and it seems to be off by ~4500%. This seems impossible to me when it almost always stays within a ~40% difference from the YouTube Studio data.

ScrubN commented 3 months ago

The formula is also an oversimplification - there are weights applied, aimed at preventing bias (i.e. an attempt to counter the difference between the extension users and the users in general).

Would it be possible for a footnote to be added to the FAQ document explaining that the formula shown is simplified?

BaronAWC95014 commented 3 months ago

it's possible that the dislikes number is so different from the "official" numbers because of a biased sample.

the people who download this extension, compared to those who don't, are probably much more likely to care about dislikes being removed, more tech-savvy or more online (no shade), and/or more likely to dislike videos themselves.

a lot of mrbeast's audience is probably just 9-year olds that don't know about the controversy and possibly press the like button immediately, cuz wow it's mrbeast!!!!!!

meanwhile, the people who have bothered to install the RYD are more likely to have heard of the controversy and interact with the video accordingly.

i still refuse to believe the numbers can be this far off, but it may help to explain why the allegedly real dislikes are so much lower.

Anarios commented 3 months ago

it's possible that the dislikes number is so different from the "official" numbers because of a biased sample.

Of course, that's why, for the video in question, the extension-only votes show over 60% dislikes, but when the estimate is calculated, it translates to a 30% dislikes estimate. So, the number is decreased by almost twofold to adjust for the bias.

If a naive estimation was applied, the video in question would have an estimated 7 million dislikes.
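
To make the effect of that adjustment on the count concrete, here is a small sketch that converts a dislike share into dislikes per like. It uses the round figures 60% and 30% from the comment above; the function name is hypothetical and the real weighting is not shown in this thread.

```python
def dislikes_per_like(dislike_share: float) -> float:
    """Convert a dislike share d / (l + d) into a dislike-to-like ratio d / l."""
    if not 0.0 <= dislike_share < 1.0:
        raise ValueError("dislike_share must be in [0, 1)")
    return dislike_share / (1.0 - dislike_share)


raw = dislikes_per_like(0.60)       # raw extension-only share: about 1.5 dislikes per like
adjusted = dislikes_per_like(0.30)  # adjusted share: about 0.43 dislikes per like
print(raw, adjusted, raw / adjusted)
```

Under these round numbers, halving the dislike share from 60% to 30% means about 3.5 times fewer estimated dislikes for the same like count.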

jpa102 commented 3 months ago

i found a video where as the like counts increase, so do the estimated dislikes (they're 50 / 50 as of writing)

video link: https://www.youtube.com/watch?v=pBAIgQBZ08I

not sure, but i'm adding this here because i think this doesn't make any sense

the mrbeast one fluctuates frequently, i just saw it go down from 1,804,482 to 1,729,593 and now it's back at 1,8xx,xxx+ dislikes

Anarios commented 3 months ago

i found a video

yeah, this one only has 10 total votes from extension users so the score is highly speculative.

Not the case for MrBeast's video - where there are 200 000 votes by extension users recorded at the moment.
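
A rough way to see why 10 votes are speculative while 200 000 are not is the normal-approximation margin of error for a proportion. This is only a sketch: it assumes a simple random sample, uses the worst case of a 50% dislike share, and says nothing about sample bias or the extension's weighting. For n = 10 the normal approximation itself is crude.

```python
import math

Z_90 = 1.645  # approximate two-sided 90% quantile of the standard normal


def margin_of_error(p: float, n: int, z: float = Z_90) -> float:
    """Half-width of the normal-approximation interval for a proportion p from n votes."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


small = margin_of_error(0.5, 10)        # video with 10 extension votes
large = margin_of_error(0.5, 200_000)   # video with 200 000 extension votes
print(f"n=10: +/-{small:.3f}, n=200000: +/-{large:.4f}")
```

With 10 votes the margin is roughly 26 percentage points either way; with 200 000 votes it is below 0.2 percentage points.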

the mrbeast one fluctuates frequently,

yep, if several thousand people go and like it - the dislike estimate will decrease. I've also been adjusting the estimate weights, to combat over-estimation in cases of mass-disliked videos.

Borgling commented 3 months ago

Someone mentioned sample bias above. I feel like this might be a bigger difference than anticipated in this instance.

The reason is, when YouTube removed the dislike count, a lot of YouTube drama channels / YouTube news channels covered it, while telling people about this extension.

Could there be a very large bias because people who watch YouTube drama/news are the majority of the people who have the extension?

Olekoop commented 3 months ago

I've been a user of this extension since its beginning and I had no idea that it used a formula (algorithm or whatever) to display the number of dislikes. There's nothing on the official website about it; the formula is in the repo's FAQ, not in the website's FAQ.

In my opinion there should be an option to show a 'raw' number of dislikes. While this may be less accurate, the average user of this extension probably didn't know about the formula and overestimated the number of dislikes.

Anarios commented 3 months ago

@Olekoop it says right in the description that it's an estimate based on the votes of the users of the extension.

offsetcyan commented 3 months ago

@Olekoop that's the intent. It's meant to be used for misinformation, because drama attracts users.

SeanBannister commented 3 months ago

So on one side we have MrBeast's data showing 20,845 dislikes and on the other we have the "Return YouTube Dislike" extension showing 87k. Let's presume the MrBeast data isn't faked and the "Return YouTube Dislike" votes aren't bots. I have two very speculative theories which could explain this.

If YouTube is able to detect the "Return YouTube Dislike" extension (similar to detecting adblockers), maybe it isn't counting dislikes from these users. At first this seems crazy: why throw away these valid dislikes? But in YouTube's blog post about removing dislikes they make a case for why they would ignore dislikes from "Return YouTube Dislike" users:

"earlier this year, we experimented with the dislike button to see whether or not changes could help better protect our creators from harassment, and reduce dislike attacks... As part of this experiment, viewers could still see and use the dislike button. But because the count was not visible to them, we found that they were less likely to target a video’s dislike button to drive up the count."

My second theory: what if YT discovered that this wasn't enough to stop "dislike attacks" and creators were still seeing them in their analytics, causing them to be less likely to post more content? So... they started detecting when a video's dislikes deviated too far from the mean, presumed it was a dislike attack and wouldn't show it to the creator to "protect them". They would instead show the dislikes which they believed were not part of a dislike attack using some algorithm they dreamed up.

Totally speculative, but interesting to consider.

@Anarios if you're still considering allowing creators to submit their data in the future on the basis of it being more accurate.... maybe that isn't always the case?

Anarios commented 3 months ago

So... they started detecting when a video's dislikes deviated too far from the mean, presumed it was a dislike attack and wouldn't show it to the creator to "protect them".

I have similar thoughts.

@Anarios if you're still considering allowing creators to submit their data in the future on the basis of it being more accurate.... maybe that isn't always the case?

exactly, with the current MrBeast situation - it looks like even YT Studio stats are not really accurate anymore.

geerlingguy commented 3 months ago

I've explicitly asked YouTubeLiaison about this, because it seems like one of the following is true:

  1. Chucky/MrBeast team is lying about MrBeast's analytics (unlikely, I don't think he doctored the video of YT studio mobile analytics)
  2. @Anarios is lying about r-y-d's analytics (also unlikely, what does he have to gain from that?)
  3. YouTube isn't counting all dislikes, whether by dropping dislikes as part of an anti-harassment feature, or holding dislikes until they are verified, as they do with public view counts (which usually lag the view counts creators see in the dashboard, especially early after a video's posted).

I don't know if YouTube will respond, however. For now, unless proven otherwise, I'm thinking it's #3.

watchofficial commented 3 months ago

  3. YouTube isn't counting all dislikes, whether by dropping dislikes as part of an anti-harassment feature, or holding dislikes until they are verified, as they do with public view counts (which usually lag the view counts creators see in the dashboard, especially early after a video's posted).

yes I believe the 3rd one is correct

Laitinlok commented 3 months ago

if you read the RYD doc, you'll see the actual formula for dislikes and understand this is an approximation and not accurate, you can't demand accuracy from the extension and you can't blame it either...

anyway the formula is:

RYD Dislikes = (RYD users Likes / RYD users dislikes) * actual video Likes

also an approximation is better than nothing, but you should not blame the extension, it was never meant to be 100% accurate.

Doesn't the Central Limit Theorem state that a sample size of more than 30 is large enough for the sampling distribution to be close to a normal distribution when the ratio of likes/dislikes follows a Binomial distribution?

Well, yes, but only if the sample comes from the same distribution. This app needs to be downloaded and installed - it's a different audience who would do that. Many people don't care about this app. However, for that audience, it's a good estimate.

The Bernoulli trials are independent: user A hits dislike with probability p1, user B with probability p2, user C with probability p3, and so on. Each individual probability is random and unknown, but the mean over all those users is known from the sample and follows a Binomial distribution. A Binomial distribution with a very large sample can be approximated by a normal distribution. Using a 90% confidence level, you can estimate the lower and upper bounds of the population mean of the probability. Taking the lower bound and multiplying it by the population (dislike + like) amount, then rearranging the variables, gives the population dislike estimate. The de Moivre–Laplace theorem is the earliest form of the CLT; it states that the normal distribution may be used as an approximation to the binomial distribution.
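
The procedure described above can be sketched as follows. This is only an illustration of the normal-approximation interval, not anything the extension is known to do: the function names are hypothetical, z = 1.645 is the usual approximate quantile for a two-sided 90% level, and the interval covers sampling noise among extension users only, not the difference between extension users and the general audience.

```python
import math


def dislike_share_interval(likes: int, dislikes: int, z: float = 1.645) -> tuple[float, float]:
    """Normal-approximation confidence interval for the dislike share d / (l + d)."""
    n = likes + dislikes
    if n == 0:
        raise ValueError("need at least one vote")
    p = dislikes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def dislikes_from_share(share: float, public_likes: int) -> float:
    """Rearrange p = d / (l + d) into d = p * l / (1 - p), given a known like count l."""
    if not 0.0 <= share < 1.0:
        raise ValueError("share must be in [0, 1)")
    return share * public_likes / (1.0 - share)


# Extension votes from the August 6th edit earlier in this thread.
low, high = dislike_share_interval(likes=75_865, dislikes=145_317)
print(f"90% interval for the extension-user dislike share: {low:.4f} to {high:.4f}")
```

For these counts the interval is very narrow (roughly 65.5% to 65.9%), so sampling noise is small at this sample size; whether extension users are representative is a separate question.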

Olekoop commented 3 months ago

@Anarios I understand that it is an estimated number of dislikes, however I was under the impression that it would always be an underestimate (since not everyone has this extension), but with this formula there is a possibility that the estimate will be bigger than the actual number of dislikes.

The whole situation with MrBeast seems weird. I can understand why the formula is used, but I still think that an option to show the 'raw' number of dislikes would be useful. I became aware of this problem after watching Asmongold's video (https://www.youtube.com/watch?v=jvJQ8mtuPVE) where he showed, using his own video statistics, that the extension both over- and underestimates the number of dislikes.

I don't like the overestimates. I'd rather have possibly less accurate underestimates than possibly more accurate numbers that may be over- or underestimates.

Laitinlok commented 3 months ago

if you read the RYD doc, you'll see the actual formula for dislikes and understand this is an approximation and not accurate, you can't demand accuracy from the extension and you can't blame it either...

anyway the formula is:

RYD Dislikes = (RYD users Likes / RYD users dislikes) * actual video Likes

also an approximation is better than nothing, but you should not blame the extension, it was never meant to be 100% accurate.

Doesn't the Central Limit Theorem state that a sample size of more than 30 is large enough for the sampling distribution to be close to a normal distribution when the ratio of likes/dislikes follows a Binomial distribution?

Well, yes, but only if the sample comes from the same distribution. This app needs to be downloaded and installed - it's a different audience who would do that. Many people don't care about this app. However, for that audience, it's a good estimate.

The CLT only requires independent and identically distributed random variables, and a random sample fits that requirement. As the developer said, the users who disliked have accounts older than a year that are active on YT, which supports independence, and there are only a like and a dislike button, so each vote is a Bernoulli trial.

Laitinlok commented 3 months ago

@Anarios I understand that it is an estimated number of dislikes, however I was under the impression that it would always be an underestimate (since not everyone has this extension), but with this formula there is a possibility that the estimate will be bigger than the actual number of dislikes.

The whole situation with MrBeast seems weird. I can understand why the formula is used, but I still think that an option to show the 'raw' number of dislikes would be useful. I became aware of this problem after watching Asmongold's video (https://www.youtube.com/watch?v=jvJQ8mtuPVE) where he showed, using his own video statistics, that the extension both over- and underestimates the number of dislikes.

I don't like the overestimates. I'd rather have possibly less accurate underestimates than possibly more accurate numbers that may be over- or underestimates.

You can calculate the lower bound based on the sample mean using the CLT with a 90% confidence level.

paaarker commented 3 months ago

The tweet by the MrBeast team is likely fake. They're in damage control mode.

hostedbyjustus commented 3 months ago

Looks like I started a whole discussion about dislike data. But let's leave it here.