elliotwaite / thumbnail-rating-bar-for-youtube

A Chrome and Firefox extension for YouTube that adds a rating bar (likes/dislikes ratio) to the bottom of every thumbnail.
https://chrome.google.com/webstore/detail/youtube-thumbnail-rating/cmlddjbnoehmihdmfhaacemlpgfbpoeb
MIT License
249 stars, 17 forks

Local storage of statistics for later reuse #51

Closed marioeljuga closed 2 years ago

marioeljuga commented 2 years ago

Would it be possible to permanently store all fetched video statistics in local storage? Then, once YouTube hides this data, we could still use what we saved locally. At some point, we could even share our locally fetched statistics with each other so that we capture as much data as possible before it disappears.

I am primarily interested in "saving" likes/dislikes for tech videos and tutorials, and I am sure many others here have similar interests. So we could really benefit from preserving statistics for this kind of video.

Thank you for this amazing extension!

Greedquest commented 2 years ago

This is a good idea. I think for 2 reasons: 1) A local cache will allow the extension to look up info locally and supplement results once the dislikes become invisible. 2) The caches can be shared:

An alternative to caching results of day-to-day usage would be to bundle an actual web crawler (e.g. https://github.com/RLee12/Youtube-Crawler) - this way you can max out users' free API allocation to download as much data as possible before mid-December. Either way, I think ideally the results from this crawling or caching would be uploaded to a central repository automatically rather than relying on users to share their cache files manually, although I can see this would require additional permissions for the extension, so it might be a headache. Still, I think users of this extension would be more willing than most to share this data.

Perhaps there exists a dataset of youtube video statistics out there already for researchers?
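To make the local-cache idea from this comment concrete, here is a minimal sketch in TypeScript. It is not code from the extension: the `VideoStats` fields, the `StatsCache` class, and the JSON export format are all hypothetical, and an in-memory `Map` stands in for whatever persistent storage the extension would actually use.

```typescript
// Hypothetical shape of one cached entry; the field names are assumptions.
interface VideoStats {
  likes: number;
  dislikes: number;
  fetchedAt: number; // Unix time in milliseconds when the stats were fetched
}

// In-memory stand-in for persistent storage, keyed by video ID.
class StatsCache {
  private entries = new Map<string, VideoStats>();

  // Store (or overwrite) the stats for one video.
  save(videoId: string, stats: VideoStats): void {
    this.entries.set(videoId, stats);
  }

  // Return the cached stats, or undefined if this video was never fetched.
  lookup(videoId: string): VideoStats | undefined {
    return this.entries.get(videoId);
  }

  // Serialize the whole cache so it can be shared as a file.
  exportJson(): string {
    return JSON.stringify(Array.from(this.entries.entries()));
  }

  // Merge another user's exported cache, keeping the newer entry per video.
  importJson(json: string): void {
    const incoming = JSON.parse(json) as [string, VideoStats][];
    for (const [videoId, stats] of incoming) {
      const existing = this.entries.get(videoId);
      if (existing === undefined || stats.fetchedAt > existing.fetchedAt) {
        this.entries.set(videoId, stats);
      }
    }
  }
}
```

Keeping the fetch timestamp on every entry is what lets two shared caches be merged without guessing which counts are more recent.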

elliotwaite commented 2 years ago

If this were done, I think storing the statistics in a central database might be the way to go. And then maybe another table in the database could track the actual likes/dislikes of the users of this extension so that after the public dislikes are removed, this user data could still be used on newer videos that don't have saved statistics, although the sample size would be much smaller. I'll have to look into the potential additional permissions that would be required to add this feature.
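The two-table idea above could be sketched roughly as follows, in TypeScript. This is only an illustration under stated assumptions: the `Counts` type, the `estimateRatio` function, and the use of `Map`s in place of real database tables are all hypothetical. It prefers archived public statistics and falls back to votes cast by extension users when no archive entry exists.

```typescript
// Like/dislike counts for one video; used for both hypothetical tables.
interface Counts {
  likes: number;
  dislikes: number;
}

// Fraction of votes that are likes, or undefined if there are no votes.
function likeRatio(counts: Counts): number | undefined {
  const total = counts.likes + counts.dislikes;
  return total === 0 ? undefined : counts.likes / total;
}

// Prefer archived public stats; otherwise fall back to extension users' votes.
// `source` tells the caller how much to trust the (smaller-sample) result.
function estimateRatio(
  videoId: string,
  archived: Map<string, Counts>,
  userVotes: Map<string, Counts>
): { ratio: number; source: "archive" | "users" } | undefined {
  const fromArchive = archived.get(videoId);
  if (fromArchive !== undefined) {
    const ratio = likeRatio(fromArchive);
    if (ratio !== undefined) return { ratio, source: "archive" };
  }
  const fromUsers = userVotes.get(videoId);
  if (fromUsers !== undefined) {
    const ratio = likeRatio(fromUsers);
    if (ratio !== undefined) return { ratio, source: "users" };
  }
  return undefined;
}
```

Returning the source alongside the ratio leaves room for the UI to flag estimates that rest on the much smaller user sample.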

marioeljuga commented 2 years ago

As @Greedquest hinted, there are already databases that have a significant amount of data. So it looks like there is no need to capture the data again.

A comment from this redditor is a good starting point for the research:

https://www.reddit.com/r/NoStupidQuestions/comments/qt85ga/comment/hkhvfq6/?utm_source=share&utm_medium=web2x&context=3

Greedquest commented 2 years ago

@marioeljuga Thanks, that dataset (https://archive.org/details/Youtube_metadata_02_2019 in case the reddit link dies) contains info on like & dislike count for a couple billion videos (~1-10% of YouTube depending on which estimate you use) so would be good for supplying data to this extension for historic videos. Rather than downloading that massive dataset, you can also search the site by video ID. Perhaps the owner of that site would be willing to expose an API for this extension to use?

Annoyingly, for the efforts to keep this extension working on new videos with a tailored algorithm, that dataset does not contain comment counts (for the Like : Comment ratio) or any info on comments for sentiment analysis, so I think the work of gathering that data to train a new tool still needs to happen separately.

deefdragon commented 2 years ago

There are also other extensions (this appears to be the biggest one (its GitHub)) which have begun saving dislikes. I think we should consider implementing a common format for tracking dislikes across all of the YouTube modifier extensions, and storing them in a common database. In this manner, anyone who adds any one of these extensions would be assisting the war against YouTube's stupidity.

The best case might be an independent library that could be loaded in by all developers who wish to assist. In that case, everyone would literally be on the same page.
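As a rough illustration of what such a shared format and library might contain, here is a TypeScript sketch. Nothing here is an agreed standard: the `DislikeRecord` fields and the `parseDislikeRecord` helper are hypothetical, and the validation rules are just one plausible choice.

```typescript
// Hypothetical common record that every participating extension would emit.
interface DislikeRecord {
  videoId: string;    // YouTube video ID
  likes: number;      // non-negative integer
  dislikes: number;   // non-negative integer
  source: string;     // name of the extension that recorded the data
  recordedAt: string; // timestamp in a format Date.parse accepts, e.g. ISO 8601
}

// True for non-negative integers only.
function isCount(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0;
}

// Validate untrusted input; returns a clean record, or undefined if malformed.
function parseDislikeRecord(value: unknown): DislikeRecord | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const { videoId, likes, dislikes, source, recordedAt } =
    value as Record<string, unknown>;
  if (typeof videoId !== "string" || videoId.length === 0) return undefined;
  if (!isCount(likes) || !isCount(dislikes)) return undefined;
  if (typeof source !== "string" || source.length === 0) return undefined;
  if (typeof recordedAt !== "string" || Number.isNaN(Date.parse(recordedAt))) {
    return undefined;
  }
  // Copy only the known fields so extra properties are dropped.
  return { videoId, likes, dislikes, source, recordedAt };
}
```

A single validating parser like this is the kind of thing an independent library could ship, so that every extension rejects malformed records in the same way before they reach a common database.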

Greedquest commented 2 years ago

@deefdragon See https://github.com/elliotwaite/thumbnail-rating-bar-for-youtube/issues/50#issuecomment-970426095

I think that, rather than sharing data between users of each extension (which may violate the terms of the Chrome store or be more effort than it's worth), or having a single independent API/service you can contribute to (which is good in principle, but how do you get people to sign up?), the biggest YouTube modification extension with the most users (which was originally SponsorBlock, but now return-youtube-dislike is growing rapidly) will maintain an API the other extensions can use.

Maybe there will be slightly tailored approaches in each extension depending on the algorithm for generating the dislikes.

elliotwaite commented 2 years ago

@marioeljuga, now that the extension has been updated to use the Return YouTube Dislike API, which seems to be working reasonably well, should we close this issue?

Greedquest commented 2 years ago

@elliotwaite I think more than that, the opportunity to scrape statistics and store them is gone now so this feature is kind of moot.

elliotwaite commented 2 years ago

True. I did scrape some data in that time, but I think Return YouTube Dislike has a sufficient dataset and is planning ways to continue expanding it.

Okay, I'll close this issue.