Privacy concern - Githubissues

brace110 commented 2 years ago

Browser

Brave

Browser Version

Latest

Extension or Userscript?

Extension

Extension/Userscript Version

Latest

Video link where you see the problem

Every

What happened?

How can we be sure our viewing habits are not being tracked if every video we watch is sent to your domain:

fetch(
    `https://returnyoutubedislikeapi.com/votes?videoId=${getVideoId()}`
  ).then((response) => {
    response.json().then((json) => {

Can we solve this question to market/make this plugin even more popular and trusted by everyone?

Anarios commented 2 years ago

You can't. And there is no way to solve it either (well, giving everyone admin rights to all infrastructure would solve it).

aryavsaigal commented 2 years ago

Maybe we can add an advanced option so it only fetches dislikes for the videos you want (at the click of a button). You wouldn't need to see dislikes for a creator you trust a lot, or for entertainment videos.

If people are concerned, they can just enable this and check dislikes for videos they don't trust, which should be far fewer than all the videos they watch.

Anarios commented 2 years ago

maybe we can add an advanced option so it only fetches dislikes for the videos you want (by the click of a button) you wouldnt need to see dislikes of a creator you trust a lot and entertainment videos.

if people are concerned they can just enable this and check dislikes for videos they dont trust, which should be pretty less compared to all the videos you watch.

that would be one way to solve it. But then again - it would still expose all videos that you requested (and people would request most of their videos)

Anarios commented 2 years ago

You can also block it in your firewall, and only unblock it when you want to see dislikes.

Anarios commented 2 years ago

or disable it, and only enable when you need it. you get the idea.

aryavsaigal commented 2 years ago

that would be one way to solve it. But then again - it would still expose all videos that you requested (and people would request most of their videos)

It's harder to be sure of someone's viewing habits when fewer videos are fetched.

Anarios commented 2 years ago

that would be one way to solve it. But then again - it would still expose all videos that you requested (and people would request most of their videos)

It's harder to be sure of someones viewing habits by less videos fetched

yep, but imagine flipping the switch every time you want to see dislikes

aryavsaigal commented 2 years ago

that would be one way to solve it. But then again - it would still expose all videos that you requested (and people would request most of their videos)

It's harder to be sure of someones viewing habits by less videos fetched

yep, but imagine flipping the switch every time you want to see dislikes

It will probably be a button on the YouTube page itself, so it shouldn't be much of an inconvenience to those using it.
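
The button idea above could look roughly like the following. This is only a sketch: `createOnDemandFetcher` and `fetchVotesOnClick` are hypothetical names, and `fetchImpl` is injected so the example also runs outside a browser. The only part taken from the thread is the URL in the original report.

```javascript
// Sketch only: the endpoint is the one quoted in the original report;
// every function name here is hypothetical.
const API_URL = "https://returnyoutubedislikeapi.com/votes";

function createOnDemandFetcher(fetchImpl) {
  const requested = []; // video IDs the user explicitly asked about

  // Meant to be called from a click handler, not on every page navigation.
  async function fetchVotesOnClick(videoId) {
    requested.push(videoId);
    const url = `${API_URL}?videoId=${encodeURIComponent(videoId)}`;
    const response = await fetchImpl(url);
    return response.json();
  }

  return { fetchVotesOnClick, requested };
}
```

With this shape, the server only ever sees the IDs in `requested`, which is the reduction in exposure being argued for here. It does not hide those IDs themselves.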

Aerophus commented 2 years ago

This should have been kept open instead of being closed immediately, at least so we can have an open discussion about possible implementations of what I think is a pretty good idea.

Anarios commented 2 years ago

It was never opensource.

zannini commented 2 years ago

It was never opensource.

Ah. I'm sure the community will be very happy to use a tool that is not creepy. Google is already creepy we shouldn't make the thing creepier.

Don't worry we will solve this problem over Christmas break. Be it with this software or another 😉

Anarios commented 2 years ago

So, making a request to a server is creepy now? You shouldn't be using github then either.

Be it with this software or another 😉

Good luck.

Anarios commented 2 years ago

By the way - a bunch of copycat extensions died today once I enabled IP rate limiting.

They were just calling my API in their backend - no DB of their own, no caching - nothing. Just pretending to provide a service while in reality they didn't. Now imagine they had a DB dump and the server code - what good would it do? More userbase fragmentation, less reliable votes? And all while using my work for free.

brace110 commented 2 years ago

As the OP, I'd like to mention to @Anarios that in no way I'm claiming malicious intent. I am very grateful for your work. I was just wondering if we can provide some sort of way to guarantee privacy, to improve the extension and see it succeed like we all want.

I suppose a few options would be possible:

Calling a public Youtube API instead of a custom domain (if possible)
Having a toggle added to the extension so that calls to the back-end for dislike info are made only when requested (although people can theoretically always put the extension in 'Active on Click' mode to achieve similar results).
Having the backend code also run opensource so it can be verified by others.
Allow people to host their own servers who handle the JS calls. (Too technical for most people, but the option would be interesting for developers)
... many others?

I think discussing options here, like mentioned above, would be a great idea. Perhaps we will come to a conclusion that nothing can be done. But at least we explored all options.

k1nxx commented 2 years ago

As the OP, I'd like to mention to @Anarios that in no way I'm claiming malicious intent. I am very grateful for your work. I was just wondering if we can provide some sort of way to guarantee privacy, to improve the extension and see it succeed like we all want.

I suppose a few options would be possible:

Calling a public Youtube API instead of a custom domain (if possible)

Having a toggle added to the extension and only when requested the calls are made to the back-end to request dislike info (although people can theoretically always put the Extension on 'Active on Click mode' to achieve similar results.

Having the backend code also run opensource so it can be verified by others.

Allow people to host their own servers who handle the JS calls. (Too technical for most people, but the option would be interesting for developers)

... many others?

I think discussing options here, like mentioned above, would be a great idea. Perhaps we will come to a conclusion that nothing can be done. But at least we explored all options.

100% agree. But YouTube is still selling your data, so what's the point? Also, for this extension to stay alive it requires user data, so it is useless without such data.

sy-b commented 2 years ago

@brace110

Calling a public Youtube API instead of a custom domain (if possible)

That's the reason there is a separate domain. YT removed the ability to get dislikes from YT API.

Having a toggle added to the extension and only when requested the calls are made to the back-end to request dislike info (although people can theoretically always put the Extension on 'Active on Click mode' to achieve similar results.

Agreed. Many people had already suggested this.

Having the backend code also run opensource so it can be verified by others.

It is planned. It's just that, right now, the focus is on making the data meaningful.

Allow people to host their own servers who handle the JS calls. (Too technical for most people, but the option would be interesting for developers)

I probably didn't get this one. Can you explain why?

aryavsaigal commented 2 years ago

Calling a public Youtube API instead of a custom domain (if possible)

public api doesnt return dislikes anymore

Having a toggle added to the extension and only when requested the calls are made to the back-end to request dislike info (although people can theoretically always put the Extension on 'Active on Click mode' to achieve similar results.

didnt know that existed, people can easily do that

Allow people to host their own servers who handle the JS calls. (Too technical for most people, but the option would be interesting for developers)

that would only be archived counts, after a while it will get outdated and youd have to keep syncing the database

Having the backend code also run opensource so it can be verified by others.

technically the host can modify the backend to collect your data and sell it while the open source code doesnt show the modifications

I don't think there's anything you can do other than Active on Click mode, honestly.

Anarios commented 2 years ago

Now imagine worst case scenario - say, I was as evil as it gets - what could I track? A random ID that you can regenerate at any time and an IP that can be dynamic\behind NAT\behind VPN.

So if you use a dynamic IP or a VPN - there is nothing I can track. This would be a real solution to privacy concern. Unlike a non-solution of posting server sources.

k1nxx commented 2 years ago

I'd suggest making this a public company

Anarios commented 2 years ago

I suppose a few options would be possible: Calling a public Youtube API instead of a custom domain (if possible)

there is no public youtube api for dislikes.

Allow people to host their own servers who handle the JS calls.

I don't see what it solves. You can build extension from sources and replace my API with your own even today. Or I didn't understand this point.

brace110 commented 2 years ago

an IP that can be dynamic\behind NAT\behind VPN.

Here in Europe we are forbidden to track IP-addresses without consent, they are being anonymized to 0.0.0.0, I run security for our company, so I deal with this a lot.

But I do see your point, the incoming data would be an IP-address, some headers, perhaps a browser user-agent and the video ID.

Theoretically you could build up a user profile on this information, for example a table that stores videos watched by certain IP's in a database. However this would be a tiny drop compared to what Google/Youtube is tracking about us.

I wonder if the "risks" here are actually overstated... What do you guys think?

Anarios commented 2 years ago

Theoretically you could build up a user profile on this information, for example a table that stores videos watched by certain IP's in a database

I doubt you could even sell it. If I could attach a cookie so that ads can be shown to you based on your watch history - that would be a gold mine. But you see that I don't do it in the frontend code.

Anarios commented 2 years ago

that would be a gold mine.

That would be what youtube already sells about you :)

Anarios commented 2 years ago

So instead of buying it from me - it's much more logical to just use Google Ads.

Anarios commented 2 years ago

There is a similar discussion in #45 , I cross-post some messages because general sentiment is very similar. Could I close this one and we continue there? what do you think?

brace110 commented 2 years ago

I have one last question. You mentioned other people using your endpoint. Since I don't need the dislikes added to the YouTube page itself, I just want to be able to look up the dislikes on videos. Would you be open to allowing others to use your endpoint? Or even to me forking this project and creating an extension that doesn't need the current requirements:

"All youtube.com sites"
youtube.com
www.youtube.com
m.youtube.com

I think a lot of people would be happy to just have a tool that allows them to easily look up dislikes on certain videos. Without altering the Youtube page itself. Of course the regular users of this extension would prefer the Updated Youtube page, but this extension could be an add-on for technical users, or perhaps a separate extension.

You mentioned above that people already started using your endpoint, is this a problem? Traffic for example? or people abusing your work? I'd love to hear your thoughts.

observeroftime01 commented 2 years ago

Might I suggest doing what Sponsorblock is doing? They got around the privacy issue somehow.

sy-b commented 2 years ago

@Anarios

There is a similar discussion in #45 , I cross-post some messages because general sentiment is very similar. Could I close this one and we continue there? what do you think?

I think this issue is concerned with "privacy", but issue #45 is concerned with "Backend source code". This one covers a broader topic. My suggestion is: let this remain open, or start a discussion on it.

aryavsaigal commented 2 years ago

I have one last question, you mentioned other people using your endpoint. Since I don't need my dislikes added to the Youtube page itself, I just want to be able to lookup the dislikes on videos. Would you be open to allowing others to use your endpoint? (or even for me to fork this project and create an extension that doesn't need the current requirements:

"Alle youtube.com sites"

youtube.com

www.youtube.com

m.youtube.com

I think a lot of people would be happy to just have a tool that allows them to easily look up dislikes on certain videos. Without altering the Youtube page itself. Of course the regular users of this extension would prefer the Updated Youtube page, but this extension could be an add-on for technical users, or perhaps a separate extension.

You mentioned above that people already started using your endpoint, is this a problem? Traffic for example? or people abusing your work? I'd love to hear your thoughts.

why not make a website for this tbh?

aryavsaigal commented 2 years ago

Might I suggest doing what Sponsorblock is doing? They got around the privacy issue somehow.

we are dealing with way more requests than sponsorblock and that would cost us a lot of extra bandwidth and cpu power which is costly 😑

brace110 commented 2 years ago

I have one last question, you mentioned other people using your endpoint. Since I don't need my dislikes added to the Youtube page itself, I just want to be able to lookup the dislikes on videos. Would you be open to allowing others to use your endpoint? (or even for me to fork this project and create an extension that doesn't need the current requirements:

"Alle youtube.com sites" youtube.com www.youtube.com m.youtube.com I think a lot of people would be happy to just have a tool that allows them to easily look up dislikes on certain videos. Without altering the Youtube page itself. Of course the regular users of this extension would prefer the Updated Youtube page, but this extension could be an add-on for technical users, or perhaps a separate extension.

You mentioned above that people already started using your endpoint, is this a problem? Traffic for example? or people abusing your work? I'd love to hear your thoughts.

why not make a website for this tbh?

This is a very good possibility, but then I'm still curious if I can use the owners endpoint for those calls, or if he has any concerns with that.

Anarios commented 2 years ago

"Would you be open to allowing others to use your endpoint? "

I already am, and many projects use it without issues. There is rate limit of 100 per minute and 10 000 per day - but if you contact me and describe your project and it's not a direct competitor (say, a chrome extension with slightly different icon :D ) - rate limits can be disabled for you.

You mentioned above that people already started using your endpoint, is this a problem?

Only a lack of attribution and directly copying everything was a problem. There are chrome extensions that are similar, yet different enough so that I don't mind them using the API as well (KellyC youtube dislike and dislike thumbnail bar)
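
For third-party projects calling the endpoint, the limits mentioned above (100 requests per minute, 10 000 per day) could be respected on the client side with something like the following sketch. The `RateLimiter` class is hypothetical and says nothing about how the server enforces its limits. The clock is injectable so the logic can be checked without waiting.

```javascript
// Hypothetical client-side throttle for the limits quoted above.
// `now` returns milliseconds and can be replaced in tests.
class RateLimiter {
  constructor(perMinute = 100, perDay = 10000, now = () => Date.now()) {
    this.perMinute = perMinute;
    this.perDay = perDay;
    this.now = now;
    this.timestamps = []; // times of accepted requests within the last 24 hours
  }

  // Returns true and records the request if both windows still have room.
  tryAcquire() {
    const t = this.now();
    const dayAgo = t - 24 * 60 * 60 * 1000;
    const minuteAgo = t - 60 * 1000;
    this.timestamps = this.timestamps.filter((ts) => ts > dayAgo);
    const lastMinute = this.timestamps.filter((ts) => ts > minuteAgo).length;
    if (lastMinute >= this.perMinute || this.timestamps.length >= this.perDay) {
      return false;
    }
    this.timestamps.push(t);
    return true;
  }
}
```

A caller would check `limiter.tryAcquire()` before each request and skip or delay the request when it returns `false`.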

Anarios commented 2 years ago

we are dealing with way more requests than sponsorblock and that would cost us a lot of extra bandwidth and cpu power which is costly 😑

the SB solution only works for viewing videos, and not for submitting edits. So it could be applied to just watching a video, but wouldn't work when you want to vote.

brace110 commented 2 years ago

I think my Privacy concerns are thoroughly discussed. For me personally we can close this thread and continue in https://github.com/Anarios/return-youtube-dislike/issues/45

Since opening up the back-end code seems to be on the roadmap if I understand correctly?

Having the backend code also run opensource so it can be verified by others.

It is in plan. Just the thing is that, right now, the focus is on making the data meaningful

Anarios commented 2 years ago

Since opening up the back-end code seems to be on the roadmap if I understand correctly?

Yes, but no solid ETA. Roughly around the time when logging in with Google OAuth is implemented. So, a couple of weeks to a month.

CallMeAlexO commented 2 years ago

@brace110 Not to come off as rude or disrespectful, but what information in particular are you worried about leaking?

The extension uses JS' fetch command, and you can clearly see the headers being sent - it's only one header - "Content-Type", which has no identifying information. The only additional information you're sending is a URL parameter which is the requested videoID.

Therefore, in the absolute worst case - the most information the backend can have on you is the time of your request, and your public IP address, which everyone sees every time you go onto every website ever. I'm not sure what European law you're referring to, but if you're referring to this one then returnyoutubedislike is absolutely in the clear, because this law only applies to entities that can deanonymize you using your ISP. The decision is as follows:

By today’s judgment, the Court replies, [...] that a dynamic IP address [...] constitutes personal data with respect to the operator [of a website] if it has the legal means enabling it to identify the visitor with the help of additional information which that visitor’s internet service provider has.

Furthermore, they go on to state:

Second, the Court states that [a website] may collect and use a visitor’s personal data, without his consent, only to the extent that it is necessary to facilitate and invoice the specific use of services by that visitor, so that the objective aiming to ensure the general operability of those services cannot justify the use of such data after those services have been accessed.

This case clearly doesn't apply here, because if returnyoutubedislike will attempt to contact EU ISPs to deanonymize users it will (1) be a public request, which we'll know about and (2) will be denied because your ISP doesn't just give out that information.

If you're really worried, you can use a non-EU VPN like @Anarios suggested.

TL;DR Every request on the internet will show your public IP, that's just how the internet works, and storing your IP isn't illegal, because your IP is dynamically allocated to you by your ISP every single time. And in the absolute worst case, if it isn't, you can use a VPN.

Anarios commented 2 years ago

I think I just came up with a solution. I will confirm with everyone in #72, but it should work even better.

Let's make the extension never request a single video ID. It is planned to display a ratio bar near every thumbnail. So why not request all videos at once, thereby making the watch history unavailable to the server (because the server will never know which single video out of the 50 requested you actually watched).

Randomize id order before request, and voila
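
A rough sketch of that idea, assuming a batch endpoint that accepts a list of IDs (no such endpoint is described in this thread; `buildBatchRequest` and the `videoIds` field are made up for illustration):

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(ids, random = Math.random) {
  const out = ids.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Hypothetical batch request body: the watched video is mixed in with the
// thumbnail IDs, so its position in the list carries no information.
function buildBatchRequest(watchedId, thumbnailIds) {
  const unique = [...new Set([watchedId, ...thumbnailIds])];
  return { videoIds: shuffle(unique) };
}
```

The shuffle only hides which entry is the watched video. The set of IDs as a whole is still visible to the server.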

brace110 commented 2 years ago

@brace110 Not to come off as rude or disrespectful, but what information in particular are you worried about leaking?

The extension uses JS' fetch command, and you can clearly see the headers being sent - it's only one header - "Content-Type", which has no identifying information. The only additional information you're sending is a URL parameter which is the requested videoID.

Therefore, in the absolute worst case - the most information the backend can have on you is the time of your request, and your public IP address, which everyone sees every time you go onto every website ever. I'm not sure what European law you're referring to, but if you're referring to this one then returnyoutubedislike is absolutely in the clear, because this law only applies to entities that can deanonymize you using your ISP. The decision is as follows:

By today’s judgment, the Court replies, [...] that a dynamic IP address [...] constitutes personal data with respect to the operator [of a website] if it has the legal means enabling it to identify the visitor with the help of additional information which that visitor’s internet service provider has.

Furthermore, they go on to state:

Second, the Court states that [a website] may collect and use a visitor’s personal data, without his consent, only to the extent that it is necessary to facilitate and invoice the specific use of services by that visitor, so that the objective aiming to ensure the general operability of those services cannot justify the use of such data after those services have been accessed.

This case clearly doesn't apply here, because if returnyoutubedislike will attempt to contact EU ISPs to deanonymize users it will (1) be a public request, which we'll know about and (2) will be denied because your ISP doesn't just give out that information.

If you're really worried, you can use a non-EU VPN like @Anarios suggested.

TL;DR Every request on the internet will show your public IP, that's just how the internet works, and storing your IP isn't illegal, because your IP is dynamically allocated to you by your ISP every single time. And in the absolute worst case, if it isn't, you can use a VPN.

I think you're absolutely right, at first I didn't have a good grasp of the data being submitted to the custom endpoint, but I've talked it over in this thread and also on the Discord server. That's why I mentioned that the headers sent shouldn't really be of concern. See https://github.com/Anarios/return-youtube-dislike/issues/344#issuecomment-997235770

Furthermore I do like the suggestion made by @Anarios in his latest post (https://github.com/Anarios/return-youtube-dislike/issues/344#issuecomment-997288610) about changing the approach of sending video data to the server.

pukkandan commented 2 years ago

I should have read this issue before posting https://github.com/Anarios/return-youtube-dislike/issues/45#issuecomment-999803591 🤦 Most of my points have already been mentioned here.

Except this (which I think belongs here more than in #45):

So why not request all videos at once, therefore making the watch history unavailable for server (because server will never know which single video out of 50 requested you actually watch).

Perhaps a hash can be sent, similar to the SponsorBlock API https://wiki.sponsor.ajay.app/w/API_Docs#GET_.2Fapi.2FskipSegments.2F:sha256HashPrefix

Together with an option to disable submitting votes (which I believe is a planned feature), this would address all concerns regarding tracking of watch history
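
A minimal sketch of what such a hash-prefix lookup might look like, written for Node.js. The function names, the record shape and the default prefix length of 4 hex characters are assumptions for illustration, not this project's API.

```javascript
const { createHash } = require("crypto");

// SHA-256 of the video ID as lowercase hex.
function hashVideoId(videoId) {
  return createHash("sha256").update(videoId, "utf8").digest("hex");
}

// Client side: only this short prefix would leave the browser.
function hashPrefix(videoId, length = 4) {
  return hashVideoId(videoId).slice(0, length);
}

// Server side (hypothetical): return every stored record whose hash starts with the prefix.
function lookupByPrefix(records, prefix) {
  return records.filter((r) => hashVideoId(r.videoId).startsWith(prefix));
}

// Client side: pick the wanted record out of the returned bucket.
function pickOwnVideo(bucket, videoId) {
  return bucket.find((r) => r.videoId === videoId) || null;
}
```

The server sees only the prefix, which many videos share, and the client discards the records it did not ask for.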

himanshudabas commented 2 years ago

Together with an option to disable submitting votes (which I believe is a planned feature),

@pukkandan this has already been implemented. ref: https://github.com/Anarios/return-youtube-dislike/pull/326

Perhaps a hash can be send similar to the SponsorBlock API https://wiki.sponsor.ajay.app/w/API_Docs#GET_.2Fapi.2FskipSegments.2F:sha256HashPrefix

this is an excellent way to solve the privacy issue. @Anarios can we do something like this?

optional voting has already been done, now the only privacy issue is fetching the video data, I think which can also be solved with this. that'd eliminate any privacy concern anyone has :)

Anarios commented 2 years ago

Together with an option to disable submitting votes (which I believe is a planned feature),

@pukkandan this has already been implemented. ref: #326

Perhaps a hash can be send similar to the SponsorBlock API https://wiki.sponsor.ajay.app/w/API_Docs#GET_.2Fapi.2FskipSegments.2F:sha256HashPrefix

this is an excellent way to solve the privacy issue. @Anarios can we do something like this?

optional voting has already been done, now the only privacy issue is fetching the video data, I think which can also be solved with this. that'd eliminate any privacy concern anyone has :)

I'm worried about increasing the load on the DB and backend by 16 times. Once we have 5 billion videos from ArchiveTeam loaded into the DB - it might get painful to use this approach.

Anarios commented 2 years ago

Also, if we're fetching multiple IDs (for videos in thumbnails), and we're getting ~16 results for each video requested - it's ~ 50 * 16 results returned by server with each navigation.
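
The arithmetic behind these numbers can be sketched as follows, assuming hashes are spread uniformly over hex prefixes. That assumption is an addition here, and so are the function names. The thread does not say how the ~16 figure was derived.

```javascript
// Expected number of IDs sharing one hex prefix, assuming uniformly distributed hashes.
function expectedBucketSize(totalVideos, prefixHexChars) {
  return totalVideos / Math.pow(16, prefixHexChars);
}

// Results returned per navigation if every thumbnail is looked up by prefix.
function resultsPerNavigation(thumbnails, bucketSize) {
  return thumbnails * bucketSize;
}
```

For example, `resultsPerNavigation(50, 16)` gives the 800 results mentioned above, and `expectedBucketSize(5e9, 7)` is about 18.6, so with 5 billion stored IDs a 7-character hex prefix would land in roughly that range.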

himanshudabas commented 2 years ago

Using a Prefix Search Trie seems like a good way to handle this + if some in-memory DB with TTL is used that can further reduce the load of db.

One issue, however, would be bandwidth. A lot of bandwidth would be wasted simply on transferring unnecessary data.
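
The in-memory cache with a TTL mentioned above could be as simple as the following sketch. `TtlCache` is a hypothetical name. A real deployment would more likely use an existing store than hand-rolled code.

```javascript
// Minimal in-memory cache with per-entry expiry. `now` is injectable for testing.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined on a miss or when the entry has expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Keyed by hash prefix, such a cache would let repeated lookups for popular prefixes skip the database until the entry expires.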

enkeyz commented 2 years ago

I wonder if @Anarios is also thinking about putting the backend code on GitHub (if he used Go, I'd happily help).

That would dismiss any privacy concerns, and at the same time we could also contribute to its development.

@himanshudabas Redis would solve most of the issues with that problem.

aryavsaigal commented 2 years ago

@enkeyz Anarios said that the backend is in .NET. There will still be privacy concerns: there's no way to make sure that the code on the backend and the code on GitHub are the same. He was thinking about making the backend public, but I'm not sure what the progress on that is.

zopieux commented 2 years ago

On the sponsorblock-like prefix search API, which does provide some additional amount of privacy, and assuming the backend uses a SQL database such as Postgres (I don't know, the backend is not open source, see #45):

It's possible to write extremely efficient prefix search by using a dedicated index (trading extra storage space for much cheaper processing) and if this is not enough, it's relatively easy to re-architect the database slightly by sharding it by the video ID prefix, therefore making prefix search a no-brainer.

I'd suggest starting by implementing the simple index solution for a fixed length (rather than any length) such as 4, and see how it behaves: query performance, additional storage cost, and typical length of returned videos. Using a toy database with 12 million random IDs, I get sub-ms performance on a cheap VPS.

-- https://stackoverflow.com/a/21824039
CREATE INDEX idx_prefix4 ON video (LEFT(videoId, 4));

EXPLAIN ANALYZE SELECT * FROM video WHERE LEFT(videoId, 4) = 'uwUu';
-- Index Scan using idx_prefix4 on video  (cost=0.42..8.44 rows=1 width=14) (actual time=0.040..0.041 rows=1 loops=1)
--   Index Cond: ("left"(videoId, 4) = 'uwUu'::text)
-- Planning Time: 0.057 ms
-- Execution Time: 0.444 ms

I do not think the additional bandwidth cost is the problem here. It's possible to make the protocol efficient by using binary data exchanges, in which case returning 1 or 16 vote counts will be similarly, embarrassingly small.
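
To give a feel for the sizes involved, here is a sketch of one possible binary layout. The layout is purely an assumption for illustration: 11 ASCII bytes for the video ID, followed by two unsigned 32-bit counters (big-endian, the `DataView` default).

```javascript
// Hypothetical fixed-width record: 11 ASCII bytes of video ID + two unsigned 32-bit counts.
const ID_BYTES = 11;
const RECORD_BYTES = ID_BYTES + 4 + 4;

function encodeVotes(records) {
  const buffer = new ArrayBuffer(records.length * RECORD_BYTES);
  const view = new DataView(buffer);
  records.forEach((r, index) => {
    const base = index * RECORD_BYTES;
    for (let i = 0; i < ID_BYTES; i++) {
      view.setUint8(base + i, r.videoId.charCodeAt(i));
    }
    view.setUint32(base + ID_BYTES, r.likes);
    view.setUint32(base + ID_BYTES + 4, r.dislikes);
  });
  return buffer;
}

function decodeVotes(buffer) {
  const view = new DataView(buffer);
  const records = [];
  for (let base = 0; base + RECORD_BYTES <= buffer.byteLength; base += RECORD_BYTES) {
    let videoId = "";
    for (let i = 0; i < ID_BYTES; i++) {
      videoId += String.fromCharCode(view.getUint8(base + i));
    }
    records.push({
      videoId,
      likes: view.getUint32(base + ID_BYTES),
      dislikes: view.getUint32(base + ID_BYTES + 4),
    });
  }
  return records;
}
```

Under this layout a bucket of 16 records takes 16 × 19 = 304 bytes before any transport overhead. Counts above what 32 bits can hold would need wider fields.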

ItsDrike commented 2 years ago

think I just came up with a solution. I will confirm with everyone in #72, but it should work even better.

Lets make extension never request a single video id. It is planned to display a ratio bar near every thumbnail. So why not request all videos at once, therefore making the watch history unavailable for server (because server will never know which single video out of 50 requested you actually watch).

Randomize id order before request, and voila

This still isn't actually a completely viable privacy-respecting solution, it may be better in some aspects, but it doesn't really solve it completely, in fact it could even be seen as worse by some.

The problem with an approach like this is that rather than getting the videos that the user is actually interested in and clicks on explicitly, you fetch everything on their recommended page, or if they're already on some video, you'll see that video along with the videos in the suggested related videos page. This data can still be used to determine what are the interests of the original user, since the youtube algorithm is quite good at predicting the content that the users are likely to click on and those recommended videos usually fit the user's interests at least to some extent. This means that you potentially get even more data about the user, since instead of just a few videos which the user requested explicitly, you'll be getting the user's whole personalized recommendations.

While this may be great for users who don't have a YouTube account, or who have a brand new account without any personal watch history that could be used to predict what kind of videos they would watch, long-term YouTube users whose feed is heavily based on their personal interests could still have an issue with an approach like this.

I think that the way to solve it would indeed be to simply send hashes as initially suggested in #72 and not even keep the actual video ids in the database. That way, whenever a user would be requesting the dislike amount for a video, from the extension client, they could just request that data with a hashed video id. This would pretty much solve the privacy issue because you wouldn't be able to recover the actual video id that a given user was watching no matter what you did (I mean, you could brute-force it but that just isn't viable).

A potential issue with this approach is obvious, and it is also the benefit that you gain: losing the ability to identify which video belongs to which dislike count. This means you wouldn't be able to do something like constructing a leaderboard of the most disliked videos, since you wouldn't actually know which videos they were; you'd only have their hashes.

As for the increased load, I don't think it's an issue at all. Apart from migrating the database from video IDs to hashed video IDs, which is a one-time operation, the hashing would be done on the client side, and it's a fairly quick operation. The request would then contain the hashed video ID. The server shouldn't be the one hashing it anyway, as that would give us no guarantee that it's actually being done at all: the server is closed-source, and even if it weren't, we wouldn't know what's actually running on it.

I personally don't think this is a huge disadvantage for what's gained in terms of privacy, but that's just my opinion, you may not share it and deem leaderboards like this an important thing to be able to do. But it is undeniable that the one-way property of hashing would be a great benefit to privacy in this case.

pzduniak commented 2 years ago

An alternative would be to federate the data collection. I don't trust you, but I trust certain parties, who could anonymize the traffic for me. Simply add a third party, operating such "anonymization proxies", to obscure the actual origin of the request. They batch up all the users' requests, do some basic caching and send the requests to the main API.

The only caveat would be overcomplicating abuse prevention (turning it from very difficult to barely possible).

Anarios commented 2 years ago

This would pretty much solve the privacy issue because you wouldn't be able to recover the actual video id that a given user was watching no matter what you did

Not really. The space of all videoIds in youtube is very limited (the ACTUAL ids, not the possible ids). It's under 10 billion - so it's not a problem to reverse all hashes. So no privacy gains with this approach.
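
The point can be illustrated with a small Node.js sketch: anyone holding a list of existing video IDs can precompute their hashes once and then look up any full hash. The names are hypothetical, and the two-item list stands in for the billions of real IDs.

```javascript
const { createHash } = require("crypto");

function sha256Hex(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Precompute hash -> ID for every known video ID (a tiny stand-in list here).
function buildReverseTable(knownIds) {
  const table = new Map();
  for (const id of knownIds) table.set(sha256Hex(id), id);
  return table;
}

// Recover the ID behind a full hash, or null if the ID was not in the list.
function reverseHash(table, hash) {
  return table.get(hash) || null;
}
```

This applies to sending a full hash. The prefix-based scheme discussed earlier relies on many IDs sharing one prefix rather than on the hash being hard to reverse.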

tomvaneyck commented 2 years ago

Using the SponsorBlock k-anonymity model should be more private than sending the recommendations, as 1) recommendations often contain the same videos and 2) a watched video is highly likely to be contained in the previous set of recommendations.

This makes it easier to assemble the sets of requests into a watch history that overlaps different dynamic IP addresses, effectively de-anonymizing them. Compare this to the sets of returned hashes using k-anonymity, where there is no relation at all between any two requests.
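
The overlap argument can be made concrete with a set-similarity measure. The sketch below uses Jaccard similarity, which is one common choice and is not named in this thread. It shows how two batches of recommended IDs arriving from different IP addresses could be scored as likely belonging to the same user.

```javascript
// Jaccard similarity of two ID lists: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const id of setA) if (setB.has(id)) shared++;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Two recommendation batches that share most of their IDs would score close to 1. Two independent hash-prefix requests give an observer no comparable signal to match on.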

Anarios / return-youtube-dislike

Privacy concern #344
