WICG / translation-api

A proposal for translator and language detector APIs

Highlighting Privacy and User Consent Concerns #5

Closed ahmadalfy closed 6 months ago

ahmadalfy commented 7 months ago

I want to highlight two questions related to the explainer.

Automatic Detection of Supported Languages

Will users be prompted to consent to the browser detecting the languages it supports for translation, or will it happen without user consent? Automatic language detection could potentially reveal sensitive information about users' browsing habits and preferences, as mentioned in the privacy section.

Downloading Missing Languages

When the browser needs to download additional language packs for translation, will users be explicitly asked for consent before initiating the download? This is crucial to prevent scenarios where users' device storage is filled with language packs without their consent, potentially leading to privacy concerns and unwanted resource consumption.

I am not sure if implementation details like these should be mentioned in the explainer or not.

domenic commented 6 months ago

> Will users be prompted to consent to the browser detecting the languages it supports for translation, or will it happen without user consent? Automatic language detection could potentially reveal sensitive information about users' browsing habits and preferences, as mentioned in the privacy section.

Can you explain these privacy concerns further?

The browser already sees all strings that are passed to JavaScript APIs. Such strings are not private to the browser.

Similarly to how we don't ask for user consent to tell the browser about the string "Hello world" when a page does myElement.innerHTML = "Hello world", I don't think we'd ask for user consent when a page does languageDetector.detect("Hello world").

> When the browser needs to download additional language packs for translation, will users be explicitly asked for consent before initiating the download? This is crucial to prevent scenarios where users' device storage is filled with language packs without their consent, potentially leading to privacy concerns and unwanted resource consumption.

The privacy issues are being discussed in #3.

The storage issues are not generally covered by user consent, but instead by other UI. For example, right now pages can download ~unlimited large files into the HTTP cache or other storage locations using other web APIs, with no user consent. However, users have control over this via various browser settings, and unused items automatically get cleaned up over time.

ahmadalfy commented 6 months ago

Regarding the first point, I intended to convey that the canTranslate feature can be utilized to iterate through all available languages, enabling detection of supported languages. This functionality could potentially be employed for fingerprinting purposes. I envision a prompt similar to those used for geolocation or clipboard access, where users are informed that the website wishes to determine if their browser supports translation to, for instance, Japanese. This concerns the capabilities of the browser, not the strings passed to the API.
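To make the probing concern concrete, here is a minimal sketch of how a page could loop over language pairs and combine the answers into one identifying string. Everything in it is an assumption for illustration: `makeFakeCanTranslate` is a hypothetical stand-in for the capability check (it is not the real API), and the `"readily"` / `"no"` return values and the `{ sourceLanguage, targetLanguage }` argument shape are assumed rather than taken from this thread.

```javascript
// Hypothetical stand-in for the capability check discussed above.
// Assumption: it accepts a source/target language pair and resolves to a
// string describing availability ("readily" or "no" in this sketch).
function makeFakeCanTranslate(installedPairs) {
  return async ({ sourceLanguage, targetLanguage }) =>
    installedPairs.has(`${sourceLanguage}>${targetLanguage}`) ? "readily" : "no";
}

// Probe every ordered language pair and join the answers into one string.
// Two browsers with different installed language packs yield different
// strings, which is what makes the result usable as a fingerprinting signal.
async function probeSupport(canTranslate, languages) {
  const parts = [];
  for (const sourceLanguage of languages) {
    for (const targetLanguage of languages) {
      if (sourceLanguage === targetLanguage) continue;
      const availability = await canTranslate({ sourceLanguage, targetLanguage });
      parts.push(`${sourceLanguage}>${targetLanguage}:${availability}`);
    }
  }
  return parts.join("|");
}
```

With three candidate languages the loop makes six calls. The number of calls grows quadratically with the list of languages probed, so a longer list can expose more distinguishing detail.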

Regarding the storage concern, you are correct; however, this issue isn't specific to individual websites, as the dictionaries used for translation ought to be shared resources. My point is that a website could inadvertently trigger downloads of unnecessary dictionaries from the browser vendor, leading to excessive use of the user's storage. The user should somehow be made aware of that.

I hope this clarifies my points. Feel free to ask for further elaboration if needed.

domenic commented 6 months ago

OK. Regarding your first point, that is discussed in #3.

Regarding your second point, we don't have any plans to do anything different than other storage APIs for this storage.

I'll close this issue then, since it seems like most of the concerns here are covered by #3.