WebKit / explainers

Explainers from WebKit contributors

Feedback #32

Closed fchristant closed 4 years ago

fchristant commented 4 years ago

I'm not sure if developer feedback is welcome at this point, but here goes some of my thoughts:

Backwards compatibility

By far my biggest concern regarding this proposal is backwards compatibility. I think it is safe to say that most of the web is legacy, even more so if you would include enterprise applications. These properties are barely maintained, sometimes not at all, or not in the control of the client making use of it.

In the most brutal of scenarios, if permissions are to be taken away from these applications because they fail to meet the "credible login" conditions, they effectively break. And not in a minor way. They become unusable entirely.

I'm just saying this to highlight the importance of the detection mechanism needing to be foolproof for current applications. You cannot expect these web applications to make code changes. They won't, or often they can't.

FDA

Don't quote me on this part, but I want to at least mention it. I work in a company where some web properties are FDA relevant. One particular FDA rule concerns password entry. Under the most careful interpretation of that rule, the conclusion is that password management tools are disallowed. The user has to type the password themselves.

This could actually be the wrong interpretation of the rule, I don't know, not an expert. But yes, this means the escape hatch may be needed. And preferably one without a new code change, because of compat reasons.

Browser privacy

This standard could actually mean that browsers know things about users that they didn't before, for example a username, which may be reused across websites. As many browsers are built by commercial parties with interests in other sectors (cough....ad tech), I think it would at least be honest to mention or discuss the other side of the privacy coin.

Describe the privacy issue

The current behavior of the web is “logged in by default,” meaning as soon as the browser loads a webpage, that page can store data such as cookies virtually forever on the device. That is a serious privacy issue and also bad for disk and backup space. Long term storage should instead be tied to where the user is truly logged in.

I get the (pretty minor) problem of needlessly storing data for too long, yet I don't entirely understand the "serious privacy issue". What is the privacy issue? What are tangible risks? I think it would be good to describe this at length.

johnwilander commented 4 years ago

Hi Ferdy! Thanks for filing.

I'm not sure if developer feedback is welcome at this point, but here goes some of my thoughts:

Absolutely welcome.

Backwards compatibility

By far my biggest concern regarding this proposal is backwards compatibility. I think it is safe to say that most of the web is legacy, even more so if you would include enterprise applications. These properties are barely maintained, sometimes not at all, or not in the control of the client making use of it.

In the most brutal of scenarios, if permissions are to be taken away from these applications because they fail to meet the "credible login" conditions, they effectively break. And not in a minor way. They become unusable entirely.

The IsLoggedIn status may be used in many ways that won't be fleshed out here or prescribed by an eventual specification.

What we are mentioning specifically here is capping the lifetime of storage and "other powerful features and relaxations of restrictions besides storage that the web browser only wants to offer to websites where the user is logged in."

I can't speak for other browsers or what the future may hold, but I imagine that restricting existing features and permissions to only websites with IsLoggedIn status will be in cases of security or privacy problems, and maybe in cases of web API misuse. Without the IsLoggedIn status, such restrictions may have to be deployed across the board instead or be put behind piecemeal permissions and prompts, so IsLoggedIn may turn out to be an enhancement even for such cases.
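To make the storage-lifetime idea concrete, here is a minimal TypeScript sketch of how a browser might cap storage lifetime for sites without logged-in status. Everything in it is an assumption for illustration: the `SiteRecord` shape, the function names, and the 7-day cap do not come from the explainer or from any browser's implementation.

```typescript
// Hypothetical model, not WebKit's implementation. All names and the
// 7-day cap are assumptions made for illustration.
type LoginStatus = "loggedIn" | "loggedOut";

interface SiteRecord {
  origin: string;
  status: LoginStatus;
  lastInteractionMs: number; // timestamp of the user's last visit
}

const DAY_MS = 24 * 60 * 60 * 1000;
const LOGGED_OUT_CAP_MS = 7 * DAY_MS; // assumed cap for logged-out sites

// Returns true when the browser may delete the site's stored data.
function storageExpired(site: SiteRecord, nowMs: number): boolean {
  if (site.status === "loggedIn") {
    return false; // data persists while the user is logged in
  }
  return nowMs - site.lastInteractionMs > LOGGED_OUT_CAP_MS;
}

// Collects the origins whose data is eligible for purging.
function sitesToPurge(sites: SiteRecord[], nowMs: number): string[] {
  return sites
    .filter((site) => storageExpired(site, nowMs))
    .map((site) => site.origin);
}
```

Under this sketch a legacy site that never adopts the status still works on each visit; it only loses stored data after a period without visits, which is the "shorter lifetime, not full blockage" reading of the proposal.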

I'm just saying this to highlight the importance of the detection mechanism needing to be foolproof for current applications. You cannot expect these web applications to make code changes. They won't, or often they can't.

FDA

Don't quote me on this part, but I want to at least mention it. I work in a company where some web properties are FDA relevant. One particular FDA rule concerns password entry. Under the most careful interpretation of that rule, the conclusion is that password management tools are disallowed. The user has to type the password themselves.

This could actually be the wrong interpretation of the rule, I don't know, not an expert. But yes, this means the escape hatch may be needed. And preferably one without a new code change, because of compat reasons.

Thanks for sharing a distinct case where unmanaged login flows are currently a must.

Browser privacy

This standard could actually mean that browsers know things about users that they didn't before, for example a username, which may be reused across websites. As many browsers are built by commercial parties with interests in other sectors (cough....ad tech), I think it would at least be honest to mention or discuss the other side of the privacy coin.

I can't speak for other browsers but the addition of knowing a username through its use in an API like IsLoggedIn would not be a meaningful regression of privacy in the case of a malicious browser. I don't think websites have a way to protect themselves against a browser engine that tries to grab user data. Are you saying websites do this?

Describe the privacy issue

The current behavior of the web is “logged in by default,” meaning as soon as the browser loads a webpage, that page can store data such as cookies virtually forever on the device. That is a serious privacy issue and also bad for disk and backup space. Long term storage should instead be tied to where the user is truly logged in.

I get the (pretty minor) problem of needlessly storing data for too long, yet I don't entirely understand the "serious privacy issue". What is the privacy issue? What are tangible risks? I think it would be good to describe this at length.

The privacy issue is virtually endless remembering of users' web activity. You as a user may casually visit a website once because someone sent you a link or even because you mistakenly tapped something. The browser should be able to help you not be remembered forever by all those websites and any intermediary navigational redirects while still keeping you logged in to websites where you are truly logged in.

fchristant commented 4 years ago

Thanks for getting back to me, John.

"The IsLoggedIn status may be used in many ways that won't be fleshed out here or prescribed by an eventual specification."

Based on your description, what I take from it is that, in the context of logging in, the likely worst-case restriction would be a shorter storage lifetime, not full storage blockage. That puts my mind at ease regarding compatibility.

For now, because I do have a general concern regarding the trend of browser heuristics where browsers decide to allow/disallow functionality based on ever-moving goalposts and non-specced behavior. I believe something is either allowed or not allowed, across browsers; it shouldn't depend on today's weather or opinionated closed implementations. This isn't the place to discuss that trend, I suppose.

"Thanks for sharing a distinct case where unmanaged login flows are currently a must."

Welcome, I dug up the relevant section: Link

I lack FDA expertise so have no opinion on this text or how to interpret it. Yet this section is the basis for the decision to disable password managers for some applications.

"I can't speak for other browsers but the addition of knowing a username through its use in an API like IsLoggedIn would not be a meaningful regression of privacy in the case of a malicious browser"

But why not? Currently, if you'd log into my website, the browser does not know my username on that website. It doesn't even know I'm logged in, hence this proposal. With the proposal implemented, the browser knows my username, which may be considered personal data. So the browser has gained personal data about the user it did not have before. My remark is about browsers, not websites. Browsers know more than before.

"The privacy issue is virtually endless remembering of users' web activity. You as a user may casually visit a website once because someone sent you a link or even because you mistakenly tapped something. The browser should be able to help you not be remembered forever by all those websites and any intermediary navigational redirects while still keeping you logged in to websites where you are truly logged in."

I still don't fully understand this. If I visit a website once and then don't come back for months, sure, perhaps a cookie is still stored, needlessly. Yet this stale cookie is of no use to the website unless I revisit. Likewise, if the website has analytics, which surely it has, my past visit is going to be remembered anyway. Killing the cookie doesn't erase my visit. In what tangible way is my stale cookie a privacy risk?

johnwilander commented 4 years ago

"I can't speak for other browsers but the addition of knowing a username through its use in an API like IsLoggedIn would not be a meaningful regression of privacy in the case of a malicious browser"

But why not? Currently, if you'd log into my website, the browser does not know my username on that website. It doesn't even know I'm logged in, hence this proposal. With the proposal implemented, the browser knows my username, which may be considered personal data. So the browser has gained personal data about the user it did not have before. My remark is about browsers, not websites. Browsers know more than before.

For the purposes of privacy and data collection, a malicious browser would know those things. It just wouldn't know them with enough certainty to build browser features on top of them. The browser renders the form fields that the user enters information into and can do whatever it likes with form data. Or are you talking about some out-of-band login flow where an auth cookie gets set due to activities outside the browser? Could such a case be supported by the website setting an anonymous user name of sorts? Either as a built-in feature or by just submitting "Anonymous" as the username string.
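The "Anonymous" username idea can be sketched as a small TypeScript model. This is only an illustration under stated assumptions: `LoginRegistry` and its methods are hypothetical names, not the proposal's API, and the fallback-to-placeholder behavior is one possible reading of the suggestion above.

```typescript
// Hypothetical sketch. LoginRegistry and its methods are illustrative names
// and are not part of the IsLoggedIn proposal.
const ANONYMOUS = "Anonymous";

class LoginRegistry {
  // Maps a site origin to the username the site chose to share.
  private readonly entries = new Map<string, string>();

  // A site reports a login. A missing or blank username stores the placeholder,
  // so the browser learns the login state but no identifying name.
  recordLogin(origin: string, username?: string): void {
    const name =
      username !== undefined && username.trim() !== "" ? username : ANONYMOUS;
    this.entries.set(origin, name);
  }

  recordLogout(origin: string): void {
    this.entries.delete(origin);
  }

  isLoggedIn(origin: string): boolean {
    return this.entries.has(origin);
  }

  // What the browser would know about the user on this site.
  knownUsername(origin: string): string | undefined {
    return this.entries.get(origin);
  }
}
```

In this model a site with an out-of-band login flow can still be treated as logged in, while the only username the browser holds for it is the shared placeholder.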

"The privacy issue is virtually endless remembering of users' web activity. You as a user may casually visit a website once because someone sent you a link or even because you mistakenly tapped something. The browser should be able to help you not be remembered forever by all those websites and any intermediary navigational redirects while still keeping you logged in to websites where you are truly logged in."

I still don't fully understand this. If I visit a website once and then don't come back for months, sure, perhaps a cookie is still stored, needlessly. Yet this stale cookie is of no use to the website unless I revisit. Likewise, if the website has analytics, which surely it has, my past visit is going to be remembered anyway. Killing the cookie doesn't erase my visit. In what tangible way is my stale cookie a privacy risk?

We are talking about all uses of cookies and website data. Browsers want to have the ability to make sure websites don't remember visits virtually forever. Erasing the visit is what browsers want to be able to do. Users don't expect a website they visited casually two years ago to remember that visit and they don't want their browser to help the website do that remembering.

fchristant commented 4 years ago

I see what you mean. Yes, a browser has full access to post data and such and could guess what is a username or other personally identifiable information from that. Yet it would be a guess. With IsLoggedIn, they will be absolutely sure.

I'm not necessarily saying this is a dramatic privacy concern. I'm saying it's privacy relevant if the browser would store such information, depending on jurisdiction. It's just something to think about. Or maybe not. I just wanted to mention it as food for thought.

"We are talking about all uses of cookies and website data. Browsers want to have the ability to make sure websites don't remember visits virtually forever."

Sure. The storage part of this statement is clear. The privacy part isn't. What is a tangible privacy risk of having a long browser history? One I can think of is somebody gaining access to my computer and gaining insight into this. Do you have other tangible examples? I don't question there are privacy risks, I'm asking to specify them. I think this strengthens the proposal.

"Users don't expect a website they visited casually two years ago to remember that visit and they don't want their browser to help the website do that remembering"

Kind of agree, although it's opinionated and subjective. Besides various downsides, a long browser history, login state, etc. also bring user convenience.

johnwilander commented 4 years ago

By breaking out and porting the two above issues to the W3C repo, I consider this issue resolved. You are most welcome to file further issues at https://github.com/privacycg/is-logged-in/issues. Thank you!