Scope app history entries to the browsing context group

domenic commented 3 years ago

Currently they're scoped to same-origin contiguous and same-frame. However we also need to account for browsing context group swaps due to COOP.
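As a rough illustration of the "same-origin contiguous and same-frame" rule, here is a minimal sketch. The flat `sessionHistory` array, the `frameId` field, and the `visibleEntries` helper are all hypothetical names invented for this example. Real session history is more involved than a flat list.

```javascript
// Hypothetical model: each session history entry is { url, frameId }.
// None of these names come from the app history proposal itself.
function visibleEntries(sessionHistory, currentIndex) {
  const current = sessionHistory[currentIndex];
  const origin = new URL(current.url).origin;
  const matches = (entry) =>
    new URL(entry.url).origin === origin && entry.frameId === current.frameId;

  // Walk outward from the current entry while neighbours still match.
  let start = currentIndex;
  while (start > 0 && matches(sessionHistory[start - 1])) start--;
  let end = currentIndex;
  while (end < sessionHistory.length - 1 && matches(sessionHistory[end + 1])) end++;

  return sessionHistory.slice(start, end + 1);
}

const sessionHistory = [
  { url: "https://example.com/a", frameId: "top" },
  { url: "https://other.example/x", frameId: "top" },
  { url: "https://example.com/b", frameId: "top" },
  { url: "https://example.com/c", frameId: "top" },
];

// /a is same-origin but not contiguous with the current entry, so it is excluded.
console.log(visibleEntries(sessionHistory, 3).map((entry) => entry.url));
```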

In the explainer this will be a minor sentence update or so, but we'll need caution when speccing.

domenic commented 3 years ago

@jakearchibald asks the question: why do we want to scope things to the BCG? @csreis, you suggested this; can you explain why that would be a good idea? Jake's point is that the data stored in the app history (e.g. URL or state) is no more sensitive than data that is stored in IndexedDB or similar, which is not partitioned according to BCG.

csreis commented 3 years ago

Part of the motivation for COOP is making it possible to load a page in a new process (e.g., in browsers without full Site Isolation), which becomes explicit if you also use it with COEP. That means we shouldn't be mixing app history entries (with URLs that might contain tokens) across COOP/COEP boundaries, since that could put a sensitive URL in the wrong renderer process.

Splitting at the BCG is one way to accomplish that, and it also matches my intuitive notion that you've moved into a new, unrelated state. (Chrome has many other ways to trigger a new BCG, such as cross-site browser-initiated navigations and even some same-origin cases for bfcache.) Maybe there's another way, but it seemed like a sensible mechanism.

jakearchibald commented 3 years ago

I'm not sure I agree with this, but it might be because my understanding of related things is broken. Let me know where I start to go wrong:

The security boundary of the web is the origin. Some legacy features don't quite fit that model, and that's usually what we end up firefighting.

We enforce the origin boundary using different features and architectures, like CORS, CORB, COOP, COEP, processes etc etc. Our goal is to stop origin A reading data from origin B without origin B's permission, or for origin A and origin B to exchange information about the user without their permission.

APIs like <img>, <style>, <script>, <video> etc etc can result in credentialed bytes from origin B ending up in the same process as origin A. These APIs weren't properly built with a strict origin model in mind, and have been a source of security bugs.

Origin B <iframe> content can also end up in the same process as origin A on platforms that can't put iframes in a different process.

Spectre/Meltdown created a situation where origin A could use a timing attack to read memory in the same process, which puts those origin B resources at risk. As such, we removed the ability for origin A to create accurate timers, so the timing attack isn't effective.

COOP+COEP can grant origin A the power to create accurate timers (and access to things like SAB which can be used to create timers) in exchange for a system where legacy no-cors APIs are locked down, so they're not allowed to use responses from other origins unless the other origin has explicitly opted in (CORS/CORP). We call this "cross-origin isolation".
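For illustration only, the opt-in can be sketched as a check on two response headers. The `isCrossOriginIsolated` helper and the plain header object are assumptions made for this example. Real browsers apply more rules than this, such as other accepted header values and checks on ancestor frames.

```javascript
// Simplified sketch: treats a top-level document as cross-origin isolated when
// it is served with COOP "same-origin" and COEP "require-corp".
// The helper name and the plain header object are hypothetical.
function isCrossOriginIsolated(headers) {
  const coop = (headers["cross-origin-opener-policy"] || "").trim().toLowerCase();
  const coep = (headers["cross-origin-embedder-policy"] || "").trim().toLowerCase();
  return coop === "same-origin" && coep === "require-corp";
}

console.log(
  isCrossOriginIsolated({
    "cross-origin-opener-policy": "same-origin",
    "cross-origin-embedder-policy": "require-corp",
  })
);
```

Inside a page, the `self.crossOriginIsolated` property reports the browser's actual decision.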

This system is there to protect origin B from origin A-with-high-res-timers. It isn't protecting origin A from origin B, and it isn't there to protect origin A-isolated from origin A-not-isolated, which continue to share storage and can freely communicate using things like a service/shared worker and BroadcastChannel.

Allowing origin A's app history to cross between isolated and non-isolated pages could result in a situation where URLs accessed in an isolated mode end up in the same process as an origin B iframe. However, origin B would not have access to high resolution timers, so origin A's data is protected using the same mechanisms that protect other data that's shared between origin A-isolated and origin A-not-isolated (such as idb, service workers, localstorage etc etc).

Restricting app session histories to BCG is creating a security rule more granular than "the security boundary of the web is the origin", but it's kinda doing a half-job of it, since all other origin storage (including session storage) continues to be shared.

If we wanted to protect origin A-isolated from origin A-not-isolated, it feels like we should do it properly by creating some kind of sub-origin. However, I don't think that's necessary, or at least, I don't think we should bundle that behaviour with "I'd like SharedArrayBuffer pls".

In https://github.com/whatwg/storage/issues/119 there was cross-browser agreement that session storage could be shared across BCGs, and app history feels like a form of session storage to me.

In https://github.com/whatwg/html/issues/6356 I proposed the idea of a "session", which would scope history APIs and session storage. However, I'm not trying to subdivide the "origin is the boundary" rule here, I'm trying to limit things like history.length and history.go(…) which already breach the origin boundary. Making session storage part of this limit is more of a UX feature than a security feature.

Does that make sense?

domenic commented 3 years ago

Some discussion in https://freenode.logbot.info/whatwg/20210412#c7603220 . One idea there was that there could be a way to say "my URL is more sensitive than session storage and should be protected even from same-origin scripts", similar to referrerpolicy="". But tying that to COOP+COEP doesn't necessarily make sense.

annevk commented 3 years ago

I think it's reasonable to want to be excluded from same-origin history to avoid leaking a sensitive URL. Perhaps this could even be aligned with Referrer Policy.

In chatting a bit with @domenic I'm not entirely sure how COOP+COEP plays into this.

If you go from /sensitive-id (with COOP+COEP) to / (without) then:

/ learns /sensitive-id through Referer unless redacted through Referrer Policy.
A <script>-ad on / can learn the same as /.
A cross-site document ad on / can learn something if the browser lacks OOPIFs.

To me the stronger concern is /sensitive-id deploying Referrer Policy correctly, but there now being a workaround to obtain /sensitive-id. That seems less than ideal. That you can also obtain it if the browser lacks OOPIFs is kinda downstream from that, but perhaps there's a COOP+COEP/lacking-OOPIFs angle I'm missing?
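To make the Referer part of this scenario concrete, here is a simplified sketch of how a few referrer policies affect what / learns. The `refererFor` helper is hypothetical, covers only four policy values, and treats "strict-origin-when-cross-origin" as the default. It is not a full implementation of the Referrer Policy algorithm.

```javascript
// Simplified sketch of Referer computation for a handful of policies.
// Returns the Referer header value, or null when no header would be sent.
function refererFor(fromUrl, toUrl, policy = "strict-origin-when-cross-origin") {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);
  // Fragments and credentials are never included in a referrer.
  from.hash = "";
  from.username = "";
  from.password = "";

  const sameOrigin = from.origin === to.origin;
  const downgrade = from.protocol === "https:" && to.protocol !== "https:";

  switch (policy) {
    case "no-referrer":
      return null;
    case "origin":
      return from.origin + "/";
    case "same-origin":
      return sameOrigin ? from.href : null;
    case "strict-origin-when-cross-origin":
      if (sameOrigin) return from.href;
      return downgrade ? null : from.origin + "/";
    default:
      throw new Error("policy not covered by this sketch: " + policy);
  }
}

// Without a policy, / learns the full sensitive URL.
console.log(refererFor("https://example.com/sensitive-id", "https://example.com/"));
// With "no-referrer", nothing is sent.
console.log(refererFor("https://example.com/sensitive-id", "https://example.com/", "no-referrer"));
```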

domenic commented 3 years ago

@csreis do you have any thoughts on @jakearchibald and @annevk's comments? I pretty firmly agree that browsing context group scoping doesn't make sense here (and introducing it would also hurt any future efforts to swap BCGs more often).

If we can get agreement on that, then we can pivot to discussing referrer policy. (Conceptually, I'd say that we want to make entry.url null if the referrer policy was "no-referrer". Figuring out how to encode that and any edge cases presumably is more complicated...)

csreis commented 3 years ago

I feel strongly that we should make it possible for origins to protect themselves in the face of Spectre, in scenarios where Site Isolation doesn't help (e.g., browsers without OOPIFs, cross-origin same-site, selective isolation, etc). That includes protecting sensitive tokens in URLs. If there's another mechanism origins can use to avoid having such URLs end up in the renderer process, maybe that's sufficient. Adding @mikewest and @camillelamy on this, as they're more responsible for API interactions with Spectre defenses.

For the protection goal, it's very important to note that Spectre-type attacks are still possible without high precision timers, just at a slightly reduced bandwidth. This is demonstrated in practice by https://leaky.page. The restriction of high precision timers says we aren't going to make it easy for A to attack B, but A can still attack B without using SharedArrayBuffers/etc.

I agree with Jake that COEP and CrossOriginIsolated are more about protecting B from A-with-high-res-timers, which is orthogonal. I care about B protecting itself from A-with-no-high-res-timers.

To achieve that, the early discussions around COOP (e.g., https://github.com/whatwg/html/issues/3740) were trying to make it possible for pages to put their sensitive info into a BCG that browsers could use a separate process for, without sharing a process with an attacker origin out of their control. If example.com still allows some of its documents to load in a non-isolated context, then yes, they have to worry about what data they load into that process (when they access localStorage, IndexedDB, etc). But they could at least have some control over that, by not using those APIs in non-isolated contexts.

If AppHistory doesn't show same-origin URLs across BCGs, then it won't leak them across COOP boundaries. Maybe it could be defined in terms of COOP boundaries instead of BCG? That might be fine. (It shouldn't be gated on CrossOriginIsolated or COEP, though, since that's orthogonal.)

Restricting app session histories to BCG is creating a security rule more granular than "the security boundary of the web is the origin", but it's kinda doing a half-job of it, since all other origin storage (including session storage) continues to be shared.

It's a question of whether an origin can ever have a way to protect itself, even if there isn't a great solution right now. If the AppHistory API pulls in URLs with tokens outside of the origin's control, then the origin is forced to leak those secrets in the processes of its non-isolated documents, no matter what it refrains from doing in the non-isolated documents. However, if we introduce a different way to protect those URLs from ending up in the renderer (as mentioned above), maybe the AppHistory case doesn't have to handle it explicitly.

And for what it's worth, I'm concerned that we don't have more attention on this goal. Avoiding API usage in non-isolated settings is hardly practical, but I don't think I've heard any other solutions for protecting secrets from Spectre attackers using low res timers, besides broader Site Isolation or origin-wide usage of COOP/etc. Maybe some of the new defaults Mike and Camille are pushing for will help?

In whatwg/storage#119 there was cross-browser agreement that session storage could be shared across BCGs, and app history feels like a form of session storage to me.

I hadn't seen that, but you can at least avoid accessing session storage from your non-isolated pages. App History can't be avoided, can it?

annevk commented 3 years ago

@csreis it would help me a lot if you directly addressed the scenario in https://github.com/WICG/app-history/issues/71#issuecomment-817927890 or provided an alternative scenario where you do see a problem. I think we're all on the same page that Spectre attacks are possible, regardless of cross-origin isolation. And also that you might use cross-origin isolation for sensitive pages. But cross-origin isolation doesn't protect against URLs leaking and by design cannot as we do not know if the next URL will have cross-origin isolation or not.

csreis commented 3 years ago

Sure. Here's one scenario, which doesn't involve COI:

1) User visits attacker.com which loads example.com/public.html in the same process (e.g., in a popup, with no Site Isolation). public.html doesn't access any secrets in its process, and has no COOP header.
2) User clicks a link in public.html to example.com/private.html, which has a COOP header to force a new BCG in the same tab. The browser puts private.html into a new process separate from attacker.com, since private.html isn't reachable by any existing documents.
3) A separate login navigation (e.g., signing in or redirecting) puts a secret token into the URL, like private2.html?secrets.
4) The user goes back in that tab (across BCGs) to public.html.

IIUC, AppHistory would then put the contiguous same-origin session history items into public.html's renderer process (shared with attacker.com), including private2.html?secrets. The attacker.com document now has access to the secrets in the URL.
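The scenario can be sketched with a toy model of the tab's session history. The `bcg` field, the `entriesSentToRenderer` helper, and the `scopeToBcg` option are hypothetical names used only to contrast the two behaviours. They are not part of any browser or spec.

```javascript
// Hypothetical model of the tab's session history after step 3.
// The bcg numbers stand for browsing context groups: COOP forced a swap at private.html.
const tabHistory = [
  { url: "https://example.com/public.html", bcg: 1 },
  { url: "https://example.com/private.html", bcg: 2 },
  { url: "https://example.com/private2.html?secrets", bcg: 2 },
];

// Returns the URLs that would be copied into the current entry's renderer process.
function entriesSentToRenderer(history, currentIndex, { scopeToBcg }) {
  const current = history[currentIndex];
  const origin = new URL(current.url).origin;
  const allowed = (entry) =>
    new URL(entry.url).origin === origin && (!scopeToBcg || entry.bcg === current.bcg);

  let start = currentIndex;
  while (start > 0 && allowed(history[start - 1])) start--;
  let end = currentIndex;
  while (end < history.length - 1 && allowed(history[end + 1])) end++;

  return history.slice(start, end + 1).map((entry) => entry.url);
}

// Step 4: the user goes back to public.html (index 0), which shares a process with attacker.com.
const unscoped = entriesSentToRenderer(tabHistory, 0, { scopeToBcg: false });
const scoped = entriesSentToRenderer(tabHistory, 0, { scopeToBcg: true });

const secretUrl = "https://example.com/private2.html?secrets";
console.log(unscoped.includes(secretUrl)); // true: the secret URL reaches the shared process
console.log(scoped.includes(secretUrl)); // false: BCG scoping keeps it out
```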

Another variation of step 1 could happen if the user visits example.com/public.html directly and it has a cross-origin iframe it doesn't fully trust (e.g., ad).

These cases only matter if the browser is allowing attacker.com and example.com to share a process in the first place, but that's still common.

In terms of https://github.com/WICG/app-history/issues/71#issuecomment-817927890, this is similar to /sensitive-id leaking to / despite ReferrerPolicy. If ReferrerPolicy or something similar can be used to prevent public.html from learning private2.html?secrets (and the latter URL from ending up in the process), that would address my concern.

annevk commented 3 years ago

Thanks, that helps (me) a lot. It's interesting how this is a concern for top-level navigations alone and it does indeed argue for considering (top-level browsing context, origin) as the key for top-level history entries.

jakearchibald commented 3 years ago

But isn't public.html choosing to pull that information into the process, just as it could choose to pull data from IndexedDB into the process?

I feel that in future we're going to want to do a lot more BCG swaps, and that would lead to gotchas in this API.

csreis commented 3 years ago

Are you referring to pulling the app history entry information? The app history entries have to get populated in the renderer process in advance without public.html having to do anything to request it, right? (That's necessary to allow synchronous access to the entries, which I understood to be a requirement.)

annevk commented 3 years ago

@jakearchibald how many same-origin navigation swaps are we planning beyond COOP?

jakearchibald commented 3 years ago

The app history entries have to get populated in the renderer process in advance without public.html having to do anything to request it

@csreis hmm yes, you're right. It's the same as sessionStorage/localStorage then, which is also sync.

@domenic, I couldn't persuade you to make that async?

jakearchibald commented 3 years ago

@annevk

@jakearchibald how many same-origin navigation swaps are we planning beyond COOP?

I thought the idea was to allow the browser to change process on navigation in any case where it didn't need to be in the same process. I probably misunderstood something.

domenic commented 3 years ago

@domenic, I couldn't persuade you to make that async?

No :)

I thought the idea was to allow the browser to change process on navigation in any case where it didn't need to be in the same process. I probably misunderstood something.

That is kind of my understanding of the direction the Chrome team is heading, so I'd be curious to get @csreis's thoughts...

jakearchibald commented 3 years ago

@domenic, I couldn't persuade you to make that async?

No :)

Boo! Fair enough.

Although, I'm a little worried that n history entries each containing large state objects could become a memory issue, and a performance issue when it comes to copying the data on each navigation.
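A back-of-the-envelope way to gauge that cost is sketched below. It uses the JSON size of each state object as a rough proxy, which is an assumption. Real state would go through structured serialization, so actual numbers would differ. `approxStateBytes` is a hypothetical helper.

```javascript
// Rough estimate of how many bytes of state would be copied per navigation
// if every entry's state is sent to the renderer eagerly.
// JSON size is only a proxy for the real serialized size.
function approxStateBytes(entries) {
  const encoder = new TextEncoder();
  return entries.reduce(
    (total, entry) => total + encoder.encode(JSON.stringify(entry.state ?? null)).length,
    0
  );
}

// 50 entries, each holding roughly 10 KB of state.
const manyEntries = Array.from({ length: 50 }, () => ({
  state: { blob: "x".repeat(10000) },
}));
console.log(approxStateBytes(manyEntries) + " bytes copied on each navigation");
```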

domenic commented 3 years ago

Regardless, it sounds like a plan here that makes people happy, and is probably a good idea in any case, is:

If a page uses "no-referrer" as its referrer policy, its URL is not exposed through app history.
With this in place, app history can span browsing context groups.

Is that correct?
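A minimal sketch of what that plan could look like is below. The `exposeEntries` helper and the `referrerPolicy` field on each entry are hypothetical. Exempting the current entry from censoring is an assumption of this sketch, and `state` is passed through unchanged.

```javascript
// Hypothetical sketch: entries span BCG boundaries, but an entry whose
// document used "no-referrer" has its URL replaced with null.
function exposeEntries(history, currentIndex) {
  return history.map((entry, index) => {
    // Assumption: the current document can always see its own URL.
    const hideUrl = index !== currentIndex && entry.referrerPolicy === "no-referrer";
    return {
      key: entry.key,
      url: hideUrl ? null : entry.url,
      state: entry.state, // passed through unchanged in this sketch
    };
  });
}

const plannedHistory = [
  { key: "k1", url: "https://example.com/public.html", referrerPolicy: "strict-origin-when-cross-origin", state: null },
  { key: "k2", url: "https://example.com/private2.html?secrets", referrerPolicy: "no-referrer", state: null },
];

// Viewed from public.html (index 0), the sensitive entry is present but has no URL.
console.log(exposeEntries(plannedHistory, 0));
```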

csreis commented 3 years ago

@csreis hmm yes, you're right. It's the same as sessionStorage/localStorage then, which is also sync.

I'm not 100% certain, but I think those are loaded on demand via a sync Mojo IPC (StorageArea::GetAll), rather than proactively.

I thought the idea was to allow the browser to change process on navigation in any case where it didn't need to be in the same process. I probably misunderstood something.

I think there are a few cases that want to swap browsing context groups on more navigations (even same-origin ones when there's just a single window), such as bfcache or dynamically isolating a site (e.g., for password entry).

If a page uses "no-referrer" as its referrer policy, its URL is not exposed through app history.

That could probably work. It does seem useful to set ReferrerPolicy to avoid leaking secrets in a URL across a COOP boundary (separately from App History), so having that apply to app history entries might be a way to prevent them from leaking from earlier session history items.

What about other parts of the app history entry in the "no-referrer" case? The key is probably fine, but should the state be omitted? Going back across a COOP/BCG boundary from private.html (with a sensitive state object) to public.html would otherwise leak the state object.

domenic commented 3 years ago

What about other parts of the app history entry in the "no-referrer" case? The key is probably fine, but should the state be omitted? Going back across a COOP/BCG boundary from private.html (with a sensitive state object) to public.html would otherwise leak the state object.

I was leaning toward not censoring state. The reason being that, unlike "having a URL", putting something in the app history state has to be done proactively. So it's similar to, e.g., using IndexedDB: if you put something there, you're OK with any same-origin pages (or content that shares a process with same-origin pages) reading it.

Does that reasoning make sense?

domenic commented 2 years ago

I've put up a PR for this in https://github.com/WICG/app-history/pull/189, which censors cross-document "no-referrer" document URLs, and makes it clear that once that's done we expect all session history entries, even after BCG swaps, to be included. The folks involved here should feel free to take a look!

csreis commented 2 years ago

I was leaning toward not censoring state. The reason being that, unlike "having a URL", putting something in the app history state has to be done proactively. So it's similar to, e.g., using IndexedDB: if you put something there, you're OK with any same-origin pages (or content that shares a process with same-origin pages) reading it.

Does that reasoning make sense?

Sorry for the late reply, but I still think state should have been omitted as well as the URL (similar to @annevk on https://github.com/WICG/app-history/pull/189#issuecomment-973016951).

If the page uses a ReferrerPolicy to say that its URL shouldn't leak to the next page, what is a scenario where it would want the App History Entry's state to be exposed? That seems like it would quite often reveal aspects of what page you were on, partly defeating the purpose of the ReferrerPolicy.

Are there uses of App History state that would make sense to expose across the ReferrerPolicy boundary?

domenic commented 2 years ago

I mean, it's pretty similar to any case when you want to expose origin-scoped storage across the ReferrerPolicy boundary? Like IndexedDB or session storage or so on? We've heard from a number of developers that they would like to port their current uses of session storage / manually-cleared-on-session-exit IndexedDB to app history state, and giving them the nasty surprise that doing so fails whenever used with certain referrer policies would not be a good idea, I think.

csreis commented 2 years ago

Isn't App History state specific to a session history item, though? That seems different than origin-scoped storage, but maybe I'm misunderstanding. When you're on a different (same-origin, contiguous) session history item, that state is presumably accessible and read-only, similar to the full URL of the referrer from the previous item in most cases.

If the premise of setting ReferrerPolicy to "no-referrer" is not leaking sensitive parts of the URL even to same-origin pages, then a state specific to a session history item seems like it would fall into the same category. That state would still be preserved and available if you go back to the session history item where it was created, but it seems like it shouldn't be readable by other session history items. Seems like that matches the spirit of the ReferrerPolicy.

Can you say more about how IndexedDB or session storage use cases would be built on App History state? It doesn't seem great as a "store this value for other page visits" approach, since your current changes to the state would presumably be undone if you went back. Maybe I'm not following how that would work or why it requires exposing state in the ReferrerPolicy case.

domenic commented 2 years ago

The idea is that developers are currently using history state as session storage keyed on index into the session history. For example, you include the state of UI elements in it. They don't think of that as related to the referrer policy or the URL.

Here is a simple toy example:

Page A is a capability URL, say "unsubscribe me from this newsletter"
Page B is the homepage, which users often visit after visiting the unsubscribe page.
Every page on the origin has a page-specific dark mode/light mode toggle, which is saved into app history state. On this site, this is session-specific data; you can invent other more-realistic session-specific data if you want, but it's usually UI state of this sort.
Every page does something like if (destinationAppHistoryEntry.getState().mode === appHistory.current.getState().mode) { /* no transition effect */ } else { /* transition effect */ } when navigating between them.

The developer wants to hide page A's capability URL, so they use no-referrer referrer policy on it. They don't want to hide its dark mode/light mode toggle state, because that would break their beautiful transitions.

Remember that state is not like a URL. Every session history entry cannot avoid having a URL, so we need to provide a way of censoring it (referrer policy). State is something you opt in to putting data in, like using other origin-scoped session storage. If, for some reason, the dark mode/light mode state of page A above was sensitive data, then the developer would not store it in app history state.
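The toy example can be sketched with stand-in entry objects. `makeEntry` and `needsTransition` are hypothetical helpers. Only `getState()` and the `mode` field mirror the snippet above, and the censored `url: null` on page A is an assumption about how the "no-referrer" case would surface.

```javascript
// Hypothetical stand-in for an app history entry: a URL (possibly censored) plus getState().
function makeEntry(url, state) {
  return { url, getState: () => state };
}

// A transition effect is wanted only when the two entries differ in mode.
function needsTransition(destinationEntry, currentEntry) {
  return destinationEntry.getState()?.mode !== currentEntry.getState()?.mode;
}

// Page A uses "no-referrer", so its URL is hidden from page B, but its state is not.
const pageA = makeEntry(null, { mode: "dark" });
const pageB = makeEntry("https://example.com/", { mode: "light" });

console.log(needsTransition(pageA, pageB)); // true: modes differ, so play the transition
```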

csreis commented 2 years ago

Ok, thanks. I was picturing it more as "here's some data that I need to get back into the state I was in the last time I was on this history item," such as which newsletters were selected in your unsubscribe page example, or something else specific to the sensitive page.

I agree that the developer should not store sensitive data in app history state if this is the outcome, and I'll trust your judgement on whether they would try to store it there in practice (vs less sensitive things like the dark/light mode toggle you mention). I'll defer on this, under the assumption that the non-sensitive case is how the API will tend to be used, and since it's under the control of web developers.
