I very much share the concern that it needs to be clear to site authors that the number of buckets isn't something they need to be particularly concerned about, lest they bias towards using a small number of buckets and defeating the benefits of multiple storage buckets.
Also, it's not clear from the text here if this is intended to be an implementation hard limit or some type of quota mechanism?
At a higher meta level, it seems like it would be great to have some guidance about what's the right size for a bucket. For example, if writing an offline music application, there are some pretty clear hierarchy levels at which to make the cut.
My intuition is that per-album/playlist is the right balance.
More general guidance for when there isn't as clear domain alignment would be that buckets make sense as soon as the amount of data we're talking about reaches 10 MiB. This might be an alternate means of dealing with the bucket limit issue. We define buckets to take up a minimum quota usage of 10MiB (or other value) and that therefore you may be limited in how many buckets you can create by the quota granted to your origin through implicit and explicit user interaction.
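For concreteness, here's a rough sketch (not part of any proposal) of how a site could reason about such a per-bucket minimum charge, using the existing navigator.storage.estimate(); the 10 MiB figure is just the value floated above:

```js
// Back-of-envelope sketch: if each bucket carried a minimum quota charge of
// 10 MiB, the number of buckets an origin could create would fall out of the
// quota it has already been granted.
const MIN_BUCKET_CHARGE = 10 * 1024 * 1024; // assumed 10 MiB minimum per bucket

async function maxBucketsUnderMinimumCharge() {
  // navigator.storage.estimate() resolves to { usage, quota } for the origin.
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  return Math.floor((quota - usage) / MIN_BUCKET_CHARGE);
}
```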
Thank you for the thoughts @mkruisselbrink & @asutherland ! Added some of my personal thinking here, but would be awesome to get alignment on this point 🙂
An implementation supporting 2 buckets is likely quite different from an implementation supporting 1000 buckets.
That's a good point... thanks for pointing this out. Anything too low seems like it would disincentivize API usage, but too many would also create a poor experience. Instinctively I think 10 seems like a reasonable limit, but I'm open to thoughts.
if this is intended to be an implementation hard limit or some type of quota mechanism?
My initial intent was to have a hard limit, so an origin won't be able to abuse the API by creating thousands of buckets, which could affect the performance of other sites.
At a higher meta level, it seems like it would be great to have some guidance about what's the right size for a bucket.
That's a good point. In your example of the music application, my personal thought is that buckets would be divided into bigger groups, split by function and importance / expected lifetime: a bucket for the user's personal playlists that are in heavy rotation (which you'd prefer never be evicted), a bucket for this week's recommended playlists (which may expire after a week), etc. Something with a completely different function, like analytics, would have its own separate bucket that can be deleted/evicted independently.
But at the same time I also wouldn't want to add something that would disincentivize its usage. Whether by hard limit or quota mechanism, I'm curious how many buckets you'd expect an origin to be able to have at any one time?
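A rough sketch of the split described above; the bucket names are made up, and the option names (durability, expires) follow the explainer's draft shape rather than any settled API:

```js
async function setUpMusicBuckets() {
  // Heavy-rotation personal playlists: data we'd prefer never be evicted.
  const playlists = await navigator.storageBuckets.open('user-playlists', {
    durability: 'strict',
  });

  // This week's recommended playlists: fine to lose after a week.
  const recommended = await navigator.storageBuckets.open('weekly-recommendations', {
    expires: Date.now() + 7 * 24 * 60 * 60 * 1000,
  });

  // Analytics lives in its own bucket so it can be deleted/evicted independently.
  const analytics = await navigator.storageBuckets.open('analytics');

  return { playlists, recommended, analytics };
}
```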
We define buckets to take up a minimum quota usage of 10MiB (or other value) and that therefore you may be limited in how many buckets you can create by the quota granted to your origin through implicit and explicit user interaction.
How do you see the creation limit being expressed in this scenario? Do you see it erroring on bucket creation once it has been reached?
An implementation supporting 2 buckets is likely quite different from an implementation supporting 1000 buckets.
That's a good point... thanks for pointing this out. Anything too low seems like it would disincentivize API usage, but too many would also create a poor experience. Instinctively I think 10 seems like a reasonable limit, but I'm open to thoughts.
One of my take-aways from discussions in the ServiceWorkers WG was that teams within a company that operate sub-sites on a single origin may not operate under a global coordination scheme. Making developers worry about how to divvy up a resource of which there are potentially only 10 seems like it would encourage people not to use buckets except in very exceptional cases.
My initial intent was to have a hard limit, so an origin won't be able to abuse the API by creating thousands of buckets, which could affect the performance of other sites.
I think having buckets have a minimum quota cost seems like a more dynamically scalable situation than a hard limit, while addressing scenarios where a site might try and use buckets as a means of data storage that isn't charged against quota. If a user really wants a site to use 100 GiB of storage, should that site be limited to the same number of buckets as a random site the user has never visited before?
That's a good point. In your example of the music application, my personal thought is that buckets would be divided into bigger groups, split by function and importance / expected lifetime: a bucket for the user's personal playlists that are in heavy rotation (which you'd prefer never be evicted), a bucket for this week's recommended playlists (which may expire after a week), etc. Something with a completely different function, like analytics, would have its own separate bucket that can be deleted/evicted independently.
But at the same time I also wouldn't want to add something that would disincentivize its usage. Whether by hard limit or quota mechanism, I'm curious how many buckets you'd expect an origin to be able to have at any one time?
Expect? Unsure. Want? (quota usage) / 10MiB.
My goal at this time would be for the browser to have maximally granular choices to make about bucket discarding under storage pressure. An origin that has 2x 2 GiB buckets plus 1x analytics bucket (which it has an interest in heavily gaming to ensure it never gets cleared) doesn't provide a lot of options, especially as access patterns would most likely touch every bucket during every session. An origin that has 40x 100 MiB buckets, each of which could likely have an accurate MRU date associated with it, would be amazing, because it lets a naive bucket-discarding algorithm make a lot more clear-cut, less risky decisions.
How do you see the creation limit being expressed in this scenario? Do you see it erroring on bucket creation once it has been reached?
I think that in general sites will fall into 2 categories: sites that don't actively pay attention to quota (the common case), and sites that are very aware of quota.
For the first, common case... as the origin asks for more buckets that exceed the quota we're willing to give it, we'd start discarding buckets from the origin. In the lead-up to discarding the origin's own buckets, this might involve discarding some buckets from other origins first.
For a site that's very aware of quota, we'd potentially have a couple of events we might be able to tell it about, such as a "bucket-discarding" event: you call waitUntil() on this event with a promise that you'll resolve when you're done with the cleanup, and then we'll re-evaluate the most recent openBucket() call. (Note that this would never be used to wake up an origin and tell it to clean itself up; I believe there is consensus that we would never wake up a ServiceWorker to give it an opportunity to respond to storage pressure, because that would be the worst time to do it, has privacy implications, and undercuts any motivation for sites to use buckets responsibly.) Our handling would otherwise be the same as the common case, except we'd fire the "bucket-discarding" event and potentially wait for it to finish.
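To make that flow concrete, here's a sketch of what such a handler could look like; the "bucket-discarding" event name and shape are entirely hypothetical and exist in no spec:

```js
// Hypothetical service-worker-style handler for the flow described above.
self.addEventListener('bucket-discarding', (event) => {
  event.waitUntil(
    (async () => {
      // Free whatever is cheapest to lose; once this promise resolves, the
      // browser would re-evaluate the most recent openBucket() call.
      await caches.delete('prefetched-recommendations');
    })()
  );
});
```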
One of my take-aways from discussions in the ServiceWorkers WG was that teams within a company that operate sub-sites on a single origin may not operate under a global coordination scheme. Making developers worry about how to divvy up a resource of which there are potentially only 10 seems like it would encourage people not to use buckets except in very exceptional cases.
I don't recall these partners showing interest in using buckets for isolation between product teams. My impression is they don't have too much trouble doing that with database naming conventions, etc. Maybe it would make them a bit less concerned about using too much disk, but they seem more concerned about user impact there and less about impacting another team. Finally, I think there are cross-product integrations that would want everything in the same quota bucket to avoid some data disappearing, etc.
Edit: Note, the service worker discussions took place because they didn't have an equivalent method of isolation to our database naming, etc.
It's a question we could ask them more directly, though. @ayuishii, what do you think?
The hypothetical I was thinking of was more like a team thinking: "If I risk using buckets but some other sub-site has used up some of the very finite allowed number of buckets, then my sub-site can break, so I just won't use buckets." I'm very confident in teams being able to prefix their storage names to avoid conflicts. (Also, I was thinking of comments by non-Googlers.)
It would definitely be interesting to hear what those partners think about the possibilities of having the ability to create a lot of buckets, especially from sites that are only intermittently used. I would expect sites that see daily usage and/or are continually opened in pinned tabs to not need to worry about bucket discarding due to storage pressure and not want to deal with the overhead of using a bunch of buckets. I would expect intermittently used sites are more likely to be interested in more graceful/granular discarding.
It'd also be interesting to know, if using a ton of buckets isn't appealing, whether letting a bucket opt in to Cache-granularity discarding would be an acceptable trade-off. Continuing with the offline music player scenario and @ayuishii's bucket usage proposal, it would meet my idealized granular quota dreams if the "recommended playlists" bucket used a separate Cache for each of these playlists, while the site could still use CacheStorage.match() to avoid dealing with the partitioning. There has been some discussion in https://github.com/w3c/ServiceWorker/issues/863 in this area, albeit more focused on per-Response LRU eviction.
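A minimal sketch of that per-playlist Cache idea, shown against the default storage area's caches for simplicity (names are illustrative):

```js
// One Cache per recommended playlist, so each playlist could in principle be
// discarded on its own.
async function cacheRecommendedPlaylist(playlistId, trackUrls) {
  const cache = await caches.open(`recommended-playlist-${playlistId}`);
  await cache.addAll(trackUrls);
}

// CacheStorage.match() searches every Cache in the storage area, so lookups
// don't need to know which playlist a track was stored under.
function findCachedTrack(trackUrl) {
  return caches.match(trackUrl);
}
```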
All that said, I'm open to the idea that in reality multiple storage buckets might only be used like conceptual fire safes/lock-boxes/flight data recorders.
Ah, sorry. I misinterpreted your concern as folks wanting a separate bucket for every product.
How many buckets per origin do you expect to be reasonable in practice? I think we are trying to reason about per-bucket overhead and we might come to different conclusions for less than 10 buckets-per-origin vs 1000s of buckets-per-origin. For example, do you have a separate database internally for every bucket vs a single database that has a column for bucket-id, etc.
My primary concern is the UX related to quota. If origins use a lot of moderately sized buckets, it becomes easy to grant origins quota incrementally and to reclaim it incrementally, without user involvement and without sites constantly appearing to forget everything on a device with limited storage. Ideally we will adjust our storage implementation to whatever provides the best UX for this and can survive the reality of the usage patterns of the web.
Do other browsers have documentation about their existing or planned quota management strategies, particularly expected behavior when operating with limited storage and whether user prompting is/will be involved? (Edit: To be clear, my ongoing plan has been to pin all my hopes on multiple storage buckets.)
Thanks for raising concerns here. I agree it would be valuable to gather more developer feedback on how the API will be used. The API is very much still in its early stages, and we wouldn't want to move forward if the design doesn't match the use cases. I'm thinking an Origin Trial would be a good opportunity to gather this feedback. Does this sound reasonable?
Do other browsers have documentation about their existing or planned quota management strategies, particularly expected behavior when operating with limited storage and whether user prompting is/will be involved?
I haven't been able to find any documentation from other browsers for their quota management strategies. This page is the best resource I'm aware of for quota management comparisons.
I found the Chrome Web Storage and Quota Concepts doc while reading some other Chrome proposals which nicely characterizes current LRU data-clearing (which is also what Firefox currently uses). Many thanks to the authors of that doc and kudos on the many explanatory diagrams!
There's a good amount of quality discussion here covering the intended use cases of buckets in general. I opened #60 to focus on just limiting the number of buckets used (which might or might not be exposed via something like maxCount). @asutherland wdyt?
As far as some of the other ideas here, such as firing events when a site asks for a bucket and there's no room, I think the simplest thing to do for now is to stick with what we have --- QuotaExceededError when a site tries to store and there's no more room. The site can clean itself up and try to create a bucket again on its own, just as it can free up space and then try to store more things in IDB on its own. But if there is demand for this kind of thing in the future, we can consider extending the API later.
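A sketch of that clean-up-and-retry pattern; the bucket names are illustrative, and whether open() rejects with QuotaExceededError here is the behavior being discussed rather than settled spec text:

```js
async function openBucketWithCleanup(name) {
  try {
    return await navigator.storageBuckets.open(name);
  } catch (err) {
    if (err.name !== 'QuotaExceededError') throw err;
    // Free up space on our own terms, then try once more.
    await navigator.storageBuckets.delete('stale-prefetch');
    return navigator.storageBuckets.open(name);
  }
}
```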
We also see #44 as a topic of much interest.
I've replied on https://github.com/WICG/storage-buckets/issues/60. I'd actually seen the comment when it was made and had begun composing a response but lost it in tab bankruptcy; apologies!
As far as some of the other ideas here, such as firing events when a site asks for a bucket and there's no room, I think the simplest thing to do for now is to stick with what we have --- QuotaExceededError when a site tries to store and there's no more room. But if there is demand for this kind of thing in the future, we can consider extending the API later.
Yeah, I wouldn't worry about adding new events at this point. Also, my sketch there is along the lines of @wanderview's corruption reporting proposal and we'd want to integrate with that.
Closing as we've decided not to expose maxCount. We can re-visit if there is a request for this in the future. In the meantime, we'll plan to throw a QuotaExceededError when a site tries to create too many buckets, and add some text to the explainer and/or spec.
Thanks for the discussion!
Allow user agents to decide a maximum number of Storage Buckets for an origin.
A maxCount attribute will inform developers of the maximum number of buckets an origin is allowed to have at any one time.
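For reference, the shape this issue was proposing (ultimately not exposed, per the closing comment above) would have looked roughly like:

```js
// Hypothetical attribute; maxCount was never added to the API.
const maxBuckets = navigator.storageBuckets.maxCount;
console.log(`This origin may hold up to ${maxBuckets} buckets at one time.`);
```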