birchill / 10ten-ja-reader

A browser extension to translate Japanese by hovering over words.
https://addons.mozilla.org/firefox/addon/10ten-ja-reader/
GNU General Public License v3.0
599 stars 45 forks source link

iOS version deletes the data when being disabled #840

Open birtles opened 2 years ago

birtles commented 2 years ago

I received the following report on twitter:

On iOS, every time I turn off the extension, the dictionary database is cleared. I can see No database in options. Every time I turn on the extension, it redownloads the databases, which is very long and battery expensive on an iPhone 7 Plus. Not sure if you were aware of that problem.

image

I'm trying to clarify how the user is disabling the extension but I wonder if anyone has seen this? @shirakaba perhaps?

shirakaba commented 2 years ago

Just to get us all on the same page, I'll coin some terms for distinguishing these two ways of turning on/off 10ten-ja-reader:

Managing extensions (let's call it "enabling"/"disabling")

I believe these two areas are equivalent (if you move the slider in one place, the state is in sync when visiting the other place):

Managing extensions within Safari Managing extensions within system Settings

Activating/deactivating enabled extensions

Then there's this area of the UI, within Safari:

Enabled, activated state Enabled, deactivated state

You can't deactivate a disabled extension, because there's nothing active (no JS code running) to deactivate.

The Extension Settings Menu

And, just to give this menu a name:

Behaviour

Intuitively, I'd expect dictionary data to only be removed upon disabling the extension. Not upon deactivating it.

In fact, I found during brief testing on my iPhone 11 Pro running iOS 15.1 that my dictionary data survives whether I deactivate or disable the extension. I have over 100 GB of spare storage, however.

Could it be that the user's iPhone is low on storage?

birtles commented 2 years ago

Thank you so much for the very clear explanation. I will show that to the reporter and see if that helps.

Like you say, though, my best guesses too at this point are:

  1. The reporter is disabling the extension, as opposed to deactivating it, or
  2. The device is low on storage and Safari is aggressively deleting the data as soon as the background page is evicted (i.e. when no more content scripts are keeping it alive with a message port) -- basically ignoring the unlimitedStorage permission from the manifest.
Pacoup commented 2 years ago

Hey there, I am said user who sent the report on Twitter. To be honest, I'm a bit dumbfounded, as I'm unable to reproduce the issue. Dictionary data currently survives both deactivating and disabling the extension, as per the terminology above, and as as per the behaviour observed by @shirakaba.

I thought maybe the storage was being cleared by the 7-day cap imposed by ITP due to my infrequent usage, which would be a surprising thing to do for extensions (couldn't find any documentation on this), but the Dictionary data does actually survive changing my iOS clock to 7+ days later.

Dictionary data also survives reboots.

To be honest, this is a problem I had weeks ago at least (possibly on a beta build of 15.1), hadn't used the extension since, and when I checked the Extension Settings Menu the other day to see if it was still refreshing the Dictionary data and saw “No database”, I thought this was an actual proof of the problem. In reality though, this appears to be a message that is shown when the database is loading: https://github.com/birchill/10ten-ja-reader/blob/0d8c86e95896f97d45bcda9648a0565968963563/src/options.ts#L1066-L1075

From hibiki-data, I can see that the Initializing DataSeriesState may mean the application doesn't yet know whether there's a valid database or not (merely making suppositions based on a cursory look at the code here): https://github.com/birchill/hikibiki-data/blob/85fbbf20e06c8cb658f409adbc33faa065435644/src/database.ts#L30-L32

So, in certain cases, I can get the “No database” message to show even though the database hasn't been cleared. Eventually, reloading the Extension Settings Menu does show the database versions, although I'm unable to get reliable steps to reproduce this. It only appears after I've stopped using my phone for a few minutes, or when I've been doing other things on my phone for a few minutes (probably something in RAM being cleared):

https://user-images.githubusercontent.com/112764/143312653-dff5de53-f49e-4cc9-8e38-b9fb037ca8af.mp4

The key step in getting the database details to show appears to be force quitting Safari. In the video, I also quit the Settings app, but this makes no difference. (Don't mind the date in the future, it's from when I toyed with the clock to test the ITP policies)

Refreshing the database on my 7 Plus would usually take several minutes, so here, it doesn't look like the database is actually being refreshed at any point. It's just the message thing from the Initializing sate.

Therefore, I think perhaps there was a Safari storage bug in the version of iOS I was using that has since been fixed, and the “No database” message wasn't an indication of this supposed bug.

My iPhone 7 Plus has 53.3 GB storage free out of 128. Running 15.2 beta 3 as we speak.

birtles commented 2 years ago

Hey there, I am said user who sent the report on Twitter. To be honest, I'm a bit dumbfounded, as I'm unable to reproduce the issue. Dictionary data currently survives both deactivating and disabling the extension, as per the terminology above, and as as per the behaviour observed by @shirakaba.

Hi @Pacoup! Thank you very much for following up here!

I checked out my notes of "things to look into later for the iOS version" and I noticed I have a note to "Investigate why DB sometimes becomes unavailable on iOS" so I guess I encountered that too at some point.

I thought maybe the storage was being cleared by the 7-day cap imposed by ITP due to my infrequent usage, which would be a surprising thing to do for extensions (couldn't find any documentation on this), but the Dictionary data does actually survive changing my iOS clock to 7+ days later.

When I've observed the data apparently being unavailable it was only a few minutes or so after it was available so I agree it's probably not the 7-day time limit. Thank you for checking that though!

To be honest, this is a problem I had weeks ago at least (possibly on a beta build of 15.1), hadn't used the extension since, and when I checked the Extension Settings Menu the other day to see if it was still refreshing the Dictionary data and saw “No database”, I thought this was an actual proof of the problem. In reality though, this appears to be a message that is shown when the database is loading:

https://github.com/birchill/10ten-ja-reader/blob/0d8c86e95896f97d45bcda9648a0565968963563/src/options.ts#L1066-L1075

Yes, you're right. We should change that to say "Initializing..." instead so at least we can differentiate between the two cases.

So, in certain cases, I can get the “No database” message to show even though the database hasn't been cleared.

This is the weird bit and matches what I believe I saw too on my iPhone 7 (not Plus).

Thank you for all the other details and the screencast too. That's incredibly helpful.

One other piece of information that I came across this morning that might be helpful is that Safari for iOS actually doesn't keep background/event pages live even with active ports:

One limitation would be that inactive Ports would no longer prevent the Event page from suspending, something Safari for iOS is already doing.

(source https://github.com/w3c/webextensions/issues/134#issue-1049674790)

So tying together all these observations, my best guess is that after some time the event page is suspended. This might happen more often when memory is low, and perhaps the fact that we've seen this on two iPhone 7's (2Gb / 3Gb) but not a more recent iPhone 11 (4Gb) points to that.

After that, when we next try to communicate with the background page either by looking something up or by opening the options page, the database state is reported as "initializing". In the options screen, however, we conflate "initializing" with "no database" so we can't tell.

That doesn't explain why the data is re-downloaded but it sounds like that no longer happens (which is a relief). There certainly have been some significant IndexedDB bugs fixed in Safari recently so it's possible that it was due to a bug that has been fixed 🤞

It also doesn't quite explain quite why it takes so long for the database state to update in that screencast. Perhaps Safari is just really slow at initializing IndexedDB (possible) or perhaps we're not updating the database state correctly.

Next steps:

Pacoup commented 2 years ago

This really seems to be memory-related. I have new behaviours.

After that, when we next try to communicate with the background page either by looking something up or by opening the options page, the database state is reported as "initializing". In the options screen, however, we conflate "initializing" with "no database" so we can't tell.

I was initially under the impression that the dictionary lookup always works, and that opening the options through the dictionary popup always shows the database versions. This still seems to be true (versus what I show below), which suggests something may be causing one of the events here not to fire when opening options via the Settings app.

Is the options page being suspended, closing the Port perhaps, or is the onload event not fired for some reason or the code is unable to acquire a new Port?

https://github.com/birchill/10ten-ja-reader/blob/0d8c86e95896f97d45bcda9648a0565968963563/src/options.ts#L781-L814

It also doesn't quite explain quite why it takes so long for the database state to update in that screencast. Perhaps Safari is just really slow at initializing IndexedDB (possible) or perhaps we're not updating the database state correctly.

My understanding is the sate isn't updated at all. You have to force quit Safari. The Check for updates button does nothing.

In both cases, the data doesn't appear to re-download. It's still there.

New behaviours

Upon trying to confirm the database versions are always shown when loading options via the dictionary popup, I stumbled upon something else. My procedure was to open as many apps as possible to make sure the memory-related issue was triggered without having to leave my phone aside for an as-of-yet unknown amount of time required for the issue to show up.

Trying the dictionary lookup after opening all the apps, I was faced with a popup window without a tab bar:

Heading into the options from the Settings app, I was faced with this new state. No database, but a Last check message is present:

Tapping on the Check for updates button this time actually performs the entire database download. The DB appears to have been deleted as I had observed some weeks ago:

Could something be causing the database to exceed the 10 MB limit set by the unlimitedStorage permission? Doesn't seem to be the problem, however, since the database works fine for a while.

My impression is Safari doesn't have persistent storage permissions yet (or apparently, this is being added in 15.2) and the total disk space allotted to Safari storage is quite low, and possibly shared with other web views. Since a lot of apps are web view heavy, opening many apps could have caused Safari to evict some IndexedDBs in storage to make space. It wouldn't make sense that these are stored in RAM, but perhaps iOS considers web storage to be more of a cache, and loads parts of it in RAM for performance, hence the aggressive eviction policy. Or they just don't want Safari storage to blow up.

This could also explain the behaviour observed on the iPhone 11, as it is not impossible that newer iPhones have larger storage quotas in line with more system memory, although I feel like the storage eviction and the No database message issues are two different problems. However, the No database message showing up could be related to total memory anyway.

Nevertheless, this could all be wrong, as the dictionary still appears functional before checking for updates, so the IndexedDB wasn't evicted and something else is causing the re-download (if that's even what's happening, I'm not sure about the update process)?

birtles commented 2 years ago

Thanks @Pacoup! That's super helpful.

I've written up a little summary below of how 10ten does lookups in case that helps interpreting the different states you see.

Background: how 10ten does lookups 10ten comes bundled with a snapshot of the word dictionary that it can load into memory to do lookups. This in-memory snapshot is used whenever we can't access the IndexedDB database which most often happens either (a) when the add-on has recently been installed and we haven't finished downloading the data yet, or (b) when we can't store data for some reason (e.g. Firefox users who have set their browser to "always private mode" or because we reached the storage quota). This ensures we can always return _something_ even if it's potentially out-of-date. The snapshot doesn't include localized glosses and doesn't include the name or kanji dictionaries so we miss out on that information too. Generally we prefer to use IndexedDB because we can update the data frequently without the user having to download and install potentially large updates and because it requires less memory. That is, if we detect the IndexedDB database is available then we never load the snapshot into memory. When you see the popup with no tabs, that means it has fallen back to the snapshot in memory due to conditions described below. One complication, however, is that we have a lookup cache in the content script. It's very small (only about 10 entries or so) and exists mostly because very often the user will scan the cursor back and forth within a word and we don't want it to perform redundant lookups each time the cursor moves to a different character. We don't cache lookups when they come from the snapshot since we expect that's fast enough without the cache and will soon be replaced by the up-to-date IndexedDB version anyway. As a result, when debugging this it's entirely possible that you _think_ you're getting results from the IndexedDB version of the dictionary (because you see all the tabs), but you're actually just getting a cached entry.

Getting to your comments...

Is the options page being suspended, closing the Port perhaps, or is the onload event not fired for some reason or the code is unable to acquire a new Port?

I think we can rule out the onload event not being fired since that should mean the entire options page was broken.

Furthermore, I think that if we failed to acquire a port in the first place--either because something earlier in that function threw, or because connect() itself threw, our error reporting setup would be reporting that. Hopefully, anyway.

My understanding is the sate isn't updated at all. You have to force quit Safari. The Check for updates button does nothing.

I wonder if that's because we have a disconnected port that silently fails when we try to post to it, as opposed to the port not being established at all?

For now I've added some code to try to detect the port being disconnected and re-connect just to see if that helps.

Trying the dictionary lookup after opening all the apps, I was faced with a popup window without a tab bar:

This state indicates that the word database is either:

as per this code:

https://github.com/birchill/10ten-ja-reader/blob/8a0e313bab6392292a154d1bc1e360acc0a19471/src/background/jpdict.ts#L111-L130

For now I've updated the code that reports the database state in the options page so we can differentiate between the above cases a bit more easily.

As per your observations though, it seems like this is a case of the database being empty.

Could something be causing the database to exceed the 10 MB limit set by the unlimitedStorage permission? Doesn't seem to be the problem, however, since the database works fine for a while.

Oh, I hadn't seen that. That's quite interesting.

My impression is Safari doesn't have persistent storage permissions yet (or apparently, this is being added in 15.2) and the total disk space allotted to Safari storage is quite low, and possibly shared with other web views.

I didn't know that either. I really hope that helps, but given that you're on 15.2 beta 3 it seems unlikely.

Since a lot of apps are web view heavy, opening many apps could have caused Safari to evict some IndexedDBs in storage to make space. It wouldn't make sense that these are stored in RAM, but perhaps iOS considers web storage to be more of a cache, and loads parts of it in RAM for performance, hence the aggressive eviction policy. Or they just don't want Safari storage to blow up.

This could also explain the behaviour observed on the iPhone 11, as it is not impossible that newer iPhones have larger storage quotas in line with more system memory, although I feel like the storage eviction and the No database message issues are two different problems. However, the No database message showing up could be related to total memory anyway.

That certainly seems possible. I really don't know what Safari is doing. As we discovered when implementing the puck, Safari seems to have a lot of interesting heuristics!

I agree it seems likely that Safari is simply tossing out the IndexedDB data based on ✨conditions✨.

If we could work out some reliable steps to reproduce this then we have a few contacts at Apple who might be able to confirm/deny our suspicions.

However, if the answer is that we really can only rely on 10Mb of data, then I'm not sure what we can do except going back to bundling all the dictionaries with the add-on and loading them all into memory. That would be a shame to have to support that configuration given that for most users/platforms it's unnecessary.

I am currently reworking the setup for downloading the data files to hopefully make that a bit more streamlined (avoiding downloading 60 patch updates for the name dictionary for example). Once that's done it might make re-downloading a little less painful but adding hundreds of thousands of records to IndexedDB is still going to take time (typically much longer than downloading the same records).

(If only Chrome would get on board with Mozilla's "Limited Event Pages" we could use absurd-sql and work around the IndexedDB performance issue so that even if a full re-download is triggered it is sufficiently fast that most users don't notice. Last time I checked, I think the Brotli'd download size of all the data is only ~14Mb or so and I expect that will get smaller after I rework things.)

For now I'm adding the extra diagnostics I mentioned above. I will try to implement the touchscreen version of the "Copy to clipboard" function and then trigger a release some time next week. Hopefully after that we can get a clearer picture of what is going on.

Pacoup commented 2 years ago

I've written up a little summary below of how 10ten does lookups in case that helps interpreting the different states you see.

Thanks, super helpful.

I wonder if that's because we have a disconnected port that silently fails when we try to post to it, as opposed to the port not being established at all?

That sounds correct. I'll test when I get the update.

I didn't know that either. I really hope that helps, but given that you're on 15.2 beta 3 it seems unlikely.

Yeah, not sure what the expectation is with regards to getting this permission through a web extension. Perhaps this behaves like Firefox and requires some UX for the user to approve:

o8W7pNTZ5dFKeDg2cmvA

However, if the answer is that we really can only rely on 10Mb of data, then I'm not sure what we can do except going back to bundling all the dictionaries with the add-on and loading them all into memory. That would be a shame to have to support that configuration given that for most users/platforms it's unnecessary.

Do you mean the current IndexedDB exceeds 10 MB?

I can see 163 MB in Chrome, but I imagine that if there was an actual 10 MB limit in Safari Web Extensions, you would get some kind of quota error. Perhaps this works similarly to Firefox and Chrome:

So if the free space on your hard drive is 500 GB, then the total storage for a browser is 250 GB. If this is exceeded, a process called origin eviction comes into play, deleting an entire origin's worth of data until the storage amount goes under the limit again. There is no trimming effect put in place to delete parts of origins — deleting one database of an origin could cause problems with inconsistency.

It does sound like Safari is evicting the origin for the 10ten extension as described in Firefox's process, but the total storage allocation may be quite low comparatively:

https://github.com/WebKit/WebKit/blob/e98ff129bc8eba06ac17105f19c5f0e142aab853/Source/WebCore/storage/StorageQuotaManager.h#L43-L45

This makes it look like the default quota is 1 GB, and per origin is 100 MB, although there appears to be a mechanism for requesting more storage:

https://github.com/WebKit/WebKit/blob/a3e3204b332206fe574fff421ac05ab7ec4390ae/Source/WebCore/storage/StorageQuotaManager.cpp#L72-L107

I found these from the original commit that added the storage quotas for IDB:

https://trac.webkit.org/changeset/237700/webkit/ https://github.com/WebKit/WebKit/commit/c0b64862bcfe536bd4596b605085f67058dc60c4

Although this was later changed to a storage-generic quota:

https://bugs.webkit.org/show_bug.cgi?id=196545

Besides the persistent-storage permission, losing the IDB data occasionally seems to be unavoidable.

birtles commented 2 years ago

Thank you again for following up on this!

I didn't know that either. I really hope that helps, but given that you're on 15.2 beta 3 it seems unlikely.

Yeah, not sure what the expectation is with regards to getting this permission through a web extension. Perhaps this behaves like Firefox and requires some UX for the user to approve:

o8W7pNTZ5dFKeDg2cmvA

Yes, this is a very odd area. In Firefox, for Web Extensions even if you specify unlimitedStorage, you don't actually get it until you call persist() (and persist() in that case does not trigger a prompt).

However, if the answer is that we really can only rely on 10Mb of data, then I'm not sure what we can do except going back to bundling all the dictionaries with the add-on and loading them all into memory. That would be a shame to have to support that configuration given that for most users/platforms it's unnecessary.

Do you mean the current IndexedDB exceeds 10 MB?

Yes. I haven't checked what Safari's disk usage is but I believe Firefox ends up using about 400Mb of storage. The downloaded file size should be in the order of 10~20Mb but once we extract it and stick it into IndexedDB and add a few indices, the corresponding SQLite file it stores on disk is massive.

I believe Safari uses SQLite for storing IndexedDB so I expect the actual storage used there is of similar magnitude. (Apparently, Blink is the only engine that doesn't use SQLite for IndexedDB and is slower as a result.)

I can see 163 MB in Chrome, but I imagine that if there was an actual 10 MB limit in Safari Web Extensions, you would get some kind of quota error.

Yes, indeed.

So if the free space on your hard drive is 500 GB, then the total storage for a browser is 250 GB. If this is exceeded, a process called origin eviction comes into play, deleting an entire origin's worth of data until the storage amount goes under the limit again. There is no trimming effect put in place to delete parts of origins — deleting one database of an origin could cause problems with inconsistency.

Yes, this is a bit annoying. There is a proposal for Storage Buckets that would allow us to, say, put the names dictionary in a separate bucket and let that be evicted first, but no one has implemented that spec yet.

It does sound like Safari is evicting the origin for the 10ten extension as described in Firefox's process, but the total storage allocation may be quite low comparatively:

https://github.com/WebKit/WebKit/blob/e98ff129bc8eba06ac17105f19c5f0e142aab853/Source/WebCore/storage/StorageQuotaManager.h#L43-L45

This makes it look like the default quota is 1 GB, and per origin is 100 MB, although there appears to be a mechanism for requesting more storage:

https://github.com/WebKit/WebKit/blob/a3e3204b332206fe574fff421ac05ab7ec4390ae/Source/WebCore/storage/StorageQuotaManager.cpp#L72-L107

Oh nice! Thanks for looking those up!

Besides the persistent-storage permission, losing the IDB data occasionally seems to be unavoidable.

Right. It's not entirely clear what quotas are extended to Web Extensions or even if we're hitting that quota or simply some other kind of heuristic.

I couldn't see any mention of handling of unlimitedStorage in the WebKit trac or Bugzilla so I suppose that part is closed source.

For what it's worth my iPhone 7 has 114Gb of free storage (it's a test phone with almost nothing else installed on it) and I've certainly seen the "database not available" issue. I don't know that I've seen it re-download the data however.

birtles commented 2 years ago

For now I'm adding the extra diagnostics I mentioned above. I will try to implement the touchscreen version of the "Copy to clipboard" function and then trigger a release some time next week. Hopefully after that we can get a clearer picture of what is going on.

I've finally released 1.6.0 which should have the extra diagnostics. (The touchscreen version of "Copy to clipboard" took a lot longer than anticipated 😅)