LemmyNet / lemmy-ui

The official web app for lemmy.
https://join-lemmy.org/
GNU Affero General Public License v3.0

Lemmy UI silently fills Firefox local storage (posts) without automatic deletion/cleaning #2792

Open tepozoa opened 2 weeks ago

tepozoa commented 2 weeks ago


Summary

I noticed under Firefox Settings -> Privacy that my local cookie/cache data was taking some 3 GB of space, which felt off. I opened Manage Data, sorted by size and, lo and behold, lemm.ee (my home instance) was using all that space. I cleared the cache and verified in about:cache that it was empty, yet the space consumed by lemm.ee was still listed. I had to resort to the History function "forget this site" to actually clear out all the used space.

I'll try and present my findings, but I'm sort of guessing my way through this as an observer and end user, not a dev or instance admin. @sunaurus runs the instance and helps with Lemmy code, I hope they do not mind me tagging them into this issue.

Steps to Reproduce

  1. Browse your Lemmy instance for a long while, opening many images along the way using the click-to-expand feature of the article list views (click on image on left without opening full post)
  2. Observe in Settings -> Privacy your used site data number increasing over time (a console-based way to read the same number is sketched just after this list)
  3. Clear all cache, confirm it is empty in about:cache, restart Firefox
  4. Observe in Settings -> Privacy the storage amount does not decrease as expected
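
For step 2, the same number can also be read from the devtools console while on the instance. This is just the standard StorageManager API, not anything Lemmy-specific; the reported usage covers the origin's Cache API storage, IndexedDB, etc.:

// Paste into the devtools console on the instance; usage and quota are in bytes.
navigator.storage.estimate().then(({ usage, quota }) => {
  console.log(
    `site storage: ${(usage / 1024 / 1024).toFixed(1)} MiB ` +
      `of ${(quota / 1024 / 1024).toFixed(0)} MiB quota`,
  );
});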

Technical Details

In your local Firefox profile there is a storage/... tree in which each subdirectory is a named site (such as lemm.ee or lemmy.world), with several subdirectories under that; "default" is the one relevant to my observation. A quick look at the files sitting there reveals a lot of objects like this one:

$ find ./storage/default/https+++lemm.ee/ -type f
./storage/default/https+++lemm.ee/cache/caches.sqlite
...
./storage/default/https+++lemm.ee/cache/morgue/57/{dcc46c14-70ec-4c6a-8537-1bee4688d239}.final

->

$ file ./storage/default/https+++lemm.ee/cache/morgue/57/{dcc46c14-70ec-4c6a-8537-1bee4688d239}.final
./storage/default/https+++lemm.ee/cache/morgue/57/{dcc46c14-70ec-4c6a-8537-1bee4688d239}.final: snappy framed data

"snappy" is a compression format used by Firefox; a quick install of snappy-tools let me have a look at a copy of one of them, confirming it was a JPEG I had looked at moments ago (note I had to use the -i flag to ignore some checksum errors from rudimentary debugging):

$ ./snappy-tools/unsnappy -i test.final > new.file
unsnappy: test.final: chunk of length 65312: checksum 0x6C991DCD != 0xD86CB7E2
unsnappy: test.final: chunk of length 65546: checksum 0x82701F3A != 0xC295B675
unsnappy: test.final: chunk of length 25832: checksum 0x641E20EA != 0xE0E7B4C5

That said, it pulled right up in an image viewer as the JPEG. I then shut down Firefox, copied the caches.sqlite file to a test file (it's locked with WAL/SHM files while Firefox is running) and, after a quick look at the schema and following my nose, verified the chain of custody from that object back to its source URL:

$ sqlite3 test.sqlite
sqlite> select request_url_no_query,response_body_id,response_principal_info from entries where response_body_id like '%d239%';
https://lemmy.ca/pictrs/image/2d5b64a8-f43d-4371-a15c-59654555b6ff.jpeg|{dcc46c14-70ec-4c6a-8537-1bee4688d239}|https://lemm.ee/service-worker.js

It is here I must stop and see what your team thinks. I can see what is happening mechanically, but I am unsure what the expected behaviour of either Firefox or Lemmy UI is supposed to be; as an end user it "feels like" permanent local storage is being used/triggered by service-worker.js instead of the data actually going to the regular cache (the cache2/ subfolder, i.e. the about:cache view).

I believe it is a reasonable expectation, as an end user, that the images I'm looking at by the dozens if not hundreds are temporary and transient: they should sit in the cached data (as per about:cache etc.) and not in site local storage, which appears to be kept "forever" (it requires a manual Forget of the site to clear).
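
To illustrate what I mean (a generic sketch only, not the actual lemmy-ui worker, which is generated by Workbox): anything a service worker saves through the Cache API lands in the origin's site data, the number shown under Manage Data, rather than in the HTTP cache that about:cache reports, and it stays there until the site or the worker deletes it:

self.addEventListener("fetch", (event) => {
  if (event.request.method !== "GET") {
    return; // only GET responses can be stored in the Cache API
  }
  event.respondWith(
    caches.open("image-cache").then(async (cache) => {
      // Serve from the Cache API (site storage) if we already have it
      const cached = await cache.match(event.request);
      if (cached) {
        return cached;
      }
      const response = await fetch(event.request);
      // put() writes into the origin's site storage and counts against its quota;
      // nothing here ever expires unless the worker explicitly deletes entries
      await cache.put(event.request, response.clone());
      return response;
    }),
  );
});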

Thank you!

Lemmy Instance Version

0.19.5

Lemmy Instance URL

https://lemm.ee/

dessalines commented 2 weeks ago

I'd need someone with some expertise here to explain why the service worker would make removing the cached images impossible.

SleeplessOne1917 commented 2 weeks ago

While I'm not particularly skilled with service workers, we have a dependency that should handle setting most of that stuff up automatically. From the webpack config:

  const clientConfig = {
    // Some stuff not relevant here...
    plugins: [
      ...base.plugins,
      new ServiceWorkerPlugin({
        enableInDevelopment: mode !== "development", // this may seem counterintuitive, but it is correct
        workbox: {
          cacheId: "lemmy",
          include: [/(assets|styles|js)\/.+\..+$/g],
          inlineWorkboxRuntime: true,
          runtimeCaching: [
            {
              urlPattern: ({
                sameOrigin,
                url: { pathname, host },
                request: { method },
              }) =>
                (sameOrigin || host.includes("localhost")) &&
                (!(
                  pathname.includes("pictrs") || pathname.includes("static")
                ) ||
                  method === "POST"),
              handler: "NetworkFirst",
              options: {
                cacheName: "instance-cache",
              },
            },
            {
              urlPattern: ({ url: { pathname, host }, sameOrigin }) =>
                (sameOrigin || host.includes("localhost")) &&
                pathname.includes("static"),
              handler: mode === "development" ? "NetworkFirst" : "CacheFirst",
              options: {
                cacheName: "static-cache",
                expiration: {
                  maxAgeSeconds: 60 * 60 * 24,
                },
              },
            },
            {
              urlPattern: ({ url: { pathname }, request: { method } }) =>
                pathname.includes("pictrs") && method === "GET",
              handler: "StaleWhileRevalidate",
              options: {
                cacheName: "image-cache",
                expiration: {
                  maxAgeSeconds: 60 * 60 * 24,
                },
              },
            },
          ],
        },
      }),
    ],
  };

It's been a while since I've touched this stuff, so I don't remember much of the context behind the caching decisions. Please feel free to criticize the config.
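
For reference, and only as an untested sketch: the Workbox runtime caching options accept the same expiration setting that the static-cache and image-cache entries above already use, so if the instance-cache turns out to be the one keeping entries forever, bounding it could look something like this (the limits below are placeholders, not a recommendation):

{
  urlPattern: ({
    sameOrigin,
    url: { pathname, host },
    request: { method },
  }) =>
    (sameOrigin || host.includes("localhost")) &&
    (!(pathname.includes("pictrs") || pathname.includes("static")) ||
      method === "POST"),
  handler: "NetworkFirst",
  options: {
    cacheName: "instance-cache",
    expiration: {
      // placeholder limits; tune to whatever the team considers reasonable
      maxEntries: 500,
      maxAgeSeconds: 60 * 60 * 24,
    },
  },
},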

tepozoa commented 2 weeks ago

Minor Firefox debugging info: within caches.sqlite there is a cache_id field which maps back to the webpack config above; in my case ID 4 is the image-cache matching the /pictrs URLs. There is then a request_url table which maps a given URL to a numerical ID, and a response_headers table which uses that ID to keep track of a date field (along with all the other headers, like cache-control). The ID assigned to a URL is just an increasing integer, so in theory the lower the number, the older the cache entry.
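
Roughly the kind of queries involved, in case anyone wants to poke at a copy of their own profile (this assumes, as observed above, that cache_id is a column of the entries table; the numeric IDs will likely differ between profiles):

$ sqlite3 test.sqlite
sqlite> select cache_id, count(*) from entries group by cache_id;
sqlite> select request_url_no_query from entries where cache_id = 4 limit 5;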

As I'd flushed my full storage as part of the initial debugging, I'm now letting it fill back up again for a few days to try to capture a new caches.sqlite in which I might be able to expose stale entries that have outlived the expected 86400-second expiration. I ran service-worker.js through a code beautifier and it sure looks like it's trying to expire entries. :-/

tepozoa commented 2 weeks ago

Update: I'm starting to get the spidey-sense it might be /post content and not images. I quickly spotted 2 files "of a suspicious size with an old date" (over 100k compressed, over 300k uncompressed) which have been on disk for 5 days; they map to post content (which includes huge embedded apple-touch icons and the massive "window.isoData" array of content):

119230 Nov  1 07:54 ./0/{282fb763-ab5a-4af8-b336-d37df03ad601}.final
-> https://lemm.ee/post/46270417

175989 Nov  1 09:34 ./196/{c6c63a02-0017-4279-b0b7-7f8d365592c4}.final
-> https://lemm.ee/post/46316561

In caches.sqlite they indicate loading by service-worker.js and are in cache ID 3 (which, based on the other content there, seems to be a catch-all holding /api URLs and the like). I don't see an explicit webpack config for this (?), so I wonder if they're getting cached by some default rule which doesn't have an expiration value.

Today (roughly Wed Nov 6 02:59:21 PM UTC 2024) I intentionally opened every single post on the All front page in new tabs and fully scrolled each one to load all comments. I'll check on the content after another few days, including the 2 posts mentioned above, to see if they're still sitting in the local-storage cache.

tepozoa commented 1 week ago

I think I can confirm my findings from the comment above: capturing the latest caches.sqlite and doing some ad-hoc filtering on all the data in cache_id 3 (post, api, user, search and community data; a catch-all?), that cache appears to be what is not getting culled. The top three offending patterns:

/post/123456789
/api/v3/post/list
/api/v3/image_proxy

Others which seem to be sticky, in alpha order:

/api/v3/comment/list
/api/v3/community
/api/v3/community/list
/api/v3/post
/api/v3/search
/api/v3/user
/api/v3/user/*
/c/*
/u/*

I was able to identify 74 compressed post items from my 2024-11-06 "click every post on the front All pages for a while" session still "stuck" in the cache, all with timestamps from 06:30 to 08:51 local time (UTC-6), which is when I went button-click crazy:

$ find ./https+++lemm.ee/ -name \*.final | \
  xargs ls -l --time-style='+%Y%m%d %H:%M' | \
  awk '{print $6 " " $7}' | sort -n | grep 20241106 -c

74

Sampling a lot of the entries data in caches.sqlite (by the internal ID) and cross-referencing it against response_headers filtered on the date field (the internal timestamp used by Firefox, not the actual returned HTTP header), everything connects together to identify these as the post data (along with the other offenders above).

I think we have post/API data being cached in local storage that is escaping the expected (desired) expiration triggers and falling back to some internal default (I read somewhere that Firefox allows up to 10 GB per site when not told what to do).