
Collection/field creation/changes saved to DB but not displayed in Directus in v9.23.2–v9.24.0 (self-hosted Docker) #18110

Open rowild opened 1 year ago

rowild commented 1 year ago

Describe the Bug

After running docker compose pull [service] and restarting the app (docker compose down followed by docker compose up -d, all done on DigitalOcean with SQLite), fields of a collection are no longer updated.
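
For reference, the update sequence described above looks roughly like this (the service name is a placeholder; substitute your own):

docker compose pull <service>   # pull the newer Directus image for that service
docker compose down             # stop and remove the running containers
docker compose up -d            # recreate the stack from the updated image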

https://user-images.githubusercontent.com/213803/230712051-ae5dfe3d-4ec8-44b0-8e2a-c324249ae49b.mp4

To Reproduce

(The content has not been updated.)

Hosting Strategy

Self-Hosted (Docker Image)

rowild commented 1 year ago

Further things I did:

I eventually stopped Docker, deleted all containers and images, and ran docker compose up again, but even that didn't resolve the problem.

Eventually I set it up locally using the exact same docker-compose.yaml that I use on DigitalOcean:

version: '3'
services:
  cache_jvds:
    container_name: cache_jvds
    image: redis:6
    networks:
      - jvds
  directus_jvds:
    container_name: directus_jvds
    image: directus/directus:latest
    ports:
      - 8056:8055 # external:internal
    volumes:
      - ./uploads:/directus/uploads
      - ./database:/directus/database
      - ./extensions:/directus/extensions
    networks:
      - jvds
    depends_on:
      - cache_jvds
    environment:
      KEY: '...'
      SECRET: '...'

      DB_CLIENT: 'sqlite3'
      DB_FILENAME: './database/data-jvds.db'

      CACHE_ENABLED: 'true'
      CACHE_STORE: 'redis'
      CACHE_REDIS: 'redis://cache_jvds:6379'

      ADMIN_EMAIL: '...'
      ADMIN_PASSWORD: '...'

      PUBLIC_URL: '...'

      STORAGE_LOCATIONS: 'local'
      STORAGE_LOCAL_DRIVER: 'local'
      STORAGE_LOCAL_ROOT: './uploads'

networks:
  jvds:
    name: jvds
    external: true

Even here, no collection that has been created is shown, even though the data was saved to the database (see video).

https://user-images.githubusercontent.com/213803/230759916-63652675-4202-42d3-8afe-e47385e03dcd.mp4

UPDATE 1: I also tried a local instance using Postgres - the same thing happens. Reverting back to 9.23.4 DOES show the collection.

UPDATE 2: I am experiencing this problem with 9.23.4, 9.23.3 and 9.23.2 as well. Only in 9.23.1 collection creation works as expected.

br41nslug commented 1 year ago

I am unfortunately unable to reproduce this 😬 Could you try reproducing it with all caches disabled (CACHE_ENABLED / CACHE_SCHEMA / CACHE_PERMISSIONS) and inspect the actual API call made when trying to apply the changes?
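
For anyone following along, disabling all caches as suggested corresponds roughly to this in the environment block of the compose file (variable names as documented by Directus; a testing sketch, not a recommended production setting):

  CACHE_ENABLED: 'false'
  CACHE_SCHEMA: 'false'
  CACHE_PERMISSIONS: 'false'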

rijkvanzanten commented 1 year ago

We haven't been able to reproduce this based on the given information, so I'll close this for now. Happy to keep discussing in this thread, and we'll reopen once more information becomes available 👍🏻

rowild commented 1 year ago

Thank you for your feedback, @br41nslug & @rijkvanzanten! I did what you suggested; I am sorry, I was not aware of the CACHE parameter options! Here is what I did:

I tried a new local installation with a new database, and that one indeed works fine!

But as soon as I use my existing SQLite file, by replacing the newly created, empty data.db file, the problem described above remains.

So next I executed docker compose down, deleted all containers and images, and made sure to set the CACHE* options you recommended to false in my docker-compose.yml. Then docker compose up -d created a Directus 9.24 instance with my old database file (the one with all the data), and I can now work on it (change fields, create new collections, ...).

Buuuuut ... when I then go back to the yml file and enable all cache options again (and do all the docker compose down and up stuff), I am again facing the same problem as initially described.

I also experimented with CACHE_AUTO_PURGE, but to no avail. I assume changing KEY and SECRET would be counter-productive?

This happens locally as well as on DigitalOcean. And I should mention that I only tried the above things with SQLite, not any other db.

Do you happen to have any other ideas? I'd be happy to provide my database file, if you think that could help. Or my DigitalOcean account.

(I remember that I created my data with Directus 9.11 (or 15?), locally (not with Docker, but with the Node version). Then, after 5 or 6 months, I tried DigitalOcean, and as soon as the setup worked there, I copied over that old database file. I had to do some work like changing the IDs of the user groups etc., but eventually I could work with it. And I still can with v9.23.1, but not with any version after that...)

Thank you again for your time!

rowild commented 1 year ago

I think I found the problem. As soon as I add PUBLIC_URL: 'https://localhost:8057' (the port matches the port mapping 8057:8055), the information is still written to the database, but Directus no longer displays it. However, it still recognises that a collection already exists should you try to create one with the same name.

Steps to reproduce:

STEP 1: setup

You can use this file:

version: '3'
services:

  cache:
    container_name: cache
    image: redis:6
    networks:
      - directus

  directus:
    container_name: directus
    image: directus/directus:latest
    ports:
      - 8055:8055
    volumes:
      - ./uploads:/directus/uploads
      - ./database:/directus/database
    networks:
      - directus
    depends_on:
      - cache
    environment:
      KEY: '255d861b-5ea1-5996-9aa3-922530ec40b1'
      SECRET: '6116487b-cda1-52c2-b5b5-c8022c45e263'

      DB_CLIENT: 'sqlite3'
      DB_FILENAME: './database/data.db'

      ADMIN_EMAIL: 'admin@example.com'
      ADMIN_PASSWORD: 'd1r3ctu5'

      CACHE_ENABLED: 'true'
      CACHE_STORE: 'redis'
      CACHE_REDIS: 'redis://cache:6379'

      # PUBLIC_URL: 'https://127.0.0.1:8055'

networks:
  directus:

So far everything should be working. Continue with

STEP 2: break it

STEP 3: downgrade to 9.23.1

Beginning with directus:9.23.2, these behaviours are broken.

STEP 4: fixing it again

Unfortunately, simply removing the PUBLIC_URL from the yml file won't fix the problem, and neither does stopping the containers and deleting the images to get rid of the Redis cache. It seems that at this point the db file is corrupt. (Sometimes a process runs that quickly creates and deletes a [data.db].db-journaled file; I am not sure what it does.) Only completely deleting any cache and using a previously backed-up database file makes the system work again (or turning off the cache, as mentioned earlier).
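
A rough sketch of the recovery path described above, assuming the compose file from STEP 1 and an illustrative backup location (./backups/data.db is hypothetical):

docker compose down                       # removes the containers; the Redis container has no volume, so its cache is discarded
cp ./backups/data.db ./database/data.db   # restore a previously backed-up SQLite file
docker compose up -d                      # bring the stack back up against the restored database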

Only tested with SQLite in my local environment (MacBook M2).

Here is a video of my workflow for STEP 1 and STEP 2 (sorry, I didn't record the downgrade): https://dl.rowild.at/directus-not-saving-collections__2023-04-11.mp4

Is my understanding of PUBLIC_URL wrong?

br41nslug commented 1 year ago

PUBLIC_URL should have no direct effect on the database itself. However, it looks like you're pointing an https:// PUBLIC_URL directly at the Docker container without a proxy handling the certificates. Keep it at http:// for local development.
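
In the compose file from STEP 1, that would look something like this (the host and port being whatever you actually use to reach the instance):

      # use plain http locally; let a reverse proxy terminate TLS in production
      PUBLIC_URL: 'http://localhost:8055'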

rowild commented 1 year ago

Ok, I will keep that in mind. However, it does not explain why 9.23.1 works and every later version does not. Also, on my DigitalOcean instances Nginx handles the proxying, and I still have the exact same problem. I am quite surprised that I seem to be the only one who can produce this problem :-) Can you reproduce it with my explanation above?

The PUBLIC_URL does not influence the DB, since, on inspecting the DB file, the data is saved. Reading from the DB again right after saving seems to be the problem, though...

(Is there any other way to delete the cache aside from deleting and reinstalling redis?)
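
For what it's worth, the Redis cache can be flushed without reinstalling anything; a minimal sketch using the container and service names from the compose file above:

docker exec cache redis-cli FLUSHALL   # drop every key in the Redis instance backing the cache
docker compose restart directus        # restart Directus so it rebuilds its caches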

br41nslug commented 1 year ago

These 2 steps are not needed after a docker compose down

docker rm [container] # covered by compose down
docker rmi [images]   # only forces a redownload of the latest tagged image, data is not stored here

nor are the --force-recreate and --build flags.

Having said that, I went through the steps as described and unfortunately "Step 2" is not breaking for me. The [data.db].db-journaled file you mention makes me think something may have gone wrong at a filesystem/database level, for which my tests on another operating system may not be representative of your specific environment. A quick Google search turned up this post about the new Apple M CPUs: https://sqlite.org/forum/forumpost/d2432b5dc2

rowild commented 1 year ago

@br41nslug Thank you for your feedback! Very much appreciated! I will check the link you posted and try to dig deeper.

I hope I will then also find the reason why my DigitalOcean setup does not work either (at DO there is no M1/M2 problem, everything is Ubuntu there...)

From what I understood, I have to docker rm [container] because compose only stops the containers but doesn't delete them, and they need to be deleted, otherwise docker rmi [images] won't work. I do understand that those steps are not strictly needed; I just wanted to be as thorough as possible. Thanks for commenting on them!

I will report again should I find something. Meanwhile thank you very much for your help! :-)

stx-chris commented 1 year ago

We have been experiencing the very same issue since upgrading from 9.23.1 to 9.24.x and 9.25.x. We use Postgres and Docker, and updates suddenly don't get reflected anymore. Sometimes we get a "permission denied" error when refreshing the view or saving an item.

We tried signing out and in again, creating new admin users and various other things like re-deploying with different env settings.

What has worked today was setting all three cache settings, as mentioned above, to false. Only then were we able to perform item updates without issues. As soon as we flip the cache back on, the issue reappears. Is it possible that some cache permissions have changed since 9.23.1? Can we manually invalidate the whole cache somehow?

Many thanks for your help.

br41nslug commented 1 year ago

Does the issue persist when enabling both CACHE_SCHEMA and CACHE_AUTO_PURGE, @stx-chris?

stx-chris commented 1 year ago

I tried again with possible variants of CACHE_ENABLED, CACHE_PERMISSIONS, CACHE_SCHEMA, CACHE_AUTO_PURGE and found that CACHE_AUTO_PURGE seems to be the culprit, at least when used with CACHE_STORE=memory.

Whenever it is disabled (the default), neither the view nor the open document reflects the changed value. When it is enabled, saving works as expected and the view/item are updated accordingly. Interestingly, it only shows this behavior in Docker; locally (macOS) it works either way.

Might it have to do with the recent changes of #17763?

stx-chris commented 1 year ago

@rowild Can you try setting CACHE_AUTO_PURGE to true and confirm whether this (temporarily) solves your problem too?

rowild commented 1 year ago

@stx-chris I currently have these settings, and for the moment they seem to work (Ubuntu 22.04, Docker, DigitalOcean):

  CACHE_ENABLED: 'true'
  CACHE_PERMISSIONS: 'true'
  CACHE_SCHEMA: 'true'
  CACHE_AUTO_PURGE: 'true'

stx-chris commented 1 year ago

Great! Can you check whether the issue reoccurs once you set CACHE_AUTO_PURGE to false? This would then confirm our mutual observations.

rowild commented 1 year ago

@stx-chris After applying the changes to the docker-compose.yml, how do you restart your project? A simple docker compose restart does not seem to clear my cache... So I wonder if my previous result is really valid.
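
For reference, docker compose restart only restarts the existing containers; it does not pick up changes to the compose file. A sketch of a sequence that recreates the containers (and, since the Redis container has no volume, discards its cache too):

docker compose down       # stop and remove the containers (the Redis cache goes with them)
docker compose up -d      # recreate them with the updated environment
# or, in a single step:
docker compose up -d --force-recreate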

stx-chris commented 1 year ago

I am deploying to Google Cloud which rebuilds the container every time. For this test I would manually delete the docker image and rebuild.

rowild commented 1 year ago

@stx-chris I deleted all containers and images. After docker compose up -d, my previous result still holds. Repeating the whole process and then setting CACHE_AUTO_PURGE to false causes the problem of changes not being reflected in the interface again. So yes, it seems to be a CACHE_AUTO_PURGE problem...

stx-chris commented 1 year ago

@br41nslug I guess the issue has been pinpointed then. Let us know if you need more details. Thanks!

rowild commented 1 year ago

@rijkvanzanten Is it possible to re-open this issue, please?

br41nslug commented 1 year ago

The lack of cache clearing on restart is a consequence of https://github.com/directus/directus/pull/18238; this was done deliberately for horizontally scaled setups. After reading back, enabling CACHE_AUTO_PURGE seems to be the solution. Perhaps CACHE_AUTO_PURGE should be enabled by default with that change 🤔

stx-chris commented 1 year ago

Agreed, but I'm afraid there is still a bug hiding in how CACHE_AUTO_PURGE is treated. Whatever its effect on restart is (whether single instances or horizontally scaled pods are affected), it should not affect the update mechanics once the instance is up and running.

There is no good reason why anybody should wish not to see their recent changes in the UI.

I am also wondering why this issue shows only in Docker environments and not locally. Any ideas?

br41nslug commented 1 year ago

it should not affect the update mechanics once the instance is up and running.

I disagree, as that is exactly the purpose of this flag, as the name suggests: when disabled, the cache does not get cleared automatically.

There is no good reason why anybody should wish not to see their recent changes in the UI.

The main argument for introducing this, I believe, was platform performance in production instances where the schema no longer changes.

I am also wondering why this issue shows only in Docker environments and not locally. Any ideas?

This to me sounds more like something is wrong with the cache locally 😬

br41nslug commented 1 year ago

Likely the reason this behavior suddenly changed on your end is that you were running without the schema cache before. After https://github.com/directus/directus/pull/17763 the schema cache seems to be enabled by default, so the suggested solution of enabling CACHE_AUTO_PURGE by default would restore the "expected" behavior for anyone who was not using the schema cache before.
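
Until the defaults settle, it may be safer to pin the relevant values explicitly in the compose file instead of relying on them; a sketch based on the thread so far:

  CACHE_ENABLED: 'true'
  CACHE_SCHEMA: 'true'       # reportedly on by default since the PR linked above
  CACHE_AUTO_PURGE: 'true'   # purge the cache automatically when data or schema changes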

rowild commented 1 year ago

I used what the documentation suggested. So if the "running without schema cache" default changed, this should be clearly stated IMO (actually, I would even say it justifies a new version, since it is a critical change). "A problem with cache locally": would that also be true for a DigitalOcean installation? (Because I get the error there.)

br41nslug commented 1 year ago

"A problem with cache locally": would that also be true for a DigitalOcean installation? (Because I get the error there.)

This was a reaction to stx, who explicitly stated "it is working as expected locally", which I don't think is the case for your DO setup.

br41nslug commented 1 year ago

I used what the documentation suggested. So if the "running without schema cache before" habit changed this should be clearly stated IMO (actually I even would say that this justifies a new version since it is a critical change).

This is indeed probably an issue with the default settings on our end. I'll re-open the ticket to correct these defaults.

rowild commented 1 year ago

"A problem with cache locally": would that also be true for a DigitalOcean installation? (Because I get the error there.)

This was a reaction to stx which explicitly stated "it is working as expected locally" while i don't think that is the case for your DO setup.

Ah, ok! I understood stx to mean that all his Docker installations have problems, but the local one (I assumed a Node installation) does NOT. Now I am confused, because the latter works as expected, yet you say it might be a problem with the local cache...

Anyway: thank you for re-opening the issue and having a closer look at it! Very much appreciated! :-)

rowild commented 1 year ago

@rijkvanzanten and @br41nslug Thank you! 💯 And thank you, @stx-chris, for finding the culprit! 👍

br41nslug commented 1 year ago

@stx-chris @rowild The CACHE_AUTO_PURGE patch was reverted because of a performance concern: enabling it purges the cache on any collection change, including activity/revisions (e.g. logging in and browsing around the Data Studio).

We have found the underlying culprit, which did turn out to be PUBLIC_URL, as @rowild mentioned but I incorrectly dismissed 😬 This behavior was changed in https://github.com/directus/directus/pull/17642, resulting in this issue whenever the PUBLIC_URL is not identical to the URL used to access the app. If these do not match, the app will receive cached results instead of bypassing the cache.

The workaround for now is to either set PUBLIC_URL to the correct URL or, if that's not possible, remove PUBLIC_URL until we can put a more robust fix in place. We'll leave this ticket open in the meantime.
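
In compose terms, the two workarounds look roughly like this (the URL being whatever you actually use to reach the app):

      # Option 1: make PUBLIC_URL match the URL used to access the app
      PUBLIC_URL: 'http://localhost:8055'
      # Option 2: omit PUBLIC_URL entirely until a more robust fix lands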