datamade / how-to

📚 Doing all sorts of things, the DataMade way

Improve Wagtail caching strategy #299

Status: Closed (smcalilly closed 1 year ago)

smcalilly commented 1 year ago

Background

Django's server-side caching doesn't work well with pages and content generated by Wagtail. In the past, when we've turned site-level caching on, users weren't able to see the changes they made in the CMS until after the cache expired.

This issue looks related and might be a possible solution. We'll also need to dive deeper into the Wagtail docs to figure this out.

Proposal

We need to come up with a good way to manage caching when a site uses content generated in Wagtail.

Deliverables

Using either an existing application or a very basic website set up for this purpose, with content generated by Wagtail, we'll have an application deployed on Heroku that caches the website and invalidates the cache whenever a user adds, edits, or deletes content within Wagtail.

Timeline

This should take an investment day.

hancush commented 1 year ago

Wonder about something like this https://docs.coderedcorp.com/wagtail-cache/

smcalilly commented 1 year ago

I've spent some time today working on this. The wagtail-cache library that @hancush shared is quite easy to add to an app. I added it to the IL NWSS app. My next step is to benchmark it to see if it's actually helping, and to dig into the database to make sure it's caching. I've gotten sidetracked with improving overall performance for the application: there's a lot to improve there, and it needs more investigation. I'm starting to wonder if a CDN would be a good caching tool for us.

smcalilly commented 1 year ago

I've got it set up here: https://github.com/datamade/il-nwss-dashboard/pull/157

But I can't confirm that it's caching. If you look at the headers, there is "X-Wagtail-Cache": "skip". In the request logs, the "Transferred" column also shows 8.77 kB rather than "cached".

[Screenshot, 2023-01-26 2:52:52 PM]

This suggests that it's not being cached, despite me following the docs to set it up. Looking at the code, I think this is where the "skip" header is getting added, so I'm trying to figure out why.

smcalilly commented 1 year ago

It's been working the whole time. I found the explanation in this comment on a wagtail-cache GitHub issue:

The caching by default will be bypassed for logged in users. You can confirm if the page is being cached ("hit"), intentionally skipped ("skip"), or was not in the cache ("miss") by looking at the X-Wagtail-Cache header in the HTTP request (use F12 browser dev tools).
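As a rough illustration of the check described in that comment, the sketch below maps an X-Wagtail-Cache value to its meaning. The function name and the wording of the messages are hypothetical; only the header name and the three values ("hit", "skip", "miss") come from the quoted comment.

```python
# Meanings taken from the quoted comment; the wording of each message is ours.
CACHE_STATUS_MEANINGS = {
    "hit": "served from the cache",
    "skip": "intentionally skipped (for example, a logged-in user)",
    "miss": "not in the cache yet",
}


def describe_cache_status(headers):
    """Explain the X-Wagtail-Cache value found in a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("x-wagtail-cache")
    if status is None:
        return "no X-Wagtail-Cache header present"
    return CACHE_STATUS_MEANINGS.get(status.lower(), f"unrecognized value: {status}")


print(describe_cache_status({"X-Wagtail-Cache": "skip"}))
```

When checking a deployed page this way, it probably makes sense to use headers from a request made without a session cookie (a private window, for instance), since a logged-in session would report "skip".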

smcalilly commented 1 year ago

I think wagtail-cache is a good library to use. It's easy and well-documented (except for the point in the above comment, where I learned that caching is disabled for logged-in users).

Following our guidelines for using a new tool, I'm going to pilot use of the tool on a project. I think IL NWSS would be a good candidate for this since we're picking development back up and we're using wagtail with some thorny fragment caching.

I can spend investment time adding it to the repo so that we don't burn any of the project hours. I'll have to refactor some of the caching code that we added. I'll keep an eye on the application and Sentry errors, but I don't think it will cause any major problems. The worst problem it might cause: an admin user changes content and the cache doesn't clear for some period of time (which is highly unlikely, since this tool's primary function is clearing the cache of CMS content). The contingency plan is to roll back to the commit before it was added.

@derekeder @hancush do y'all have any problems with this?

derekeder commented 1 year ago

@smcalilly sounds great to me

smcalilly commented 1 year ago

We set this up in the IL NWSS project. See PR: https://github.com/datamade/il-nwss-dashboard/pull/157

@smcalilly write up proposal and note the wagtail vs django caching.

smcalilly commented 1 year ago

4. Recommend adoption, further research, or abandonment

TLDR: I recommend this tool. It's easy to install, and it gives you caching that is invalidated whenever a user changes content on a page. It has high benefit and low cost.

Cost

There is a low cost to implementation. It's very easy to plug in to a new or existing Wagtail page model. It's decently documented, simple, lightweight, and seemingly low maintenance. We can add it to the Wagtail cookiecutter setup, though this might abstract some knowledge away from developers who won't initially have the background to debug any issues that might come up.
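To give a sense of how small the setup is, here is a settings sketch. The app and middleware names are as we recall them from the wagtail-cache docs and should be checked against the installed version; the cache backend, table name, and timeout are assumptions for illustration, not necessarily what the pilot project uses.

```python
# settings.py (fragment). Names follow the wagtail-cache docs as we recall
# them; verify against the version you install.

INSTALLED_APPS = [
    # ... existing Django and Wagtail apps ...
    "wagtailcache",
]

MIDDLEWARE = [
    "wagtailcache.cache.UpdateCacheMiddleware",  # should be first in the list
    # ... existing middleware ...
    "wagtailcache.cache.FetchFromCacheMiddleware",  # should be last in the list
]

CACHES = {
    "default": {
        # Assumption: Django's database cache backend. It needs
        # `python manage.py createcachetable` to be run once.
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "wagtail_cache",  # hypothetical table name
        "TIMEOUT": 60 * 60,  # seconds; an arbitrary example value
    }
}
```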

The one thing to note: this tool should be used in tandem with Django caching. This can get confusing if you're unfamiliar with either tool or the code base. In our pilot project, we had a Django model that users manage in the Wagtail CMS. The wagtail-cache library does not work for Django models. This was unclear at first during code review and testing.
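One possible way to handle a plain Django model, sketched here and not necessarily what the pilot project did, is to clear the page cache from a signal receiver whenever the model changes. The factory function and the `Facility` model are hypothetical, and `wagtailcache.cache.clear_cache` is the function we believe the library provides for this; treat that as an assumption to verify.

```python
def make_cache_clearing_handler(clear_cache):
    """Build a signal receiver that clears the page cache when a model changes.

    `clear_cache` is any zero-argument callable. In a real project this would
    likely be `wagtailcache.cache.clear_cache` (an assumption to verify).
    """

    def handler(sender, **kwargs):
        # Django passes the model class as `sender` plus keyword arguments
        # such as `instance`; clearing the whole cache needs none of them.
        clear_cache()

    return handler


# In a Django project the handler would be connected roughly like this
# (`Facility` is a hypothetical non-Page model):
#
#   from django.db.models.signals import post_delete, post_save
#   from wagtailcache.cache import clear_cache
#
#   on_change = make_cache_clearing_handler(clear_cache)
#   post_save.connect(on_change, sender=Facility, weak=False)
#   post_delete.connect(on_change, sender=Facility, weak=False)
```

Taking the clearing function as an argument keeps the receiver easy to test without a cache backend. The trade-off is that it clears every cached page on each save, which is simple but coarse.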

Also re: testing, the cache is essentially turned off for logged-in users. I spun my wheels on this when I was testing it out, trying to see if it was working. I noted this in an earlier comment in this thread.

Regarding maintenance, this library would be very easy to remove from a project if we ever needed to.

Benefit

We've never had a way to cache Wagtail-generated content as far as I know. We can either have no caching for Wagtail pages (bad) or tell a user that the cache will expire after X minutes (also bad).

Alternatives

In the future, we might want to use a CDN for even better caching, but I think wagtail-cache plus the Django cache works well enough for now. This is something that could be reconsidered per project, such as a project that needs the best caching available (in which case we would consider a CDN).

TODO:

We need to produce an artifact for this. We either need to create some documentation about how to set this up, or bake it into the Wagtail cookiecutter. @derekeder which artifact option should we do?

derekeder commented 1 year ago

@smcalilly I think having documentation for this would be good to do first, and then follow up with the cookiecutter. There are some edge cases, like editing non-Page models, that we should write a bit about.