Maintenance mode - Githubissues

torfsen commented 8 years ago

As discussed on the ckan-dev mailing list, it would be nice to have a way to put the site into read-only maintenance mode in which the users can continue to use the site but cannot make any changes.

It seems that a good way for making the site read-only would be to block the writing API requests. However, the maintenance mode also needs to be communicated in the UI, for example using flash messages during login and by showing notes instead of the usual form elements on edit pages.

Obviously maintenance mode should be manageable via API calls (and probably also paster) to allow for automation.

davidread commented 8 years ago

You mentioned the purpose:

"This would allow me to safely upgrade CKAN in the background without risking content inconsistencies."

and I think it is worth being really clear about this.

General code and package upgrades to CKAN can be done in seconds and don't need the site taking down at all - just restart apache (or whatever) when you're done.

When you need to reindex SOLR, you can simply use -r to keep the existing records while the index is rebuilt. If you have a SOLR schema addition, you might need to complete the reindex before you switch on the new feature that displays the new field.

Database migrations can take time, and meanwhile SOLR can still run, display datasets and do searches, so you could imagine a 'maintenance mode' that does db migrations while CKAN still runs. But I imagine there are lots of database calls floating around that would need to be suppressed, e.g. the logic that checks if you are logged in, or the lookups for groups & orgs. It might be easier to have another instance of CKAN running for these situations, but of course that would need to be in 'maintenance mode' too, i.e. prevent writes.

It would be good to get your thinking on this.

torfsen commented 8 years ago

Thanks for the additional information, @davidread! I think my main concern is data inconsistencies when I take content snapshots for backup or for data migration between development, test and production servers. For example, I need to be able to create a consistent snapshot of all databases (CKAN, DataPusher, etc.) and the CKAN storage directory on the file system. For this I need to be sure that none of that data is modified while the snapshot is being exported or imported.

Of course I can simply stop the server or switch to a dummy maintenance site during the maintenance (that's what I'm doing now), but that means that the site is not available during that time (including the API, which now throws 404s instead of a more appropriate error). And while you are indeed correct that most of these operations are done quickly, there's always the chance that something goes wrong and one needs a bit more time.

A read-only maintenance mode would just be a clean solution to this (admittedly minor) problem.
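To illustrate what "a more appropriate error" than a 404 could look like, here is a minimal sketch of a WSGI wrapper that answers write requests with 503 and a Retry-After header while a maintenance flag is set. The class name, the is_read_only callable and the 600-second default are assumptions made for this example; none of them are part of CKAN. Note also that filtering by HTTP method is only an approximation, since CKAN's action API accepts POST for read actions too.

```python
import json

# HTTP methods treated as writes in this sketch (an approximation).
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


class ReadOnlyMiddleware:
    """Hypothetical WSGI wrapper; not part of CKAN."""

    def __init__(self, app, is_read_only, retry_after=600):
        self.app = app
        self.is_read_only = is_read_only  # callable, True during maintenance
        self.retry_after = retry_after    # seconds; arbitrary illustrative default

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if self.is_read_only() and method in WRITE_METHODS:
            body = json.dumps({
                "success": False,
                "error": {"message": "Site is in read-only maintenance mode."},
            }).encode("utf-8")
            start_response("503 Service Unavailable", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(self.retry_after)),
            ])
            return [body]
        # Reads, and everything outside maintenance, pass through unchanged.
        return self.app(environ, start_response)
```

The flag is passed in as a callable so that it could be backed by a file, a config value or a database row without changing the wrapper.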

Thanks for pointing out the potential problems with inconsistent database reads (login checks, etc.) during imports/migrations -- I hadn't thought of these yet.

I think we can distinguish two different issues that one has to handle during maintenance:

1. Ensure data consistency for exported data. This can be done by disabling all writes to databases and the filesystem (aside from non-persistent data like caches). The major challenge here seems to be a clean communication to the user so that they don't spend time editing a dataset that they then can't submit.
2. Ensure data consistency during imports/migrations. As you've suggested this can be solved by running a second, read-only CKAN instance on a copy of the data.

This would lead to the following workflow for work on a CKAN instance "A":

1. Put CKAN "A" into read-only maintenance mode.
2. Take a snapshot of the data of "A" (now guaranteed to be consistent).
3. If you want to make changes to the database of "A" then
   1. Set up a second CKAN instance ("B") using the data snapshot you just created and reroute the traffic of "A" to "B".
   2. Update/migrate the database of "A"
   3. Revert traffic back to "A" and disable "B"
4. Disable maintenance mode on "A"
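The workflow above can be sketched as a small orchestration function. This is only an outline under stated assumptions: the Site class, set_read_only and all callables passed in (take_snapshot, start_standby, stop_standby, route_to, migrate) are hypothetical placeholders for whatever tooling a deployment actually uses, and error handling is left out.

```python
class Site:
    """Stand-in for a CKAN instance; the class and its method are hypothetical."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.read_only = False

    def set_read_only(self, enabled):
        self.read_only = enabled
        self.log.append(f"{self.name}: read-only {'on' if enabled else 'off'}")


def run_maintenance(site_a, take_snapshot, start_standby, stop_standby,
                    route_to, migrate=None):
    """Run the maintenance steps on "A"; all callables are placeholders."""
    site_a.set_read_only(True)            # put "A" into read-only mode
    snapshot = take_snapshot(site_a)      # consistent, since writes are blocked
    if migrate is not None:               # only when the database of "A" changes
        site_b = start_standby(snapshot)  # "B" serves the snapshot read-only
        route_to(site_b)
        migrate(site_a)
        route_to(site_a)
        stop_standby(site_b)
    site_a.set_read_only(False)           # leave maintenance mode
    return snapshot
```

Passing migrate=None covers the backup-only case, where "B" is never needed.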

Conzar commented 7 years ago

I think maintenance mode is necessary when doing upgrades, for instance when moving the base operating system from Ubuntu 12.04 to 14.04, and eventually from 14.04 to 16.04 (when that's supported by CKAN). I think in many organizations, provisioning a new VM for software upgrades is standard if they use automation technologies like Puppet.

So the old production system would be put into read-only mode, a snapshot of the database and assets would be taken, a new VM would be provisioned and the database and assets restored on it. After this has been completed, the IP is changed to point to the new production system.

Aaron-M commented 7 years ago

+1 for maintenance mode as described by @torfsen

pwalsh commented 7 years ago

Shouldn't maintenance mode be done at a layer above the app itself?

Heroku maintenance mode works like this. We also did something similar for CKAN on the internal Open Knowledge cloud setup we have, by having a maintenance switch on our docker images, and using that to switch to maintenance mode.

The problem with maintenance mode in the app itself is, of course, that the app needs to serve the maintenance mode page, which is theoretically fine if read only is the only requirement, but some upgrades require actual downtime of the app itself.

davidread commented 7 years ago

@pwalsh I think you're talking about serving a static 'This site is undergoing maintenance' page. We're talking about a mode where CKAN is still running - normal to most users - except that write operations like edits to datasets are prevented. Or have I misunderstood you?

BTW I think @torfsen makes a good case for this, and I encourage him (or anyone else who wants it) to go ahead and add it.

For ourselves, I think we're happy for the occasional planned maintenance where we take the site down for an hour or two off-peak.

amercader commented 7 years ago

@TkTech implemented this in https://github.com/open-data/ckan/commit/1ae921c750820e8631718378b7434a2a81bdf4bf and hopefully it will get upstream soon!

TkTech commented 7 years ago

This is basically doing what @torfsen originally suggested, which is blocking the writing API calls (but as a whitelist instead). Actions that do not modify the database can be marked as read_safe. It works well with the existing UI since just about everything in the templates decides to display or not based on check_access.
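To make the whitelist idea concrete, here is a minimal, self-contained sketch. It is not the code from the linked commit and not CKAN's real API: only the names read_safe and check_access come from the comment above, while ActionRegistry, ReadOnlyError and the in-memory DATASETS dict are assumptions for illustration. In particular, this check_access simply returns a boolean that a template could use to decide whether to show an edit form.

```python
class ReadOnlyError(Exception):
    """Raised when a non-whitelisted action is called in read-only mode."""


def read_safe(func):
    """Mark an action as not modifying the database."""
    func.read_safe = True
    return func


class ActionRegistry:
    """Hypothetical action dispatcher with a read-only switch."""

    def __init__(self):
        self.read_only = False
        self._actions = {}

    def action(self, func):
        self._actions[func.__name__] = func
        return func

    def check_access(self, name):
        # Whitelist: in read-only mode only read_safe actions are allowed.
        func = self._actions[name]
        return not self.read_only or getattr(func, "read_safe", False)

    def call(self, name, data_dict):
        if not self.check_access(name):
            raise ReadOnlyError(f"{name} is blocked in read-only mode")
        return self._actions[name](data_dict)


registry = ActionRegistry()
DATASETS = {"demo": {"name": "demo", "title": "Demo"}}  # toy in-memory store


@registry.action
@read_safe
def package_show(data_dict):
    return DATASETS[data_dict["id"]]


@registry.action
def package_update(data_dict):
    DATASETS[data_dict["id"]]["title"] = data_dict["title"]
    return DATASETS[data_dict["id"]]
```

Because unmarked actions are blocked by default, a new write action added later is rejected in read-only mode without anyone having to remember to list it.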

That commit does not include a flash message because we handle that in our client-specific extension's template, but one would be very easy to add.

Reading the discussion thread here, I think people are trying to make this pony do every trick. There is a place for both read-only mode at the app layer and a simple maintenance page at the CDN/nginx/apache/whatever layer.

For the first case, imagine hosting a government open data portal. You want to upgrade CKAN versions, which requires a dataset schema migration. You do not want to stop serving data to the public for the hours this might take, but you also do not want new datasets to be added, since your old and new instances would get out of sync. This is where you would use the app-layer read-only mode.

For the second case, imagine doing new blocking-index builds, or running table migrations. For this, you would want to do it at the CDN/nginx/apache layer.

daucong commented 2 weeks ago

This question was asked in 2016. Has it been implemented yet?

ckan / ideas

Maintenance mode #171