go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

[Feature & Proposal] Maintenance Mode #9618

Open bagasme opened 4 years ago

bagasme commented 4 years ago

We could schedule a maintenance window when we need to perform maintenance tasks that require taking the Gitea instance offline (such as upgrading Gitea, migrating data, scaling hardware resources, or troubleshooting system issues).

Feature:

lunny commented 4 years ago

will replace #9577

sapk commented 4 years ago

The sub-feature for the message/banner has already been discussed in #2345

bagasme commented 4 years ago

@sapk so is #2345 a dependency of this issue?

justusbunsi commented 3 years ago

Anyone working on this or #2345? I would like to learn Go a bit more and could try to implement this feature. It just might take a while. 😉

sapk commented 3 years ago

I mentioned issue #2345 to link them and, if possible, to take the broader use case of message broadcasting into account from the start, but that can be done in a second step if it cannot all be done at once.

@justusbunsi no one has opened a PR linked to those two issues, so don't hesitate to start. If you have questions, feel free to ask them on Discord. Since you will need to change things in both the frontend and the backend, this guide can be helpful to get started: https://docs.gitea.io/en-us/hacking-on-gitea/

justusbunsi commented 3 years ago

After some time diving into the code and inspecting the admin area, I would add a maintenance mode section inside the admin dashboard, below the actions section. Is this a good place for it?

Regardless of the configuration location: as proposed above, it would allow manually enabling/disabling maintenance immediately, defining a message that will be displayed at the top of each page, and specifying a datetime range to schedule the maintenance. Seconds will be trimmed.

What I am currently struggling with is where to store this data.

lunny commented 3 years ago

I think you could create a new table to store the message.
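For illustration, a minimal sketch of what such a table could look like as an xorm model (Gitea's ORM), with fields mirroring what was proposed above (enabled flag, banner message, scheduled window). The struct and field names, and the use of timeutil.TimeStamp, are assumptions for the sketch, not an agreed design:

```go
package models

import "code.gitea.io/gitea/modules/timeutil"

// MaintenanceSetting is a hypothetical single-row table holding the
// maintenance state discussed in this issue.
type MaintenanceSetting struct {
	ID             int64              `xorm:"pk autoincr"`
	Enabled        bool               `xorm:"NOT NULL DEFAULT false"` // maintenance currently active
	Message        string             `xorm:"TEXT"`                   // banner shown at the top of each page
	ScheduledStart timeutil.TimeStamp // start of the scheduled window (seconds trimmed)
	ScheduledEnd   timeutil.TimeStamp // end of the scheduled window
}
```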

wxiaoguang commented 2 years ago

I think it's not feasible at the moment. (Update: it could be done by using designed rules to block all write requests, leaving only a few requests open for admin login/management, but the effort required is not trivial, while the benefit is trivial.)

Gitea is a single-binary app with no cluster support at the moment. When you are upgrading Gitea, migrating data, scaling hardware resources, or troubleshooting system issues, Gitea itself is usually not running.

You could set up nginx (or some other proxy) to return a 503 response or show a banner.

lunny commented 2 years ago

> feasible

But you can run several instances and keep at least one running.

wxiaoguang commented 2 years ago

> feasible
>
> But you can run several instances and keep at least one running.

Then, what's the benefit? The only running instance would still have to stop all cron tasks, clear the queues, stop all sessions, and just return a 503 response to users? And if you are running several Gitea instances, isn't there an nginx-like proxy in front of them already? Then why not just let nginx return the 503...

But the effort paid for this seems non-trivial, while the benefit is trivial. And the situation might be worse for Windows SQLite users: as long as the database is in use, it cannot be backed up.

McNetic commented 2 years ago

I also think the feature proposal is sensible.

m-ueberall commented 2 years ago
> • Secondly, the maintenance mode helps in shutting down the service without interrupting running tasks and also provides a possibility to announce the downtime before it actually happens

To expand on the above: a maintenance mode does not (should not) necessarily require shutting down the service (read: terminating all processes, including the main one). And that's exactly what makes a huge difference w.r.t. monitoring/watchdog solutions, which are usually eager to restart one or more instances (in a cluster) if Gitea isn't up and running. If maintenance mode instead signals "I'm alive, the restriction of certain functionality is intentional", regardless of what is restricted, it does not require additional, solution-specific checks/rules on the outside.
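As a rough sketch of that signalling idea (not Gitea code; the wrapper, the inMaintenance check and the /api/healthz routing are assumptions for illustration), a middleware could keep the health endpoint answering normally while everything else gets a 503:

```go
package maintenance

import "net/http"

// Wrap returns a hypothetical middleware: health probes keep succeeding so
// external watchdogs see the instance as alive, while all other requests
// receive 503 together with the maintenance banner text.
func Wrap(next http.Handler, inMaintenance func() bool, message string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !inMaintenance() || r.URL.Path == "/api/healthz" {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Retry-After", "120") // hint for clients and monitors
		http.Error(w, message, http.StatusServiceUnavailable)
	})
}
```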

Also, with respect to the third item above (backup purposes): switching one or more (connected) Gitea instances into maintenance mode in a defined manner (e.g. by flushing certain caches and/or waiting for some operations to finish within a specified "grace period") allows external scripts/events to rely on a defined state for a series of actions that cannot be considered atomic together. One example would be a backup consisting of dumping the database and creating a snapshot of the file system using two different tools (say, zfs [snapshot …] followed by mariabackup [--backup …]) at basically any point in time after enabling and before disabling maintenance mode, usually for a mere couple of seconds, since "post-processing" said backup/copy does not require any restrictions anymore.

silentcodeg commented 2 years ago

Here are links to what prompted me to re-open this feature proposal today:

mutech commented 1 year ago

What does Gitea do when it is shut down while a big commit or some other long-running action operating on either the DB or the git repository is ongoing?

I would assume that Gitea itself just shuts down and the components (git library, Postgres et al.) take care of the details.

That's what I do in my homelab: I just take disk backups while the system is running. While this is a bit careless and harsh, it works surprisingly well. Over the years I only once had an inconsistent backup, and I could live with losing some wiki edits or issue updates.

However, in an environment where a lot of people work concurrently, I don't do that, not wanting to look stupid in front of everybody. But if Gitea does not gracefully stop ongoing processes or wait for their completion, being careful doesn't buy me much. If, on the other hand, Gitea handles this properly, all that would need to be done is to stop accepting new connections or API calls with side effects. The hard part seems to be the handling of ongoing operations, and that in turn does not feel like an optional feature.

Extending proper handling to a maintenance mode in which interactive users get some nice piece of UI does not seem especially hard compared to what's needed to defend against data corruption or inconsistent states between the DB and the repo.

Why is this feature so difficult to provide?
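For reference, the "stop accepting new connections, wait for ongoing requests" part of the above is something Go's standard library already offers for plain HTTP traffic; a minimal, generic sketch (not Gitea's actual shutdown code, and it says nothing about long-running git or DB operations) would be:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	srv := &http.Server{Addr: ":3000"}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for an interrupt, then drain: Shutdown stops accepting new
	// connections and waits for in-flight requests to finish, here up to 30s.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt)
	<-stop

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}
```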

garymoon commented 1 year ago

Not speaking on behalf of the maintainers of course, just as an infrastructure engineer.

If you want your application to stay responsive during maintenance, "connection draining" (or similar) is a feature you should reasonably expect of your load balancer, not your application. If high availability is necessary, you'll have multiple frontends running, and at restart/replacement time the load balancer will stop sending new requests to the frontend pending restart/replacement, but maintain existing connections until they close or a hard timeout is reached.

For example: Swarm will do this via deploy.update_config, and AWS will do it via ELB connection draining. Unfortunately in Nginx this functionality is gated behind the very expensive Nginx Plus subscription.

If you want a specific error page for maintenance-page-style functionality, you'd simply set a custom static page to serve for a 503 error. For example, Nginx will let you set an error page via error_page. The page could be changed for scheduled maintenance vs. an unexpected outage by changing the location of the page and reloading Nginx (systemctl reload nginx or nginx -s reload), which is a downtime-free operation.
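A minimal sketch of that Nginx setup (server name, port and paths are placeholders; proxy_intercept_errors is only needed if the 503 comes from the backend rather than from Nginx itself):

```nginx
server {
    listen 80;
    server_name git.example.com;

    # Serve a static page whenever a 503 is produced.
    error_page 503 /maintenance.html;
    location = /maintenance.html {
        root /var/www/maintenance;  # swap this file and reload Nginx to change the message
        internal;
    }

    location / {
        proxy_pass http://127.0.0.1:3000;  # Gitea backend
        proxy_intercept_errors on;
        # For planned maintenance, temporarily replace the two lines above
        # with "return 503;" and reload Nginx.
    }
}
```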

lunny commented 1 year ago

Before backing up a Gitea instance, the instance should enter a maintenance mode that refuses data changes, to reduce the risk of data loss.

To implement maintenance mode, a quick method is to control two layers. One is database writing: we can wrap the database engine with a maintenance implementation that returns an error for all write operations. The other is git commands: we can detect all possible write git operations and return an error message saying that the instance is under maintenance.
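A very rough sketch of the shape those two guards could take; none of these functions or variables exist in Gitea today, and the git detection shown is deliberately simplified (git-receive-pack is the server side of `git push`):

```go
package maintenance

import (
	"errors"
	"strings"
)

// ErrMaintenance is the error a caller would surface to the user.
var ErrMaintenance = errors.New("the instance is under maintenance; write operations are rejected")

// enabled would be read from the settings table / admin toggle; assumed here.
var enabled = func() bool { return false }

// BeforeDBWrite would be called by a wrapper around the database engine
// before every INSERT/UPDATE/DELETE.
func BeforeDBWrite() error {
	if enabled() {
		return ErrMaintenance
	}
	return nil
}

// BeforeGitCommand would be called before running a git command on behalf of
// a user, rejecting the ones that write to the repository.
func BeforeGitCommand(args ...string) error {
	if !enabled() {
		return nil
	}
	for _, a := range args {
		if strings.Contains(a, "receive-pack") { // push over HTTP/SSH
			return ErrMaintenance
		}
	}
	return nil
}
```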