[ENH] - Deleting a non-empty namespace should fail, unless a `force: true` flag is given

peytondmurray commented 4 months ago

Feature description

Currently, it's possible for an admin user to delete namespaces with any number of environments inside them. When that happens, the environments are all deleted as well. This "dangerous-by-default" behavior is probably not what we want. I propose rejecting namespace deletion requests by default, unless

The namespace to be deleted is empty, or
A force: true flag is passed in the request to the conda-store server

Value and/or benefit

This should make it harder for admin users to inadvertently delete an entire namespace's worth of environments in one simple request.

Anything else?

No response

nkaretnikov commented 4 months ago

I don't like the force: true idea. Because it adds nothing compared to a scenario where you send a DELETE request by mistake. I'm talking about the API calls, not the UI.

I think something like this would be better:

client -> server: request to mark to delete
server -> client: confirmation reply to ack the request (maybe with some random data that you need to pass on the second call)
client -> server: actual delete request.

UPD: Because this is now stateful, there are some things to consider:

What should happen if the first client call arrives but the second not? I think the server should drop this after, say, 5 minutes. Otherwise, you can delete by mistake later without confirmation.

nkaretnikov commented 4 months ago

Also, if we do this, I think even an empty namespace must require force because it might have important user settings saved as part of it (e.g., which artifacts are being built). You don't want to erase those by mistake.

peytondmurray commented 4 months ago

I don't like the force: true idea. Because it adds nothing compared to a scenario where you send a DELETE request by mistake.

I don't mind your suggested alternative, but for my own understanding how does requiring a force: true if the namespace isn't empty not help? At the very least it would require admins to consult the API docs to understand the implications of what they're asking. It would also prevent a typo (e.g. DELETE instead of GET) from destroying an entire namespace worth of envs.

nkaretnikov commented 4 months ago

Yeah, I suggest we wait and discuss this during our team meeting next week. It really depends on what we're trying to protect against. E.g., you can also easily mistype with force present (because you have it saved after executing some delete command that you actually wanted to run), but instead of mistyping GET -> DELETE, you mistype the namespace name.

nkaretnikov commented 3 months ago

We've discussed this today during the meeting.

The general plan that everyone seems to be happy with:

have a property on an env/namespace, like "protected"
DELETE fails with an HTTP status code if protected=True
have a PUT endpoint that can change protected, then DELETE will succeed
have force on DELETE to bypass "protected"
UI should just do PUT, then DELETE (and have a confirmation dialog).

gabalafou commented 3 months ago

Is there some lightweight action the server could take on deletion to help the user recover, if the user changes their mind afterwards?

nkaretnikov commented 3 months ago

Is there some lightweight action the server could take on deletion to help the user recover, if the user changes their mind afterwards?

Nope, we need to scope this separately. A proper way to backup/restore was briefly discussed during one of the previous meetings. I agree this would be a desirable feature to have.

peytondmurray commented 3 months ago

The general plan that everyone seems to be happy with:

I'm not the biggest fan of carrying around an extra property on the namespace. I can see that it makes it harder to delete non-empty namespaces, but there are also side effects of introducing a new property - for example, we might need to do additional work on the UI side of things to identify namespaces as protected. "Protected" namespaces are another concept for the users to need to understand and for us to document as well. If the motivation for a call-and-response pattern is safety, then is this actually safer than just requiring namespaces to be empty before they can be deleted? This was another alternative suggested by @dcmcand.

If the group consensus is that this is the best pattern though, I'm on board.

gabalafou commented 3 months ago

I agree with Peyton that it's best to keep our models as minimal as humanly possible.

There are two time frames we can think about to help us move forward.

Short term

We want to prevent a destructive action without guardrails. It seems to me that the path of least resistance here is to pattern namespace deletion on the rmdir command, preventing deletion of a namespace if it has any environments inside of it, as Peyton and Chuck have mentioned. This would require the user/client to delete each environment before deleting the namespace.

Long term

If we want to support deleting a namespace and all of its environments in one go, then I think we should have some kind of recovery mechanism in place. It doesn't need to be great or super user-friendly, but recovery should be possible and not too painful. Once we have some form of recovery in place, then we can scope how we want the API to support a user that wants to delete a namespace with environments.

peytondmurray commented 3 months ago

TLDR there are two proposals to fix this:

protected attribute + call-and-response pattern: a. Add a protected attribute to namespaces, set True by default. b. If protected == True, a DELETE request fails. c. If protected == False, a DELETE request will initiate a confirmation call and response pattern - "are you sure?". If a subsequent "Yes" response is sent to the user, the namespace and all environments inside are deleted. d. The protected attribute can be set with a PUT request.
Require namespaces to be empty before they can be deleted.

conda-incubator / conda-store