StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0
6.07k stars 747 forks source link

API: Rework PR #5304 support for action file deletion. #5348

Closed nzlosh closed 3 years ago

nzlosh commented 3 years ago

SUMMARY

PR #5304 added the ability to delete file content from disk for the action API. By altering the underlying behaviour of the action endpoint, we are breaking the contract with existing St2 installations. People using this endpoint will be confused by the new behaviour and potentially loose unsaved work in worst case scenarios. Beyond API contracts, the code is currently broken in two ways which are detailed in Expected Results

The code does not handle file removal on all nodes in StackStorm's HA installations. This problem is related to the core code being unaware of nodes operating in the HA solution and as such should be considered out of scope for the resolution of this issue. Another issue addressing HA awareness should be opened for a future release.

STACKSTORM VERSION

st2 v3.6dev

OS, environment, install method

N/A

Steps to reproduce the problem

Expected Results

  1. Able to remove actions from database only. (the current StackStorm behaviour with action delete).
  2. Able to remove actions from database and disk. (the proposed change)
  3. Consistent behaviour under fault conditions. I.e. the delete operation must be atomic. If errors are encountered with the database/file, the action is left in a consistent state.

1. Impossible to remove just the database entry

The CLI only allows action delete with file removal.

st2 action delete test.default_values                                                                                                 
The resource files on disk will be deleted. Do you want to continue? (y/n): n                                                                                                
Action is not deleted.                                                            

The API does not offer the means to remove the action only from the database. The PR maintains the API signature of an action_ref and requester_user which makes deleting files on disk a hard requirement for API calls to delete an action. https://github.com/orchestral-st2/st2/blob/e76573c99d5a1d403a5818fa853f0d4a8999fad1/st2api/st2api/controllers/v1/actions.py#L209

Since this changes existing API behaviour to become more destructive, the new implementation should be an explicit opt-in by the user. As suggested in #5304 a global option in st2.conf could be a solution. An API argument to delete disk content allows from more granular control. Perhaps both should be implemented: st2.conf for global policy and an API parameter for use cases where it need to be overridden?

2. Inconsistent state when file access error encountered

As describe here https://github.com/StackStorm/st2/pull/5304#issuecomment-908439505 when a delete operation encounters an error on disk, an http 500 internal error is reported, the database entry is removed but the disk content is left. If errors are encountered during the deletion process, St2 must remain in a consistent state or there will be an administrative burden added to running StackStorm which will degrade user experience.

m4dcoder commented 3 years ago

@mahesh-orch Can you address the issues here? Please prioritize item 2 first and submit PR where there's error when deleting the files (due to permission error) and rollback the operation so the action isn't in an inconsistent state (database entry deleted but files are still on the filesystem). As for item 1, I think needs to finalize what the solution is and can address in a separate PR.

m4dcoder commented 3 years ago

@nzlosh Going thru your feedback from the original PR, I think changing -f/--force to --remove-files per your suggestion will address your concerns here. I don't think a global setting in st2.conf is required. We'll rollback the behavior to what it was but then passing --remove-files will remove the files in the pack.

arm4b commented 3 years ago

Thanks, @nzlosh for raising this and @m4dcoder for the reaction! :heart:

nzlosh commented 3 years ago

@m4dcoder I'd be OK with the --remove-files option to keep backward compatibilty. @cognifloyd and @amanda11 we're discussing the use of st2.conf so I'll let them step in here if they wish to.

amanda11 commented 3 years ago

So with --remove-files are we proposing:

1) st2 delete on cli will by default only delete in db - therefore same as previous st2 version?

2) st2 delete --remove-files will delete db and database? New option for 3.6 so no issue of backwards compatible.

3) new delete from UI will always delete from db and files? (As new to st2 3.5 no backwards compatiblity issues).

Would mean this would fail on UI on k8s - but i presume the create through the workflow composer also currently fails on k8s? And like not using pack install via UI for k8s, is discouraged on k8s deployment.

4) what would be the default through the rest api?

nzlosh commented 3 years ago

Points 1 & 2 are what I had in mind. st2 action delete <pack>.<action> - delete database entry only st2 action delete --remove-files <pack>.<action> - delete database entry and files.

Point 3: There will need to be a way for the user to communicate the expected behaviour when they click the delete button via the WebUI. Perhaps a delete confirmation form with a check box [] remove action file from disk (enabled by default?). I don't do much web UI/UX so this may be a clumsy way of interacting.

Point 4: If a remove-files argument is added to the action delete API which defaults to false, it keeps backwards compatibility with any code calling the API that doesn't specify the remove-files as true and allows the CLI and WebUI to pass through user preference.

As for the failures in an environment that had read only access to the packs, I think getting an appropriate "access denied" message via the API would be the correct response in the case file deletion was requested. If only database deletion was requested, it should respond OK if the database operation succeeded.

m4dcoder commented 3 years ago
  1. We will change UI to prompt user if they want to delete the files as well. If no, then only database entry will be removed.
  2. The additional argument (i.e. {"remove_files": true}) will be passed in the JSON body of the DELETE method.
  3. We can return 403 Forbidden if the files cannot be deleted.
m4dcoder commented 3 years ago

Reopen issue because it was closed automatically when we pushed the partial fix.

m4dcoder commented 3 years ago

The second part of the rework is located at https://github.com/StackStorm/st2/pull/5360.

amanda11 commented 3 years ago

@nzlosh Can this be closed now, after the second PR was merged?

nzlosh commented 3 years ago

Yes, all the issues have been addressed according to the points mentioned here.