Optum / dce

Disposable Cloud Environment
Apache License 2.0

Uninstall DCE #238

Status: open. Opened by michaelpetersubc 4 years ago.

michaelpetersubc commented 4 years ago

Is your feature request related to a problem? Please describe.

To clean up after a bug, I need to recompile the DCE software. It appears that to get things right I would have to redeploy (to change the scripts running on AWS).

As far as I can see, deploying again recreates everything. For example, the API Gateway would change.

While I am testing, I would just like to remove everything that was installed by the first deploy, then deploy again with the new software. Even a list in the docs describing what to remove manually would help. More generally, a suggestion on how to go about fixing bugs or upgrading the software would be nice.

However, in production, all users of the service would apparently have to adjust their settings (the gateway URL, for example) to cope with an upgrade. So a feature that updates the software without changing basic settings or duplicating scripts on AWS would help.

nathanagood commented 4 years ago

Hi, @michaelpetersubc!

> As far as I can see, deploying again recreates everything. For example, the API Gateway would change.

Could you please provide the steps you used to deploy DCE? There are ways to make updates in place and even tear down (destroy) what you created, but the exact steps depend on how you deployed DCE.

Best regards, @nathanagood

michaelpetersubc commented 4 years ago

Thanks, @nathanagood. If I understand the question correctly, I followed the quickstart: dce init followed by dce deploy. After that I copied the gateway URL to .dce.yaml, set up the child account, and created a DCE account using the child account.

My current plan is to try to manually remove everything on AWS (except the child account), recompile dce, then start again. If there is a better way to set it up so that I can make updates in place, I'll do that.

nathanagood commented 4 years ago

Hi, @michaelpetersubc. Un-deployment is a limitation of dce CLI versions v0.3.1 and below. A solution for destroying resources deployed by newer versions of dce is in the process of being released.

In the meantime, we are putting a solution together to un-deploy the DCE resources created by these earlier versions. We will share that with you as soon as we have it.

michaelpetersubc commented 4 years ago

Thanks, @nathanagood, that answers my question. I'll watch for the un-deploy feature; in the meantime, I'll try to resolve it manually. I might learn something.

joshmarsh commented 4 years ago

Hi @michaelpetersubc, we’ve put together an example script for deleting resources from older versions of DCE. It will delete any resources tagged AppName=DCE or with identifiers containing the unique namespace that DCE attaches to everything it deploys. Here are the steps for using it:

Running bulk delete operations against an AWS account is risky, particularly when you have other things in the account that you don’t want deleted. We recommend reading through this script or using it as a guide if you have concerns about accidentally deleting other resources in your account.
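In that spirit, the selection rule described above (tagged AppName=DCE, or an identifier containing the DCE namespace) can be previewed without deleting anything. The following is a minimal, read-only Python sketch, not the script shared in this thread. It assumes boto3 is installed and credentials point at the account DCE was deployed to. It relies on the Resource Groups Tagging API, which only returns resources that carry (or once carried) tags in the given region, so untagged resources whose names contain the namespace will not appear and still need a per-service check.

```python
"""Read-only preview of resources matching DCE's cleanup criteria.

Assumptions: boto3 is installed and credentials point at the account DCE
was deployed to. Nothing here deletes anything.
"""
from typing import Dict, Iterable, List


def is_dce_resource(arn: str, tags: Iterable[Dict[str, str]], namespace: str) -> bool:
    """True if the resource is tagged AppName=DCE or its ARN contains the namespace."""
    tagged = any(t.get("Key") == "AppName" and t.get("Value") == "DCE" for t in tags)
    # Guard against an empty namespace, which would match every ARN.
    return tagged or (bool(namespace) and namespace in arn)


def list_dce_candidates(namespace: str, region: str) -> List[str]:
    """Return ARNs of tagged resources in `region` that match the criteria."""
    import boto3  # imported here so is_dce_resource works without boto3

    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    matches = []
    for page in client.get_paginator("get_resources").paginate():
        for mapping in page["ResourceTagMappingList"]:
            arn = mapping["ResourceARN"]
            if is_dce_resource(arn, mapping.get("Tags", []), namespace):
                matches.append(arn)
    return matches
```

Calling list_dce_candidates("<namespace>", "us-east-1") once per region you deployed to gives a list to review by hand before any delete step.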

As @nathanagood mentioned, dce-cli v0.4.0 supports deleting DCE using the locally cached Terraform state file, binary, and backend configuration. Here's how it's done:

  # change into the directory containing the terraform binary dce-cli used for deployment
  cd ~/.dce/.cache/terraform/0.12.18
  # initialize terraform using the cached main.tf
  ./terraform init ~/.dce/.cache/module
  # run terraform destroy using the cached main.tf
  ./terraform destroy ~/.dce/.cache/module

We recommend deleting the ~/.dce configuration directory and starting over from dce init if you would like to redeploy dce after destroying it via this method.
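For repeated testing cycles, the teardown above can be scripted. This is a minimal Python sketch, assuming the default cache layout and Terraform 0.12.18 exactly as in the commands above. It runs the same two commands from the same working directory the manual steps cd into, and with the default dry_run=True it only prints them.

```python
"""Sketch of the dce-cli v0.4.0 teardown using the locally cached Terraform.

Assumption: the cache layout and Terraform version match the commands in
this thread (~/.dce/.cache/terraform/0.12.18 and ~/.dce/.cache/module).
"""
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple


def teardown_plan(dce_home: Path, tf_version: str = "0.12.18") -> Tuple[Path, List[List[str]]]:
    """Return the Terraform working directory and the commands to run, in order."""
    tf_dir = dce_home / ".cache" / "terraform" / tf_version
    module_dir = str(dce_home / ".cache" / "module")
    terraform = str(tf_dir / "terraform")
    return tf_dir, [
        [terraform, "init", module_dir],     # initialize using the cached main.tf
        [terraform, "destroy", module_dir],  # destroy using the cached main.tf
    ]


def teardown(dce_home: Optional[Path] = None, dry_run: bool = True) -> None:
    """Print the commands (dry_run=True) or run them, then remove ~/.dce."""
    dce_home = dce_home or Path.home() / ".dce"
    tf_dir, commands = teardown_plan(dce_home)
    for cmd in commands:
        if dry_run:
            print(f"(cd {tf_dir} && {' '.join(cmd)})")
        else:
            # Run from the directory the manual steps cd into;
            # check=True aborts before ~/.dce is removed if a step fails.
            subprocess.run(cmd, cwd=tf_dir, check=True)
    if not dry_run:
        # Recommended before redeploying: start over from `dce init`.
        shutil.rmtree(dce_home)
```

Calling teardown() first shows what would run; teardown(dry_run=False) executes it, and terraform destroy still asks for interactive confirmation.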

We haven’t created a CLI command to make this convenient yet. Our goal is to provide convenient mechanisms for deploying, upgrading, and deleting DCE. This is an iteration towards that goal, and feedback such as yours helps tremendously in guiding our design. Please let us know if you need any more help.

michaelpetersubc commented 4 years ago

@joshmarsh, @nathanagood Thanks! The script for the older versions works well; it seems to have cleaned up everything, including a botched installation that I had partially cleaned up. Version 0.4.0 works as advertised; now to try it for real. The application is for managing grad students' computations. Your documentation is now slightly inconsistent with the new version: for example, the .dce.yaml file is no longer used, and the deploy script sets the API Gateway URL automatically.

joshmarsh commented 4 years ago

Thanks for the feedback @michaelpetersubc. Looks like we missed a few places when we updated the docs last. We'll get on that soon.

michaelpetersubc commented 4 years ago

@joshmarsh, @nathanagood I have to take a step back: it appears the default setting for aws-nuke is dry run, so when a lease ends the account is not cleared:

  020/02/12 17:23:46 INFO: Nuke is set in Dry Run mode and will not remove any resources and cannot set back the state of the DCE child account Please set 'RESET_NUKE_DRY_RUN' to not 'true' to exit Dry Run mode.

This isn't as advertised (the docs say you can set it back to dry run using Terraform). The error message suggests a sensible fix, but honestly I can't figure out how to apply it. I deployed with dce, not Terraform. Is there a way to turn off dry run manually without redeploying?

eschwartz commented 4 years ago

Hi @michaelpetersubc -- just want to let you know that I'm looking into this. I should have some useful info for you later today.

eschwartz commented 4 years ago

Ok @michaelpetersubc, I think I can help you manually reconfigure your DCE deployment to enable aws-nuke to run in --no-dry-run mode.

  1. Login to your AWS web console
  2. Navigate to Services > CodeBuild > Build projects
  3. Select the account-reset-<namespace> project, where <namespace> is the namespace you used to deploy DCE (or a random ID)
  4. Select the Edit dropdown, then select Environment
  5. Expand the Additional Configuration section, and scroll down to the Environment Variables subsection
  6. You should see an environment variable RESET_NUKE_TOGGLE set to false. Change its value to true.
  7. Click Update environment

Subsequent account reset jobs should run aws-nuke in --no-dry-run mode.
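The console steps above change a single environment variable on the CodeBuild project. For anyone who prefers scripting it, here is a hedged boto3 sketch of the same change. It assumes boto3, credentials for the account DCE is deployed in, and the project name account-reset-<namespace> and variable RESET_NUKE_TOGGLE as described in the steps; it reads the project's existing environment and writes it back with only that value changed.

```python
"""Turn off aws-nuke dry run for DCE's account-reset CodeBuild project.

Assumptions: boto3 is installed, credentials point at the account DCE is
deployed in, and the project/variable names match the steps above.
"""
from typing import Dict, List


def with_env_var(env_vars: List[Dict[str, str]], name: str, value: str) -> List[Dict[str, str]]:
    """Return a copy of env_vars with `name` set to `value` (appended if absent)."""
    updated = [dict(var) for var in env_vars]
    for var in updated:
        if var["name"] == name:
            var["value"] = value
            return updated
    updated.append({"name": name, "value": value, "type": "PLAINTEXT"})
    return updated


def enable_nuke(namespace: str, region: str) -> None:
    """Set RESET_NUKE_TOGGLE=true on account-reset-<namespace>."""
    import boto3  # imported here so with_env_var works without boto3

    client = boto3.client("codebuild", region_name=region)
    name = f"account-reset-{namespace}"
    projects = client.batch_get_projects(names=[name])["projects"]
    if not projects:
        raise ValueError(f"CodeBuild project {name} not found in {region}")
    env = projects[0]["environment"]
    env["environmentVariables"] = with_env_var(
        env.get("environmentVariables", []), "RESET_NUKE_TOGGLE", "true"
    )
    # update_project expects a complete environment (type, image, compute
    # type), so the existing block is sent back with one value changed.
    client.update_project(name=name, environment=env)
```

Keeping the variable edit in a pure helper makes it easy to check the new list before anything is written back to AWS.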

Please let me know if you run into any problems with this, or if it doesn't work as expected.


For added context, DCE v0.24.0 introduced a change to enable aws-nuke to run in --no-dry-run mode by default. The latest version of dce-cli is still tied to DCE v0.23.0, so it does not include this change.

We are working on a new release of dce-cli, to upgrade to the latest version of DCE. We also have plans to support additional deployment options, to make it easier to configure these types of parameters.

michaelpetersubc commented 4 years ago

@eschwartz Thanks for the very clear instructions; that worked, and I would never have guessed how to do that. The reset now works and removes everything created while the lease is in use.

However, the reset also removes the admin permission from the trusted role, as it did before, so the child account is never returned to the ready state. This is the same problem that started all this. Maybe that problem wasn't fixed initially, or at least I haven't managed to install the right version of the updated software. I believe the problem is in dce, not in dce-cli, which is what I updated.

eschwartz commented 4 years ago

Hey @michaelpetersubc -- I apologize, I didn't realize this was still related to #231, but I see that now.

It looks like we got a fix out for #231 (PR #232), but my guess would be that dce-cli is still using a version of DCE that's older than that release.

Give me a minute to look into exactly what's happening, and I'll get back to you.

eschwartz commented 4 years ago

Seems like the most straightforward solution here is to get a new release of dce-cli out the door. We'll be working on that this week, and we'll keep you updated.

eschwartz commented 4 years ago

Hey @michaelpetersubc I'm actively working on this dce-cli upgrade. I'm also looking into making the dce version used by dce system deploy configurable, so this type of "patching" is easier in the future.

Just so we're not a blocker for you, I want to point out that we do have an alternate path for deploying DCE without using the dce cli: https://dce.readthedocs.io/en/latest/terraform.html

eschwartz commented 4 years ago

Hey @michaelpetersubc wondering if you're still waiting on this. I apologize that we've been spread a little thin lately, and haven't been as responsive as I'd like to be.

I am actually still working on getting the CLI upgrade out. We've had a couple of major blockers since we last talked that held this up.

But I might actually steer you in another direction: take a look at deploying DCE with Terraform directly. It will give you more control over your environment, which you'll likely need eventually anyway:

https://dce.readthedocs.io/en/latest/terraform.html

michaelpetersubc commented 4 years ago

@eschwartz Thanks. You are a health company, so I did guess you might have higher-priority things to do. I have figured out a partial substitute for dce using Terraform. I also realized that your suggestion, doing it with Terraform, is likely right. The Terraform documentation wasn't quite as straightforward as the CLI documentation, so I put it off (probably for the same kind of reasons you did) until I get more experience with Terraform.

You probably appreciate the feedback anyway, so I'll get back to you with whatever problems I find. Could you leave this thread open so I can reference it later when I need it?