GoogleCloudPlatform / pubsec-declarative-toolkit

The GCP PubSec Declarative Toolkit is a collection of declarative solutions to help you on your Journey to Google Cloud. Solutions are designed using Config Connector and deployed using Config Controller.
Apache License 2.0

#739 - Fixed commands used landingzonev2 readme #740

Closed jacyang2010 closed 5 months ago

jacyang2010 commented 7 months ago

This PR fixes the two kubectl delete commands in the clean-up section of the README linked below. https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/docs/landing-zone-v2/README.md#clean-up

See #739

obriensystems commented 7 months ago

FYI, previous PR context where the delete work was spawned out of https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/pull/722

Alain, good point on the order of deletion. I have rarely had a chance to delete the full LZ lately, since that means sacrificing the cluster. Normally I use kpt to recycle, but in my last corruption of the LZ, running hub-env on top of the clz package had issues with remaining services, and using kubectl describe was too late. I don't think I started with the config-controller ns.

But I will retest on a clean org and update the docs and script with config-control last. Good point.

https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/wiki/DevOps#delete-the-landing-zone

https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/593

jacyang2010 commented 7 months ago

Below are the devtest results before and after the fix.

Before Fixing

In conclusion, based on the facts above, the given command does not remove all GCP resources.

After Fixing

Run the kpt command below to destroy all deployed GCP resources of a given solution. (If you have just deployed the solutions, you should still have the local bootstrapping folder for each of them. Otherwise, check those folders out from your git repository, if you pushed the bootstrapped solutions to one.)

# kpt live destroy <solution_folder_path>
kpt live destroy core-landing-zone

The above command deletes a large number of GCP resources, as shown below.

image

Once it completes, run the command below to list all GCP resources. No GCP resources should be found, as shown below.

kubectl get gcp -A
image

In conclusion, based on the devtest results above, the kpt approach works as expected for a full cleanup.
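
The verification step above could be scripted roughly like this. It is only a sketch: `remaining_gcp_resources` and `verify_cleanup` are hypothetical helper names, and it assumes that `kubectl get gcp -A -o name` prints one resource per line on stdout and nothing on stdout when no resources are found.

```shell
# Count the Config Connector (gcp) resources still present in the cluster.
# Assumption: one resource per line on stdout, nothing on stdout when empty.
remaining_gcp_resources() {
  # grep -c prints 0 but exits non-zero on no match, hence "|| true".
  kubectl get gcp -A -o name 2>/dev/null | grep -c . || true
}

# Report whether the cleanup is complete; return non-zero if anything remains.
verify_cleanup() {
  count=$(remaining_gcp_resources)
  if [ "$count" -eq 0 ]; then
    echo "cleanup complete: no gcp resources found"
  else
    echo "cleanup incomplete: $count gcp resource(s) remain"
    return 1
  fi
}
```

Calling `verify_cleanup` after `kpt live destroy core-landing-zone` returns non-zero while resources remain, so it can gate the next cleanup step.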

fmichaelobrien commented 7 months ago

Good point, I would stick to kpt. The readme just happens to have the lower-level kubectl delete, which is why I raised #593 on Oct 22nd.

kpt live destroy is recommended over going lower in the stack with kubectl. The issue with kubectl is that services are removed at the namespace level across packages. If you have core-landing-zone and hub-env deployed, for example, it would be better to go higher in the stack and kpt live destroy each in reverse sequence.

This is what my issue on deletion, #593, says to do: don't use kubectl delete. "Automation: deletion of the landing zone should include the 5 ns - policies, logging, networking, projects, hierarchy - or let the config controller handle deletion via kpt live destroy" https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/593
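
A rough sketch of the reverse-sequence idea, for illustration only. `destroy_in_reverse` and the `DRY_RUN` variable are hypothetical names, not part of kpt or of the repo scripts, and the sketch assumes the package folders are passed in the order they were deployed (core-landing-zone first, hub-env on top of it).

```shell
# Destroy kpt packages in the reverse of the order they were deployed.
# Arguments: package folders in deployment order (names must not contain spaces).
# Set DRY_RUN=1 to print the commands instead of running them.
destroy_in_reverse() (
  reversed=""
  for pkg in "$@"; do
    reversed="$pkg $reversed"   # prepend, so the last deployed comes first
  done
  for pkg in $reversed; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kpt live destroy $pkg"
    else
      kpt live destroy "$pkg" || exit 1   # stop at the first failure
    fi
  done
)
```

With `DRY_RUN=1`, `destroy_in_reverse core-landing-zone hub-env` only prints the commands, hub-env first and core-landing-zone second, which makes it easy to review the sequence before running it for real.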

kpt live destroy $REL_SUB_PACKAGE

Check the in-progress LZ automation script (I didn't have a problem with liens last time) https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/gh446-hub/solutions/setup.sh#L488

if [[ "$REMOVE_LZ" != false ]]; then
  echo "deleting lz on ${CLUSTER} in region ${REGION}"
  #kubectl get gcp
  # stay in current dir
  # will take up to 15-45 min and may hang unless liens are removed
  # 3 problematic projects
  #gcloud config set project audit-prj-id-oldv1
  #AUDIT_LIEN=$(gcloud alpha resource-manager liens list)
  #gcloud alpha resource-manager liens delete $AUDIT_LIEN

  #gcloud config set project net-host-prj-prod-oldv1
  #PROD_LIEN=$(gcloud alpha resource-manager liens list)
  #gcloud alpha resource-manager liens delete $PROD_LIEN

  #gcloud config set project net-host-prj-nonprod-oldv1
  #NONPROD_LIEN=$(gcloud alpha resource-manager liens list)
  #gcloud alpha resource-manager liens delete $NONPROD_LIEN

  echo "moving to folder ../../../$KPT_FOLDER_NAME"
  cd ../../../kpt
  #cd $KPT_FOLDER_NAME

  REL_SUB_PACKAGE="core-landing-zone"
  echo "deleting REL_SUB_PACKAGE: $REL_SUB_PACKAGE"
  kpt live destroy "$REL_SUB_PACKAGE"
  # all packages delete
  #kubectl delete gcp --all
fi
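
The commented-out lien handling in the excerpt above could be wrapped in a helper along these lines. This is a sketch under assumptions, not what setup.sh does: `remove_project_liens` is a hypothetical name, and it assumes that `--format="value(name)"` yields one lien name per line, possibly prefixed with `liens/`, which is stripped before deleting.

```shell
# Remove every lien on one project so that project deletion is not blocked.
# Argument: the project id.
# Assumption: the list call prints one lien name per line, optionally
# prefixed with "liens/".
remove_project_liens() {
  project_id="$1"
  gcloud alpha resource-manager liens list \
      --project "$project_id" --format="value(name)" |
  while IFS= read -r lien; do
    [ -n "$lien" ] || continue
    # </dev/null keeps the delete call from consuming the rest of the list.
    gcloud alpha resource-manager liens delete "${lien#liens/}" </dev/null
  done
}
```

It would be called once per problematic project, for example `remove_project_liens audit-prj-id-oldv1`, before running kpt live destroy.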

This is what the faq mentions https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/wiki/DevOps#scenarios

kpt live destroy core-landing-zone

I added 593 in a comment above 5 days ago https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/pull/740#issuecomment-1833666699

jacyang2010 commented 7 months ago

Jackson, very nice reproduction and options for adjustment around either recommended kpt live destroy - or reverse aligned lower level kubectl delete - in sequence.

When the patch is finalized I will approve your changes

Resolved by keeping only the kpt approach, as you suggested. Please continue to review it. @fmichaelobrien

jacyang2010 commented 7 months ago

All good - just the anthos keyword back and the pr is ready

Resolved.

jacyang2010 commented 7 months ago

@stanimprover would you mind reviewing and approving this PR?

jacyang2010 commented 7 months ago

Looks good. I thought we would leave the "kubectl delete gcp -A" in place in case any underlying resources need to be cleaned up. Overall it looks good, and I would add to the comment above the "gcloud anthos config controller delete $CLUSTER --location $REGION" command: CLUSTER NAME without the [krmapihost-] prefix. Thanks

Hey @stanimprover ,

As for the suggested kubectl-based low-level deletion approach, we reached a consensus not to provide it. Moreover, the command "kubectl delete gcp -A" is invalid, as shown below.

jackson_yang@cloudshell:~/workspace/company-pbmm-landingzone (single-kcc-yjs06)$ kubectl delete gcp -A
error: resource(s) were provided, but no name was specified

As for the note about the cluster name prefix, running the list command gives the result below.

jackson_yang@cloudshell:~/workspace/company-pbmm-landingzone (single-kcc-yjs06)$ gcloud anthos config controller list
NAME: single-kcc-clz-yjs06
LOCATION: northamerica-northeast1
STATE: RUNNING

As you can see, there is no ambiguity about the cluster name.
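
If a script ever receives the name with the prefix mentioned in the review comment, a small guard could normalize it before the delete call. This is a hypothetical helper, `config_controller_name`, and it assumes the only difference between the two forms is a leading `krmapihost-`.

```shell
# Strip a leading "krmapihost-" from a cluster name, if present, leaving
# the name in the form shown by "gcloud anthos config controller list".
config_controller_name() {
  printf '%s\n' "${1#krmapihost-}"
}
```

A name that is already unprefixed, such as single-kcc-clz-yjs06, passes through unchanged, so the helper is safe to apply in both cases.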

stanimprover commented 7 months ago

looks good.

obriensystems commented 7 months ago

Guys, let's check permissions tomorrow and verify that review +1s are available. So far only one +1 is in.

We should be able to fix this so you can review each other's PRs.

fmichaelobrien commented 5 months ago

Merge #738 or #739 in sequence - check merge conflict

davelanglois-ssc commented 5 months ago

lgtm....good job