cloudfoundry / bosh-cli

BOSH CLI v2+
Apache License 2.0

delete-env fails but says it succeeds #107

Closed shalako closed 6 years ago

shalako commented 7 years ago
$ gosh -e magellan delete-env ~/workspace/bosh-deployment/bosh.yml
Deployment manifest: '/Users/scoen/workspace/bosh-deployment/bosh.yml'
Deployment state: '/Users/scoen/workspace/bosh-deployment/bosh-state.json'
No deployment state file found.

Succeeded

Upon recreating the env, BOSH tries to migrate the disk and fails:

$ ./deploy.sh
Deployment manifest: '/Users/scoen/workspace/bosh-deployment/bosh.yml'
Deployment state: './state.json'

Started validating
  Downloading release 'bosh'... Finished (00:01:55)
  Validating release 'bosh'... Finished (00:00:00)
  Downloading release 'bosh-virtualbox-cpi'... Finished (00:00:51)
  Validating release 'bosh-virtualbox-cpi'... Finished (00:00:00)
  Downloading release 'bosh-warden-cpi'... Finished (00:01:23)
  Validating release 'bosh-warden-cpi'... Finished (00:00:00)
  Downloading release 'os-conf'... Skipped [Found in local cache] (00:00:00)
  Validating release 'os-conf'... Finished (00:00:00)
  Downloading release 'garden-runc'... Finished (00:01:24)
  Validating release 'garden-runc'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Finished (00:06:17)
  Validating stemcell... Finished (00:00:00)
Finished validating (00:11:55)

Started installing CPI
  Compiling package 'golang_1.7/21609f611781e8586e713cfd7ceb389cee429c5a'... Finished (00:00:00)
  Compiling package 'virtualbox_cpi/56bb5910d296d4eec615f9a6fc02ae84a8bbc933'... Finished (00:00:10)
  Installing packages... Finished (00:00:01)
  Rendering job templates... Finished (00:00:00)
  Installing job 'virtualbox_cpi'... Finished (00:00:00)
Finished installing CPI (00:00:13)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3312.15'... Finished (00:00:09)

Started deploying
  Waiting for the agent on VM 'vm-ef4e89f8-1562-4719-69e8-704e40fc7561'... Failed (00:00:09)
  Deleting VM 'vm-ef4e89f8-1562-4719-69e8-704e40fc7561'... Finished (00:00:00)
  Creating VM for instance 'bosh/0' from stemcell 'sc-cbfbdb11-6778-476d-4266-9e1e5a54ff1d'... Finished (00:00:00)
  Waiting for the agent on VM 'vm-25b78ae7-0e21-4467-4839-9966c3e2a4f0' to be ready... Finished (00:00:30)
  Attaching disk 'disk-9fc1ecc0-1da0-4765-4ee3-a97bb8eca2c5' to VM 'vm-25b78ae7-0e21-4467-4839-9966c3e2a4f0'... Finished (00:00:04)
  Creating disk... Finished (00:00:00)
  Attaching disk 'disk-cfe3a9da-1591-4162-7f73-f347fdef6751' to VM 'vm-25b78ae7-0e21-4467-4839-9966c3e2a4f0'... Finished (00:00:04)
  Migrating disk content from 'disk-9fc1ecc0-1da0-4765-4ee3-a97bb8eca2c5' to 'disk-cfe3a9da-1591-4162-7f73-f347fdef6751'... Failed (00:06:12)
Failed deploying (00:07:03)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
  Creating instance 'bosh/0':
    Updating instance disks:
      Updating disks:
        Deploying disk:
          Sending 'get_task' to the agent:
            Agent responded with error: Action Failed get_task: Task 9419d09a-9b1e-40fa-440a-06f8e3c28cf5 result: Migrating persistent disk: Copying files from old disk to new disk: Running command: 'sh -c (tar -C /var/vcap/store -cf - .) | (tar -C /var/vcap/store_migration_target -xpf -)', stdout: '', stderr: 'tar: ./warden_cpi/ephemeral_bind_mounts_dir/dd18cfbf-fc6c-4ff2-4c59-9d44f779735b/warden/warden.sock: socket ignored

deploy.sh

$ cat ~/workspace/deployments-boshlite/magellan-1/deploy.sh
#! /bin/bash

gosh create-env ~/workspace/bosh-deployment/bosh.yml \
  --state ./state.json \
  -o ~/workspace/bosh-deployment/virtualbox/cpi.yml \
  -o ~/workspace/bosh-deployment/virtualbox/outbound-network.yml \
  -o ~/workspace/bosh-deployment/bosh-lite.yml \
  -o ~/workspace/bosh-deployment/bosh-lite-runc.yml \
  -o ~/workspace/bosh-deployment/jumpbox-user.yml \
  --vars-store ./creds.yml \
  -v director_name="Bosh Lite Director" \
  -v internal_ip=192.168.50.6 \
  -v internal_gw=192.168.50.1 \
  -v internal_cidr=192.168.50.0/24 \
  -v network_name=vboxnet0 \
  -v outbound_network_name=NatNetwork

If we run a destroy script with all the same options, we get a more convincing result:

$ ./deploy.sh
Deployment manifest: '/Users/scoen/workspace/bosh-deployment/bosh.yml'
Deployment state: './state.json'

Started validating
  Downloading release 'bosh'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh'... Finished (00:00:00)
  Downloading release 'bosh-virtualbox-cpi'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh-virtualbox-cpi'... Finished (00:00:00)
  Downloading release 'bosh-warden-cpi'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh-warden-cpi'... Finished (00:00:00)
  Downloading release 'os-conf'... Skipped [Found in local cache] (00:00:00)
  Validating release 'os-conf'... Finished (00:00:00)
  Downloading release 'garden-runc'... Skipped [Found in local cache] (00:00:00)
  Validating release 'garden-runc'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Skipped [Found in local cache] (00:00:00)
  Validating stemcell... Finished (00:00:01)
Finished validating (00:00:03)

Started installing CPI
  Compiling package 'golang_1.7/21609f611781e8586e713cfd7ceb389cee429c5a'... Finished (00:00:12)
  Compiling package 'virtualbox_cpi/56bb5910d296d4eec615f9a6fc02ae84a8bbc933'... Finished (00:00:09)
  Installing packages... Finished (00:00:01)
  Rendering job templates... Finished (00:00:00)
  Installing job 'virtualbox_cpi'... Finished (00:00:00)
Finished installing CPI (00:00:24)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3312.15'... Finished (00:00:08)

Started deploying
  Creating VM for instance 'bosh/0' from stemcell 'sc-5c49122b-8c1e-4750-5917-1b0df2685e1f'... Finished (00:00:00)
  Waiting for the agent on VM 'vm-b9783c78-7fb7-41ce-68fb-7150e1444c9a' to be ready... Finished (00:00:31)
  Creating disk... Finished (00:00:00)
  Attaching disk 'disk-950ff613-4c4a-46de-461a-8e807ca2c0eb' to VM 'vm-b9783c78-7fb7-41ce-68fb-7150e1444c9a'... Finished (00:00:04)
  Rendering job templates... Finished (00:00:03)
  Compiling package 'libseccomp/7a54b27a61b42980935e863d7060dc5a076b44d0'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'golang_1.7.1/91909d54d203acc915a4392b52c37716e15b5aff'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'apparmor/c8e25d84146677878c699ddc5cdd893030acb26f'... Skipped [Package already compiled] (00:00:00)
  Compiling package 's3cli/8cbc6ee1b5acaac18c63fafc5989bd6911c9be83'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'ruby/589d4b05b422ac6c92ee7094fc2a402db1f2d731'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'mysql/b7e73acc0bfe05f1c6cbfd97bf92d39b0d3155d5'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'libpq/09c8f60b87c9bd41b37b0f62159c9d77163f52b8'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'davcli/5f08f8d5ab3addd0e11171f739f072b107b30b8c'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'golang_1.7/21609f611781e8586e713cfd7ceb389cee429c5a'... Finished (00:00:16)
  Compiling package 'runc/68f36fbe363fefa5ec8d44b48ee30a56ac6e1e0e'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'golang_1.7/c82ff355bb4bd412a4397dba778682293cd4f392'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'director/3451fde97191ac240d10ea180b659ed55ee0ccba'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'virtualbox_cpi/56bb5910d296d4eec615f9a6fc02ae84a8bbc933'... Finished (00:00:13)
  Compiling package 'health_monitor/884a822dc2547735ac42b889654ddf9f074bb7e7'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'guardian/c4acb6073abb4e17165253935c923dfbdfbfb188'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'nginx/21e909d27fa69b3b2be036cdf5b8b293c6800158'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'shadow/7a5e46357a33cafc8400a8e3e2e1f6d3a1159cb6'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'postgres-9.4/6c9e820cdfe15267c8f864f482c7fbed0943c6de'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'warden_cpi/29ac97b841a747dc238277ffc7d6bf59a278fa37'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'tar/f2ea61c537d8eb8cb2d691ce51e8516b28fa5bb7'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'postgres/4b9f6514001f7c3f7d4394920d6aced9435a3bbd'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'busybox/fc652425c32d0dad62f45bca18e1899671e2e570'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'iptables/24e83997945f8817627223c6cee78ca9064f42d5'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'nats/0155cf6be0305c9f98ba2e9e2503cd72da7c05c3'... Skipped [Package already compiled] (00:00:00)
  Updating instance 'bosh/0'... Finished (00:00:22)
  Waiting for instance 'bosh/0' to be running... Finished (00:00:10)
  Running the post-start scripts 'bosh/0'... Finished (00:00:00)
Finished deploying (00:01:45)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Succeeded

destroy.sh

$ cat ~/workspace/deployments-boshlite/magellan-1/destroy.sh
#! /bin/bash

gosh delete-env ~/workspace/bosh-deployment/bosh.yml \
  --state ./state.json \
  -o ~/workspace/bosh-deployment/virtualbox/cpi.yml \
  -o ~/workspace/bosh-deployment/virtualbox/outbound-network.yml \
  -o ~/workspace/bosh-deployment/bosh-lite.yml \
  -o ~/workspace/bosh-deployment/bosh-lite-runc.yml \
  -o ~/workspace/bosh-deployment/jumpbox-user.yml \
  --vars-store ./creds.yml \
  -v director_name="Bosh Lite Director" \
  -v internal_ip=192.168.50.6 \
  -v internal_gw=192.168.50.1 \
  -v internal_cidr=192.168.50.0/24 \
  -v network_name=vboxnet0 \
  -v outbound_network_name=NatNetwork

I would have liked to see delete-env fail more clearly when all the necessary options weren't provided.
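One way to get that clearer failure today is a defensive wrapper rather than a CLI change: check for the state file before invoking delete-env. This is only a sketch; the function name is illustrative and the path must match whatever was passed to create-env via --state.

```shell
#!/bin/bash
# Illustrative guard (not CLI behavior): refuse to run delete-env when the
# state file that create-env wrote is missing, instead of letting the CLI
# print "Succeeded" for a no-op deletion.
require_state_file() {
  if [ ! -f "$1" ]; then
    echo "error: state file '$1' not found; refusing to run delete-env" >&2
    return 1
  fi
}

# Intended use at the top of destroy.sh:
#   require_state_file ./state.json || exit 1
#   gosh delete-env ~/workspace/bosh-deployment/bosh.yml --state ./state.json ...
```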

tjvman commented 7 years ago

It seems like the issue here is that your state file was called state.json rather than bosh-state.json, which caused the first delete-env to do nothing (it was referring to a non-existent file) and the second to succeed. However, the fact that it failed on the disk migration is curious.

Now, given that you didn't provide the --state flag to the first delete-env, I'm wondering if the CLI defaults the state file name to bosh-state.json. If that's the case, it conflicts with the bosh-deployment documentation and should probably be changed.
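The first log supports this hypothesis: with no --state flag, the CLI reported a state path of '/Users/scoen/workspace/bosh-deployment/bosh-state.json', i.e. the manifest path with '.yml' replaced by '-state.json'. A minimal shell sketch of that presumed derivation (inferred from the log output, not from the CLI source):

```shell
# Presumed default (inferred from the log above): the state path is derived
# from the manifest path by replacing the '.yml' suffix with '-state.json'.
manifest="$HOME/workspace/bosh-deployment/bosh.yml"
default_state="${manifest%.yml}-state.json"
echo "$default_state"   # .../bosh-deployment/bosh-state.json, not ./state.json
```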

shalako commented 7 years ago

@tjvman Thank you for the replies.

Now, given that you didn't provide the --state flag to the first delete-env, I'm wondering if the CLI defaults the state file name to bosh-state.json. If that's the case, it conflicts with the bosh-deployment documentation and should probably be changed.

Will you use this story to represent this action item?

If the CLI didn't output Succeeded when the operation failed, I would not have tried to run additional commands and been confused when they failed. More evidence for #108.

cppforlife commented 7 years ago

To clarify: that command is a perfectly valid command, hence we cannot just make it fail. There is no consistent way for us to know, given that all our commands are idempotent. If you had any VMs associated with the first command (which uses bosh-state.json), then the CLI would have deleted them the first time, and on subsequent runs would have stated that there is nothing to do.

Looking again at the output, I do see that the CLI did indicate that it didn't have to do anything ("No deployment state file found.") to complete with the given combination of flags. That's where the problem lies: there is no easy way for us to know with what combination of flags you've created the env, hence we can't know the correct combination of flags needed to successfully delete it.

Do you think the "no deployment state file" info line should be reworded to indicate the possibility that the provided set of flags was wrong?

shalako commented 7 years ago

'No deployment state file found.' is a meaningful error. The problem is that the last line of the output is 'Succeeded', so I will ignore anything above it. Stop printing 'Succeeded' and the problem would be solved, because the useful message would be noticed.

As long as the CLI outputs 'Succeeded' as its last line, any failure in the requested operation will go unnoticed.
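The scripting consequence of this can be sketched with a stub standing in for the CLI (a hypothetical stand-in, not the real gosh binary): because the command exits 0 and ends with 'Succeeded' even when nothing was deleted, the normal shell idiom of trusting the exit status draws the wrong conclusion.

```shell
# Stub reproducing the behavior shown in the log: an informational warning,
# then "Succeeded", with exit status 0, even though nothing was deleted.
fake_delete_env() {
  echo "No deployment state file found."
  echo ""
  echo "Succeeded"
  return 0
}

# A caller that trusts the exit status (the usual shell idiom) concludes
# the environment was deleted:
if fake_delete_env >/dev/null; then
  echo "environment deleted"   # reached despite the no-op above
fi
```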

cppforlife commented 7 years ago

'No deployment state file found.' is a meaningful error

It's not an error, though, but rather an indication of perceived state. Maybe rephrasing it would convey that better: "Nothing to delete since deployment state file is not present."

shalako commented 7 years ago

For what it's worth, I prefer the current wording. But it doesn't matter: since 'Succeeded' is the last line of the output, everything before it will be ignored. I'm then left with incorrect assumptions until I discover later that the intended state change never happened, even though the CLI told me it did.

mfine30 commented 6 years ago

This seems like a great use case and area to improve the UX. In an effort to limit the number of open issues, I'm going to close this in favor of #365, since this CLI "Succeeded" UX all seems to fall under the same umbrella. If you think this should have more of its own conversation, feel free to reopen.