cloudfoundry / bosh-google-cpi-release

BOSH Google CPI
Apache License 2.0

Error deleting vm during Creating missing vms step in errand #181

Closed mfine30 closed 6 years ago

mfine30 commented 7 years ago
$ bosh run-errand smoke-tests
Using environment 'https://35.185.70.30:25555' as client 'user-HFRI9W2'

Using deployment 'dedicated-mysql-broker'

Task 7103

16:38:14 | Preparing deployment: Preparing deployment
16:38:14 | Preparing package compilation: Finding packages to compile (00:00:00)
16:38:14 | Preparing deployment: Preparing deployment (00:00:00)
16:38:14 | Creating missing vms: smoke-tests/38f51ba4-2db0-41a1-a933-b5d180974190 (0) (00:00:43)
            L Error: CPI error 'Bosh::Clouds::CloudError' with message 'Deleting vm 'vm-fef5b047-e051-4835-4725-f67be2208ce9': Failed to find Google Instance 'vm-fef5b047-e051-4835-4725-f67be2208ce9': Get https://www.googleapis.com/compute/v1/projects/cf-dedicated-mysql/aggregated/instances?alt=json&filter=name+eq+.%2Avm-fef5b047-e051-4835-4725-f67be2208ce9: oauth2: cannot fetch token: 400 Bad Request
Response: {
  "error" : "invalid_grant",
  "error_description" : "Invalid JWT Signature."
}' in 'delete_vm' CPI method

16:41:58 | Error: CPI error 'Bosh::Clouds::CloudError' with message 'Deleting vm 'vm-fef5b047-e051-4835-4725-f67be2208ce9': Failed to find Google Instance 'vm-fef5b047-e051-4835-4725-f67be2208ce9': Get https://www.googleapis.com/compute/v1/projects/cf-dedicated-mysql/aggregated/instances?alt=json&filter=name+eq+.%2Avm-fef5b047-e051-4835-4725-f67be2208ce9: oauth2: cannot fetch token: 400 Bad Request
Response: {
  "error" : "invalid_grant",
  "error_description" : "Invalid JWT Signature."
}' in 'delete_vm' CPI method

Started  Mon May 22 16:38:14 UTC 2017
Finished Mon May 22 16:41:58 UTC 2017
Duration 00:03:44

Task 7103 error

Running errand 'smoke-tests':
  Expected task '7103' to succeed but was state is 'error'

Exit code 1

When creating a VM during an errand run, I get an error from the CPI about failing to delete a VM. This is with the Golang BOSH CLI.

A subsequent attempt to run the errand fails with:

16:42:09 | Preparing deployment: Preparing deployment
16:42:10 | Preparing package compilation: Finding packages to compile (00:00:00)
16:42:10 | Preparing deployment: Preparing deployment (00:00:01)
16:42:10 | Creating missing vms: smoke-tests/38f51ba4-2db0-41a1-a933-b5d180974190 (0) (00:01:01)
16:43:11 | Updating instance smoke-tests: smoke-tests/38f51ba4-2db0-41a1-a933-b5d180974190 (0) (canary) (00:03:20)
            L Error: CPI error 'Bosh::Clouds::CloudError' with message 'Creating vm: Failed to find Google Image 'stemcell-9f443a2a-bfbb-48b2-49fd-8933875cec1b': Get https://www.googleapis.com/compute/v1/projects/cf-dedicated-mysql/global/images/stemcell-9f443a2a-bfbb-48b2-49fd-8933875cec1b?alt=json: oauth2: cannot fetch token: 400 Bad Request
Response: {
  "error" : "invalid_grant",
  "error_description" : "Invalid JWT Signature."
}' in 'create_vm' CPI method
...

And then:

16:47:33 | Preparing deployment: Preparing deployment
16:47:33 | Preparing package compilation: Finding packages to compile (00:00:00)
16:47:33 | Preparing deployment: Preparing deployment (00:00:00)
16:47:33 | Creating missing vms: smoke-tests/38f51ba4-2db0-41a1-a933-b5d180974190 (0) (00:00:11)
            L Error: CPI error 'Bosh::Clouds::CloudError' with message 'Creating vm: Failed to find Google Image 'stemcell-9f443a2a-bfbb-48b2-49fd-8933875cec1b': Get https://www.googleapis.com/compute/v1/projects/cf-dedicated-mysql/global/images/stemcell-9f443a2a-bfbb-48b2-49fd-8933875cec1b?alt=json: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "Google Internet Authority G2")' in 'create_vm' CPI method

johnsonj commented 7 years ago

Hi!

A few questions: Does this reproduce with bosh deploy, or only with errands? Is the private key used for the service account still valid?

mfine30 commented 7 years ago

bosh deploy seems to work without any problem. On the next attempt after the three above, the errand ran successfully. The key is still valid and works. Someone else on my team said he saw the same thing a while ago: it failed in different ways, and after enough attempts it just worked.

cppforlife commented 7 years ago

@johnsonj @mfine30 this could happen anywhere (bosh deploy, recreate, etc.) since they all execute the same CPI calls. I've definitely seen it happen in other places.

johnsonj commented 7 years ago

Is the director configured with a JSON key or a service account in this case?

This error doesn't look related to time skew; it points at the key used to sign the OAuth request. Either the key we're using isn't valid (incomplete? not set? we don't verify the contents of the json_key), or we aren't transmitting it successfully (e.g. a #167-type bug).
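
For illustration, a minimal sketch of the kind of json_key sanity check that is described above as missing. This is not code from the CPI: `validateJSONKey` and `serviceAccountKey` are hypothetical names, and the sketch assumes the usual Google service account key layout (`type`, `client_email`, and a PEM-encoded PKCS#8 RSA `private_key`).

```go
package main

import (
	"crypto/rsa"
	"crypto/x509"
	"encoding/json"
	"encoding/pem"
	"errors"
	"fmt"
)

// serviceAccountKey holds the fields of a service account JSON key that
// matter for signing OAuth token requests (assumed layout).
type serviceAccountKey struct {
	Type        string `json:"type"`
	ClientEmail string `json:"client_email"`
	PrivateKey  string `json:"private_key"`
}

// validateJSONKey is a hypothetical helper, not part of the CPI. It checks
// that the key is well formed; it cannot tell whether Google still accepts it.
func validateJSONKey(data []byte) error {
	var key serviceAccountKey
	if err := json.Unmarshal(data, &key); err != nil {
		return fmt.Errorf("parsing json_key: %w", err)
	}
	if key.Type != "service_account" {
		return fmt.Errorf("unexpected key type %q", key.Type)
	}
	if key.ClientEmail == "" {
		return errors.New("json_key has no client_email")
	}
	// A missing or truncated key fails here: pem.Decode returns a nil block.
	block, _ := pem.Decode([]byte(key.PrivateKey))
	if block == nil {
		return errors.New("json_key private_key is not valid PEM (empty or truncated?)")
	}
	// Assumption: the key is PKCS#8 encoded ("BEGIN PRIVATE KEY").
	parsed, err := x509.ParsePKCS8PrivateKey(block.Bytes)
	if err != nil {
		return fmt.Errorf("parsing private_key: %w", err)
	}
	if _, ok := parsed.(*rsa.PrivateKey); !ok {
		return errors.New("private_key is not an RSA key")
	}
	return nil
}

func main() {
	// Deliberately truncated placeholder key, not a real credential.
	broken := []byte(`{"type":"service_account","client_email":"sa@example.iam.gserviceaccount.com","private_key":"-----BEGIN PRIVATE KEY-----\ntruncated"}`)
	if err := validateJSONKey(broken); err != nil {
		fmt.Println("json_key rejected:", err)
	}
}
```

A check like this would only catch a missing or malformed key. It would not catch a key that parses correctly but no longer matches what Google has on record, and it says nothing about whether the key is transmitted intact.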

johnsonj commented 6 years ago

Closing due to age and lack of actionability. Please re-open if this problem crops up again.