cloudfoundry-attic / bosh-photon-cpi-release

Apache License 2.0
6 stars 8 forks source link

Getting 403 errors for bosh cck tasks #7

Open RamXX opened 8 years ago

RamXX commented 8 years ago
sandbox@ubuntu-bastion:~$ bosh cck
Acting as user 'admin' on deployment 'cf' on 'photon-bosh-sandbox'
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Performing cloud check...

Director task 313
  Started scanning 17 vms
  Started scanning 17 vms > Checking VM states. Failed: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:10)

Error 100: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }

Task 313 error

For a more detailed error report, run: bosh task 313 --debug

Full operations like bosh deploy or bosh delete deployment work fine. An intermediate operation like bosh cck. --details don't add any additional meaningful info. This may be related with a token refresh.

kmjung commented 8 years ago

Tracked as https://www.pivotaltracker.com/story/show/129565427

RamXX commented 8 years ago

Tested cpi dev.3 and get the same message. Output below:

andbox@ubuntu-bastion:~$ photon vm stop 4f00ae99-89c3-4008-b9c6-5a30c0012ad1 && photon vm delete 4f00ae99-89c3-4008-b9c6-5a30c0012ad1 && bosh cck
Using target 'https://10.180.132.2:443'
STOP_VM completed for 'vm' entity 4f00ae99-89c3-4008-b9c6-5a30c0012ad1
Using target 'https://10.180.132.2:443'
DELETE_VM completed for 'vm' entity 4f00ae99-89c3-4008-b9c6-5a30c0012ad1
Acting as user 'admin' on deployment 'concourse' on 'photon-bosh-sandbox'
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Performing cloud check...

Director task 46
  Started scanning 3 vms
  Started scanning 3 vms > Checking VM states. Done (00:00:11)
  Started scanning 3 vms > 2 OK, 0 unresponsive, 1 missing, 0 unbound. Done (00:00:00)
     Done scanning 3 vms (00:00:11)

  Started scanning 1 persistent disks
  Started scanning 1 persistent disks > Looking for inactive disks. Done (00:00:00)
  Started scanning 1 persistent disks > 1 OK, 0 missing, 0 inactive, 0 mount-info mismatch. Done (00:00:00)
     Done scanning 1 persistent disks (00:00:00)

Task 46 done

Started        2016-09-02 06:54:53 UTC
Finished           2016-09-02 06:55:04 UTC
Duration           00:00:11

Scan is complete, checking if any problems found...

Found 1 problem

Problem 1 of 1: VM with cloud ID '4f00ae99-89c3-4008-b9c6-5a30c0012ad1' missing.
  1. Skip for now
  2. Recreate VM for 'web/0 (a2ad4133-1f70-4415-a080-a22904e032d8)' without waiting for processes to start
  3. Recreate VM for 'web/0 (a2ad4133-1f70-4415-a080-a22904e032d8)' and wait for processes to start
  4. Delete VM reference
Please choose a resolution [1 - 4]: 2

Below is the list of resolutions you've provided
Please make sure everything is fine and confirm your changes

  1. VM with cloud ID '4f00ae99-89c3-4008-b9c6-5a30c0012ad1' missing.
     Recreate VM for 'web/0 (a2ad4133-1f70-4415-a080-a22904e032d8)' without waiting for processes to start

Apply resolutions? (type 'yes' to continue): yes
Applying resolutions...

Director task 47
  Started applying problem resolutions > VM with cloud ID '4f00ae99-89c3-4008-b9c6-5a30c0012ad1' missing. (missing_vm 7): Recreate VM for 'web/0 (a2ad4133-1f70-4415-a080-a22904e032d8)' without waiting for processes to start. Failed: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)

Error 100: Error resolving problem '12': photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }

Task 47 error

For a more detailed error report, run: bosh task 47 --debug
sandbox@ubuntu-bastion:~$ bosh task 47 --cpi
Acting as user 'admin' on 'photon-bosh-sandbox'

RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Director task 47

Task 47 error
yjyum commented 8 years ago

Seems to be a new issue with recreating vm/disk. Would you please try "bosh task 47 --debug" to get the log?

RamXX commented 8 years ago
sandbox@ubuntu-bastion:~$ bosh task 47 --debug
Acting as user 'admin' on 'photon-bosh-sandbox'

RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Director task 47
I, [2016-09-02T06:55:13.409012 #23155] [0x2ac3d8463108]  INFO -- TaskHelper: Director Version: 1.3262.3.0
I, [2016-09-02T06:55:13.409580 #23155] [0x2ac3d8463108]  INFO -- TaskHelper: Enqueuing task: 47
I, [2016-09-02 06:55:13 #26778] []  INFO -- DirectorJobRunner: Looking for task with task id 47
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.000179s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.000345s) SELECT * FROM "tasks" WHERE "id" = 47
I, [2016-09-02 06:55:13 #26778] []  INFO -- DirectorJobRunner: Found task #<Bosh::Director::Models::Task @values={:id=>47, :state=>"queued", :timestamp=>2016-09-02 06:55:13 UTC, :description=>"apply resolutions", :result=>nil, :output=>"/var/vcap/store/director/tasks/47", :checkpoint_time=>2016-09-02 06:55:13 UTC, :type=>"cck_apply", :username=>"admin", :deployment_name=>"concourse", :started_at=>nil}>
I, [2016-09-02 06:55:13 #26778] []  INFO -- DirectorJobRunner: Starting task: 47
I, [2016-09-02 06:55:13 #26778] [task:47]  INFO -- DirectorJobRunner: Creating job
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000122s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000109s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000488s) SELECT * FROM "deployments" WHERE ("name" = 'concourse') LIMIT 1
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000125s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000202s) SELECT * FROM "tasks" WHERE "id" = 47
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000113s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000202s) SELECT * FROM "tasks" WHERE "id" = 47
I, [2016-09-02 06:55:13 #26778] [task:47]  INFO -- DirectorJobRunner: Performing task: #<Bosh::Director::Models::Task @values={:id=>47, :state=>"queued", :timestamp=>2016-09-02 06:55:13 UTC, :description=>"apply resolutions", :result=>nil, :output=>"/var/vcap/store/director/tasks/47", :checkpoint_time=>2016-09-02 06:55:13 UTC, :type=>"cck_apply", :username=>"admin", :deployment_name=>"concourse", :started_at=>nil}>
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000129s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000089s) BEGIN
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000431s) UPDATE "tasks" SET "state" = 'processing', "timestamp" = '2016-09-02 06:55:13.895693+0000', "description" = 'apply resolutions', "result" = NULL, "output" = '/var/vcap/store/director/tasks/47', "checkpoint_time" = '2016-09-02 06:55:13.895833+0000', "type" = 'cck_apply', "username" = 'admin', "deployment_name" = 'concourse', "started_at" = '2016-09-02 06:55:13.895760+0000' WHERE ("id" = 47)
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.001024s) COMMIT
I, [2016-09-02 06:55:13 #26778] [task:47]  INFO -- DirectorJobRunner: Acquiring deployment lock on concourse
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: Acquiring lock: lock:deployment:concourse
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000124s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000111s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000117s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000525s) SELECT count(*) AS "count" FROM "locks" WHERE (("name" = 'lock:deployment:concourse') AND ("uid" = 'a8a5fb35-5f0c-4064-b581-102a2b7f308d')) LIMIT 1
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000111s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000077s) BEGIN
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000372s) INSERT INTO "locks" ("name", "uid", "expired_at") VALUES ('lock:deployment:concourse', 'a8a5fb35-5f0c-4064-b581-102a2b7f308d', '2016-09-02 06:55:24.902121+0000') RETURNING *
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.007437s) COMMIT
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: Acquired lock: lock:deployment:concourse
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000121s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000107s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000423s) SELECT count(*) AS "count" FROM "deployment_problems" WHERE ("id" IN ('12')) LIMIT 1
I, [2016-09-02 06:55:13 #26778] [task:47]  INFO -- DirectorJobRunner: Applying problem resolutions
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000131s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000108s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000197s) SELECT * FROM "deployment_problems" WHERE ("id" IN ('12'))
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000691s) SELECT * FROM "instances" WHERE ("id" = 7) LIMIT 1
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000269s) SELECT * FROM "instances" WHERE ("id" = 7) LIMIT 1
I, [2016-09-02 06:55:13 #26778] [task:47]  INFO -- DirectorJobRunner: VM with cloud ID '4f00ae99-89c3-4008-b9c6-5a30c0012ad1' missing. (missing_vm 7): Recreate VM for 'web/0 (a2ad4133-1f70-4415-a080-a22904e032d8)' without waiting for processes to start
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: Recreating Vm: web/0 (a2ad4133-1f70-4415-a080-a22904e032d8))
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: Deleting Vm: web/0 (a2ad4133-1f70-4415-a080-a22904e032d8))
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: Renewing lock: lock:deployment:concourse
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.003933s) SELECT * FROM "deployments" WHERE "id" = 2
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.000986s) SET standard_conforming_strings = ON
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.000101s) SET client_min_messages = 'WARNING'
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.000116s) SET DateStyle = 'ISO'
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000221s) SELECT * FROM "tasks" WHERE "id" = 47
I, [2016-09-02 06:55:13 #26778] [task:47]  INFO -- DirectorJobRunner: Deleting VM
D, [2016-09-02 06:55:13 #26778] [task:47] DEBUG -- DirectorJobRunner: External CPI sending request: {"method":"delete_vm","arguments":["4f00ae99-89c3-4008-b9c6-5a30c0012ad1"],"context":{"director_uuid":"101da2c9-f611-4f1b-891e-9742cc25e7d9"}} with command: /var/vcap/jobs/cpi/bin/cpi
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.002344s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.000116s) SELECT NULL
D, [2016-09-02 06:55:13 #26778] [] DEBUG -- DirectorJobRunner: (0.001447s) UPDATE "locks" SET "expired_at" = '2016-09-02 06:55:24.925077+0000' WHERE (("name" = 'lock:deployment:concourse') AND ("uid" = 'a8a5fb35-5f0c-4064-b581-102a2b7f308d'))
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: External CPI got response: {"result":null,"error":{"type":"Bosh::Clouds::CloudError","message":"photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }","ok_to_retry":false},"log":"2016-09-02T06:55:14Z INFO Begin action delete_vm\n2016-09-02T06:55:14Z INFO Raw action arguments: []interface {}{\"4f00ae99-89c3-4008-b9c6-5a30c0012ad1\"}\n2016-09-02T06:55:14Z INFO Deleting VM: 4f00ae99-89c3-4008-b9c6-5a30c0012ad1\n2016-09-02T06:55:14Z INFO Detaching disks\n2016-09-02T06:55:14Z INFO Stopping VM\n2016-09-02T06:55:14Z ERROR Error encountered during action delete_vm: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }\n"}, err: , exit_status: pid 26804 exit 0
E, [2016-09-02 06:55:14 #26778] [task:47] ERROR -- DirectorJobRunner: Error resolving problem '12': photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }
E, [2016-09-02 06:55:14 #26778] [task:47] ERROR -- DirectorJobRunner: /var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh_cpi-1.3262.3.0/lib/cloud/external_cpi.rb:119:in `handle_error'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh_cpi-1.3262.3.0/lib/cloud/external_cpi.rb:88:in `invoke_cpi_method'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh_cpi-1.3262.3.0/lib/cloud/external_cpi.rb:52:in `delete_vm'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/vm_deleter.rb:29:in `block in delete_vm'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/error_ignorer.rb:9:in `with_force_check'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/vm_deleter.rb:28:in `delete_vm'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/vm_deleter.rb:16:in `delete_for_instance'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/cloudcheck_helper.rb:49:in `delete_vm_from_cloud'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/cloudcheck_helper.rb:67:in `recreate_vm'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/cloudcheck_helper.rb:62:in `recreate_vm_skip_post_start'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_handlers/missing_vm.rb:20:in `block (2 levels) in <class:MissingVM>'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_handlers/base.rb:83:in `instance_eval'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_handlers/base.rb:83:in `apply_resolution'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_resolver.rb:65:in `block in apply_resolution'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_resolver.rb:24:in `block in track_and_log'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/event_log.rb:99:in `advance_and_track'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_resolver.rb:22:in `track_and_log'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_resolver.rb:64:in `apply_resolution'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_resolver.rb:43:in `block in apply_resolutions'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/dataset/actions.rb:139:in `block in each'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:669:in `block (2 levels) in fetch_rows'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:861:in `block in yield_hash_rows'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:855:in `times'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:855:in `yield_hash_rows'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:669:in `block in fetch_rows'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:175:in `execute'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:525:in `_execute'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:341:in `block (2 levels) in execute'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:546:in `check_database_errors'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:341:in `block in execute'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/database/connecting.rb:251:in `block in synchronize'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/database/connecting.rb:251:in `synchronize'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:341:in `execute'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/dataset/actions.rb:952:in `execute'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/adapters/postgres.rb:669:in `fetch_rows'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/sequel-4.32.0/lib/sequel/dataset/actions.rb:139:in `each'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/problem_resolver.rb:34:in `apply_resolutions'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/cloud_check/apply_resolutions.rb:38:in `block in perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/lock_helper.rb:13:in `block in with_deployment_lock'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/lock.rb:53:in `lock'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/lock_helper.rb:13:in `with_deployment_lock'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/cloud_check/apply_resolutions.rb:37:in `perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/job_runner.rb:94:in `perform_job'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/job_runner.rb:32:in `block in run'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh_common-1.3262.3.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/job_runner.rb:32:in `run'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/base_job.rb:10:in `perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/db_job.rb:24:in `block in perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/db_job.rb:72:in `block (3 levels) in run'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/eventmachine-1.0.4/lib/eventmachine.rb:1046:in `block in spawn_threadpool'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
D, [2016-09-02 06:55:14 #26778] [] DEBUG -- DirectorJobRunner: Lock renewal thread exiting
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000616s) SELECT NULL
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000349s) SELECT NULL
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000202s) SELECT NULL
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.001180s) DELETE FROM "locks" WHERE (("name" = 'lock:deployment:concourse') AND ("uid" = 'a8a5fb35-5f0c-4064-b581-102a2b7f308d'))
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: Deleted lock: lock:deployment:concourse uid: a8a5fb35-5f0c-4064-b581-102a2b7f308d
E, [2016-09-02 06:55:14 #26778] [task:47] ERROR -- DirectorJobRunner: Error resolving problem '12': photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/cloud_check/apply_resolutions.rb:41:in `block in perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/lock_helper.rb:13:in `block in with_deployment_lock'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/lock.rb:53:in `lock'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/lock_helper.rb:13:in `with_deployment_lock'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/cloud_check/apply_resolutions.rb:37:in `perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/job_runner.rb:94:in `perform_job'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/job_runner.rb:32:in `block in run'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh_common-1.3262.3.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/job_runner.rb:32:in `run'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/base_job.rb:10:in `perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/db_job.rb:24:in `block in perform'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/bosh-director-1.3262.3.0/lib/bosh/director/jobs/db_job.rb:72:in `block (3 levels) in run'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/eventmachine-1.0.4/lib/eventmachine.rb:1046:in `block in spawn_threadpool'
/var/vcap/packages/director/gem_home/ruby/2.3.0/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000190s) SELECT NULL
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000127s) BEGIN
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000527s) UPDATE "tasks" SET "state" = 'error', "timestamp" = '2016-09-02 06:55:14.585470+0000', "description" = 'apply resolutions', "result" = 'Error resolving problem ''12'': photon: { HTTP status: ''403'', code: ''AccessForbidden'', message: '''', data: ''map[]'' }', "output" = '/var/vcap/store/director/tasks/47', "checkpoint_time" = '2016-09-02 06:55:13.895833+0000', "type" = 'cck_apply', "username" = 'admin', "deployment_name" = 'concourse', "started_at" = '2016-09-02 06:55:13.895760+0000' WHERE ("id" = 47)
D, [2016-09-02 06:55:14 #26778] [task:47] DEBUG -- DirectorJobRunner: (0.000860s) COMMIT
I, [2016-09-02 06:55:14 #26778] []  INFO -- DirectorJobRunner: Task took 0.698524497 seconds to process.

Task 47 error
RamXX commented 8 years ago

Another instance of this 403 error appears under the following conditions:

bosh disks --orphaned
RSA 1024 bit CA certificates are loaded due to old openssl compatibility

+--------------------------------------+------------+-----------------+---------------------------------------------------------------+-----+-------------------------+
| Disk CID                             | Size (MiB) | Deployment Name | Instance Name                                                 | AZ  | Orphaned At             |
+--------------------------------------+------------+-----------------+---------------------------------------------------------------+-----+-------------------------+
| fde356ae-c138-433e-92f5-e168f42ac45f | 3000       | cf              | diego_database-partition/df6c312d-95c4-439a-bf5d-0d600422594a | az1 | 2016-09-06 21:23:47 UTC |
| ebcb2004-6b15-4c7d-9270-e14ddd440ffc | 3000       | cf              | diego_cell-partition/d96aa784-60fb-4864-9a2a-c1272b1c2f6d     | az1 | 2016-09-06 21:23:45 UTC |
| f1513951-8482-48f0-9213-b2d38d3ca1ba | 3000       | cf              | etcd_server-partition/938e9135-8ffc-400d-aa25-ee08822632f5    | az1 | 2016-09-06 21:23:42 UTC |
| 3e2c743c-2a19-40a1-b134-166e55961523 | 50000      | cf              | nfs_server-partition/eb8582be-000a-48ba-b4c1-6b5cb2c174c7     | az1 | 2016-09-06 21:23:41 UTC |
| 906687ca-6dc1-4d9d-abba-f6dd3dad72ef | 50000      | cf              | mysql-partition/ae6c31c9-e824-4b2b-9939-29b8d60b992c          | az1 | 2016-09-06 21:23:40 UTC |
| b77621ab-be83-4e9a-b82b-8b7041321141 | 50000      | cf              | diego_brain-partition/fc1e8104-1062-4885-8708-61119175e56f    | az1 | 2016-09-06 21:23:40 UTC |
| 6c274e41-3a78-4e46-8266-620980098eea | 30000      | concourse       | db/3705ecf8-1df6-4e0b-8fa5-98c6de3696a2                       | az1 | 2016-09-02 07:14:49 UTC |
+--------------------------------------+------------+-----------------+---------------------------------------------------------------+-----+-------------------------+

One or more of these disks were already deleted using photon disk delete <id>, and yet they are still in the DB. The solution for this would be a bosh cleanup --all, but the 403 error appears again:

bosh cleanup --all
Acting as user 'admin' on 'photon-bosh-sandbox'
RSA 1024 bit CA certificates are loaded due to old openssl compatibility

Director task 207
  Started deleting orphaned disks
  Started deleting orphaned disks > 6c274e41-3a78-4e46-8266-620980098eea
  Started deleting orphaned disks > 906687ca-6dc1-4d9d-abba-f6dd3dad72ef
  Started deleting orphaned disks > b77621ab-be83-4e9a-b82b-8b7041321141
  Started deleting orphaned disks > 3e2c743c-2a19-40a1-b134-166e55961523
  Started deleting orphaned disks > f1513951-8482-48f0-9213-b2d38d3ca1ba
  Started deleting orphaned disks > ebcb2004-6b15-4c7d-9270-e14ddd440ffc
  Started deleting orphaned disks > fde356ae-c138-433e-92f5-e168f42ac45f
   Failed deleting orphaned disks > ebcb2004-6b15-4c7d-9270-e14ddd440ffc: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks > 6c274e41-3a78-4e46-8266-620980098eea: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks > 906687ca-6dc1-4d9d-abba-f6dd3dad72ef: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks > b77621ab-be83-4e9a-b82b-8b7041321141: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks > fde356ae-c138-433e-92f5-e168f42ac45f: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks > f1513951-8482-48f0-9213-b2d38d3ca1ba: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks > 3e2c743c-2a19-40a1-b134-166e55961523: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' } (00:00:01)
   Failed deleting orphaned disks (00:00:01)

Error 100: photon: { HTTP status: '403', code: 'AccessForbidden', message: '', data: 'map[]' }

Task 207 error

Cleanup complete

The CPI log is:

bosh task 207 --cpi
Acting as user 'admin' on 'photon-bosh-sandbox'

RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Director task 207
2016-09-07T20:31:06Z INFO Begin action delete_stemcell
2016-09-07T20:31:06Z INFO Raw action arguments: []interface {}{"684ec597-1568-4308-a5d3-e6bccdaf224b"}
2016-09-07T20:31:06Z INFO DeleteStemcell with stemcell_cid: '684ec597-1568-4308-a5d3-e6bccdaf224b'
2016-09-07T20:31:06Z INFO Beginning stemcell deletion
2016-09-07T20:31:06Z INFO Waiting on task: &photon.Task{ID:"44303b91-d7e0-4b81-b8b1-853a1001711e", Operation:"DELETE_IMAGE", State:"COMPLETED", StartedTime:1473280251000, EndTime:1473280251000, QueuedTime:1473280251000, Entity:photon.Entity{ID:"684ec597-1568-4308-a5d3-e6bccdaf224b", Kind:"image"}, SelfLink:"https://10.180.132.2/tasks/44303b91-d7e0-4b81-b8b1-853a1001711e", Steps:[]photon.Step{}, ResourceProperties:interface {}(nil)}
2016-09-07T20:31:06Z INFO Action response: <nil>
2016-09-07T20:31:06Z INFO End action delete_stemcell

Task 207 error
molteanu commented 8 years ago

@RamXX: Thanks for this. We have a fix for the VM and we started working on a fix for disk deletes yesterday as I suspected the test you are doing by deleting a disk out of band might be another failure scenario that would run into issues.

Would you be able to send us a complete list of failure scenario tests you are performing so we can address them all before we give you another drop of the CPI?

@cppforlife: Are these failure scenarios covered by BATs?

RamXX commented 8 years ago

@molteanu The typical test that we run is the "pull the rug" test, where we stop+delete a BOSH-controlled VM from the photon CLI and then do a bosh cck, or simply wait a minute or two for the resurrector to kick in. That definitely triggers the 403 error.

The other test we started doing yesterday was the one I posted above, attempting to delete orphaned resources that may or may not be there.

Overall, cck is the mechanism that we trust to fix inconsistencies between what BOSH believes the world to be, and what the IaaS is really doing. Network is also typically a component of this, but given the very simple networking model in Photon at the moment is not necessary.

molteanu commented 8 years ago

We released a new version of the CPI which should address all these failures and a few other issues we discovered while testing these scenarios: https://github.com/cloudfoundry-incubator/bosh-photon-cpi-release/commit/bb57eb27dcb6e7dd49000534a2ea35702605f449