cloudfoundry / bosh-openstack-cpi-release

BOSH OpenStack CPI
Apache License 2.0
36 stars 59 forks source link

Getting errors when creating VMs after upgrading to Liberty when Bosh passes no AZ config #54

Closed jaimegag closed 7 years ago

jaimegag commented 7 years ago

Getting the following error when creating VMs after OpenStack was upgraded from Kilo to Liberty: OpenStack API Bad Request (Invalid input for field/attribute availability_zone. Value: . u'' is too short)

We don't specify any AZ information in the Bosh manifests and let OpenStack decide it. When Bosh tries to create a new VM the CPI seems to be passing the availability_zone key/property with an empty string as value, and it appears that the OpenStack API doesn't like this anymore after the upgrade to Liberty. It seems that the API has changed in the way it accepts the availability-zone argument when Liberty was released, and all OpenStack API calls going forward behave in this "new" way. If no AZ information is available no availability_zone argument should reach the API, instead of an empty argument.

We have also tested and debugged creating servers with the OpenStack CLI and confirmed that they do not pass the availability_zone argument at all.

These are some details from our setup to help reproduce/troubleshoot the problem:

Interestingly enough, we don't get errors when deploying bosh director with bosh-init, with the same configuration guidelines detailed above. I'll include log information below, both from bosh and bosh-init:

For BOSH-INIT

Here's the create_vm with its parameters: availability_zone is (correctly) missing:

STDIN:'{"method":"create_vm","arguments":["c5f14b0a-194b-433f-62f0-31a8862b0c17","49c0200d-3ebd-4ce4-a459-1dc7821131d6",{"instance_type":"generic1.medium"},{"private":{"cloud_properties":{},"default":["dns","gateway"],"type":"dynamic"},"public":{"cloud_properties":{},"ip":"XXXXX","type":"vip"}},[],{}],"context":{"director_uuid":"19ea882a-aca5-426d-54fa-59b8a871b39e"}}'

After that there are several excon.requests in the log, none of them with the availability_zone (good). Here's one example :

D, [2016-11-30T15:28:54.794585 #25991] DEBUG -- : excon.request {:chunk_size=>1048576, :ciphers=>"HIGH:!SSLv2:!aNULL:!eNULL:!3DES", :connect_timeout=>60, :debug_request=>false, :debug_response=>true, :headers=>{"User-Agent"=>"fog-core/1.40.0", "Content-Type"=>"application/json", "Accept"=>"application/json", "X-Auth-Token"=>"XXXXX", "Host"=>"openstack.XXXXXX.com:8774"},  :body=>"{\"server\":{\"flavorRef\":\"6d83cccc-a401-4b3d-a2c0-f95a9bac5eb5\",\"name\":\"vm-3a74565c-bebb-4e3b-b03a-6b7d0722fe1a\",\"imageRef\":\"49c0200d-3ebd-4ce4-a459-1dc7821131d6\",\"user_data\":\"eyJyZWdpc3RyeSI6eyJlbmRwb2ludCI6Imh0dHA6Ly8xMjcuMC4wLjE6Njkw\\nMSJ9LCJzZXJ2ZXIiOnsibmFtZSI6InZtLTNhNzQ1NjVjLWJlYmItNGUzYi1i\\nMDNhLTZiN2QwNzIyZmUxYSJ9LCJuZXR3b3JrcyI6eyJwcml2YXRlIjp7ImNs\\nb3VkX3Byb3BlcnRpZXMiOnt9LCJkZWZhdWx0IjpbImRucyIsImdhdGV3YXki\\nXSwidHlwZSI6ImR5bmFtaWMiLCJ1c2VfZGhjcCI6dHJ1ZX0sInB1YmxpYyI6\\neyJjbG91ZF9wcm9wZXJ0aWVzIjp7fSwiaXAiOiIxMC4xMjQuMjQ0LjMwIiwi\\ndHlwZSI6InZpcCIsInVzZV9kaGNwIjp0cnVlfX19\\n\",\"key_name\":\"XXXXX\",\"security_groups\":[{\"name\":\"default\"}],\"networks\":[]}}", :expects=>[200, 202], :method=>"POST", :host=>"openstack.XXXXXX.com", :hostname=>"openstack.XXXXXX.com", :path=>"", :port=>8774, :query=>nil, :scheme=>"https", :no_proxy=>"openstack.XXXXXX.com", :instrumentor=>Bosh::OpenStackCloud::ExconLoggingInstrumentor} @socket_key="https://openstack.XXXXXX.com:8774">>>>>>}

For BOSH

Here's the create_vm with its parameters: availability_zone is (correctly) missing:

D, [2016-11-30 15:53:08 #19378] [create_missing_vm(consul_server-partition/2 (cb48e7ac-4081-4011-93ac-aa54b0b9a8df)/16)] DEBUG -- DirectorJobRunner: External CPI sending request: {"method":"create_vm","arguments":["94995d6c-fdb4-4046-8347-7ada49c4d9a5","7b57d867-c100-4bde-8887-2f58cae1b323",{"instance_type":"generic1.small"},{"default":{"type":"dynamic","cloud_properties":{"security_groups":["default"]},"dns":["XXXXXX"],"default":["dns","gateway"]}},["38c29138-439f-40df-9a7a-4b19f46d5b9b"],{}],"context":{"director_uuid":"c982fd17-0855-4d8f-a44e-53574d01b574"}} with command: /var/vcap/jobs/openstack_cpi/bin/cpi

However, in this case the following excon.requests in the logs __include the availability_zone__ argument with an empty string. Here's one example:

D, [2016-11-30T15:53:13.043132 #21397] DEBUG -- : excon.request {:chunk_size=>1048576, :ciphers=>"HIGH:!SSLv2:!aNULL:!eNULL:!3DES", :connect_timeout=>60, :debug_request=>false, :debug_response=>true, :headers=>{"User-Agent"=>"fog-core/1.40.0", "Content-Type"=>"application/json", "Accept"=>"application/json", "X-Auth-Token"=>"XXXXXXXX", "Host"=>"openstack.XXXXXXXX.com:8774"},  :body=>"{\"server\":{\"flavorRef\":\"b8035c8b-5695-4122-ac09-c0c6b68feff9\",\"name\":\"vm-5903d266-0576-4b56-9706-c4d5b5f5287d\",\"imageRef\":\"7b57d867-c100-4bde-8887-2f58cae1b323\",\"availability_zone\":\"\",\"user_data\":\"eyJyZWdpc3RyeSI6eyJlbmRwb2ludCI6Imh0dHA6Ly9hZG1pbjpjMW91ZGMw\\nd0AxMC4xMjQuMjQ0LjMwOjI1Nzc3In0sInNlcnZlciI6eyJuYW1lIjoidm0t\\nNTkwM2QyNjYtMDU3Ni00YjU2LTk3MDYtYzRkNWI1ZjUyODdkIn0sIm5ldHdv\\ncmtzIjp7ImRlZmF1bHQiOnsidHlwZSI6ImR5bmFtaWMiLCJjbG91ZF9wcm9w\\nZXJ0aWVzIjp7InNlY3VyaXR5X2dyb3VwcyI6WyJkZWZhdWx0Il19LCJkbnMi\\nOlsiMTAuMTI0LjI0NC4zMCJdLCJkZWZhdWx0IjpbImRucyIsImdhdGV3YXki\\nXSwidXNlX2RoY3AiOnRydWV9fSwiZG5zIjp7Im5hbWVzZXJ2ZXIiOlsiMTAu\\nMTI0LjI0NC4zMCJdfX0=\\n\",\"key_name\":\"XXXX\",\"security_groups\":[{\"name\":\"default\"}],\"networks\":[]}}", :expects=>[200, 202], :method=>"POST", :host=>"openstack.XXXXXX.com", :hostname=>"openstack.XXXXXX.com", :path=>"", :port=>8774, :query=>nil, :scheme=>"https", :no_proxy=>"openstack.XXXXXX.com", :instrumentor=>Bosh::OpenStackCloud::ExconLoggingInstrumentor} @socket_key="https://openstack.XXXXXX.com:8774">>>>>>}

And they result into the error we have been seeing:

D, [2016-11-30T15:53:13.059109 #21397] DEBUG -- : excon.error {:error=>#<Excon::Errors::BadRequest: Expected([200, 202]) <=> Actual(400 Bad Request)
excon.error.response
  :body          => "{\"badRequest\": {\"message\": \"Invalid input for field/attribute availability_zone. Value: . u'' is too short\", \"code\": 400}}"
  :cookies       => [
  ]
  :headers       => {
    "Content-Length"               => "122"
    "Content-Type"                 => "application/json; charset=UTF-8"
    "Date"                         => "Wed, 30 Nov 2016 15:53:13 GMT"
    "Vary"                         => "X-OpenStack-Nova-API-Version"
    "X-Compute-Request-Id"         => "req-808e29ec-f883-4d90-acb6-458d159cae3c"
    "X-Openstack-Nova-Api-Version" => "2.1"
  }
  :host          => "openstack.XXXXXX.com"
  :local_address => "XXXXXX"
  :local_port    => 54938
  :path          => "/v2.1/db3479a908fa4c5994c53532ecd61fa2/servers.json"
  :port          => 8774
  :reason_phrase => ""
  :remote_ip     => "XXXXX"
  :status        => 400
  :status_line   => "HTTP/1.1 400 \r\n"
>}
D, [2016-11-30 15:53:13 #19378] [create_missing_vm(consul_server-partition/2 (cb48e7ac-4081-4011-93ac-aa54b0b9a8df)/16)] DEBUG -- DirectorJobRunner: External CPI got response: {"result":null,"error":{"type":"Bosh::Clouds::CloudError","message":"OpenStack API Bad Request (Invalid input for field/attribute availability_zone. Value: . u'' is too short). Check task debug log for details.","ok_to_retry":false},"log":"Rescued Bosh::Clouds::CloudError: OpenStack API Bad Request (Invalid input for field/attribute availability_zone. Value: . u'' is too short). Check task debug log for details.. backtrace: /var/vcap/packages/bosh_openstack_cpi/lib/cloud/openstack/helpers.rb:24:in

If more information is needed please let me know.

cf-gitbot commented 7 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/135269529

The labels on this github issue will be updated when the story is started.

tomoe commented 7 years ago

Can you share nova-api.log?

tomoe commented 7 years ago

I seems like its the case that is tested: https://github.com/openstack/nova/blob/4f91ed3a547965ed96a22520edcfb783e7936e95/nova/tests/unit/api/openstack/compute/test_availability_zone.py#L296

So, the nova-api is rejecting a request with availability_zone==''

Do you have a HTTP request trace for bosh-init? It might be a difference between availability_zone=='' and availability_zone being not present in the HTTP requests.

voelzmo commented 7 years ago

@jaimegag Thanks for creating the issue with the thorough summary and logs!

Here are my observations:

Conclusion: I suspect that your volume is in an OpenStack Availability Zone "" (empty string). That seems to be a cinder issue. Could you please verify this by calling openstack volume show 38c29138-439f-40df-9a7a-4b19f46d5b9b -f json where 38c29138-439f-40df-9a7a-4b19f46d5b9b is the volume's UUID from the example above. I suppose you're seeing something like

{
  "status": "available",
(...)
  "availability_zone": "",
  "bootable": "false",
(...)
  "id": "38c29138-439f-40df-9a7a-4b19f46d5b9b",
  "attachments": []
}

We certainly should check also for an empty string before taking in the volume's AZ, but I would be interested in the above analysis.

jaimegag commented 7 years ago

@voelzmo Yes, that volume and all we have are no specific to any AZ, so the openstack volume show call returns what you expected:

root@here# openstack volume show 38c29138-439f-40df-9a7a-4b19f46d5b9b -f json
{
  "size": 1,
  "status": "available",
(...)
  "availability_zone": "",
  "bootable": "false",
(...)
  "id": "38c29138-439f-40df-9a7a-4b19f46d5b9b",
  "attachments": []
}

It seems that the empty string might be coming from there

voelzmo commented 7 years ago

Fix released in v29