cloudfoundry / cloud_controller_ng

Cloud Foundry Cloud Controller
Apache License 2.0

Unexpected application stop on PUT /v2/apps/<guid> #1253

Closed stephanme closed 4 years ago

stephanme commented 5 years ago

Issue

cf curl "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988" -X PUT -d '{"name":"staticfile-test","space_guid":"b79d0289-75dc-47bd-9460-7283508db450","stack_guid":"dca76a04-c0a0-4887-a7da-e2c9f659cbc7","ports": [8080],"instances":1,"memory":256,"disk_quota": 1024,"buildpack":"https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29","enable_ssh":true,"health_check_type":"port","health_check_timeout":null,"health_check_http_endpoint":""}'

This request stops the application, which was running before. Note that the request doesn't change the application state.

Context

cf-deployment 3.6, capi-release 1.66.0

Any running application, e.g. a simple hello-world style app using the staticfile buildpack.

App configuration:

stephan@WDFN34095835A:~$ cf curl "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988"
{
   "metadata": {
      "guid": "61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988",
      "url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988",
      "created_at": "2018-07-25T09:34:25Z",
      "updated_at": "2018-11-09T08:02:21Z"
   },
   "entity": {
      "name": "staticfile-test",
      "production": false,
      "space_guid": "b79d0289-75dc-47bd-9460-7283508db450",
      "stack_guid": "dca76a04-c0a0-4887-a7da-e2c9f659cbc7",
      "buildpack": "https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29",
      "detected_buildpack": "staticfile",
      "detected_buildpack_guid": null,
      "environment_json": {},
      "memory": 256,
      "instances": 1,
      "disk_quota": 1024,
      "state": "STARTED",
      "version": "a6a4d8d2-1ec8-4852-a133-bf7036aecf6c",
      "command": null,
      "console": false,
      "debug": null,
      "staging_task_id": "74cefcdc-b019-405e-9de7-dacc8b58bf9e",
      "package_state": "STAGED",
      "health_check_type": "port",
      "health_check_timeout": null,
      "health_check_http_endpoint": "",
      "staging_failed_reason": null,
      "staging_failed_description": null,
      "diego": true,
      "docker_image": null,
      "docker_credentials": {
         "username": null,
         "password": null
      },
      "package_updated_at": "2018-10-31T09:52:15Z",
      "detected_start_command": "$HOME/boot.sh",
      "enable_ssh": true,
      "ports": [
         8080
      ],
      "space_url": "/v2/spaces/b79d0289-75dc-47bd-9460-7283508db450",
      "stack_url": "/v2/stacks/dca76a04-c0a0-4887-a7da-e2c9f659cbc7",
      "routes_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/routes",
      "events_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/events",
      "service_bindings_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/service_bindings",
      "route_mappings_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/route_mappings"
   }
}

The problem was detected when trying to scale out an app using a Cloud Foundry Terraform provider. Scaling out with "cf scale -i" works as expected.

Steps to Reproduce

Call PUT /v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988 with a set of parameters. The parameters don't change the app configuration at all (compare with the GET above).

cf curl "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988" -X PUT -d '{"name":"staticfile-test","space_guid":"b79d0289-75dc-47bd-9460-7283508db450","stack_guid":"dca76a04-c0a0-4887-a7da-e2c9f659cbc7","ports": [8080],"instances":1,"memory":256,"disk_quota": 1024,"buildpack":"https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29","enable_ssh":true,"health_check_type":"port","health_check_timeout":null,"health_check_http_endpoint":""}'
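The claim that the PUT body changes nothing can be checked mechanically by comparing it with the entity from the GET response. The following is a minimal sketch of such a check; the helper name `changed_fields` is hypothetical, the `entity` dictionary is a hand-copied subset of the GET response above, and the `memory: 260` variant is an illustrative value only.

```python
# Sentinel so that a field missing from the entity is not confused with null.
_MISSING = object()


def changed_fields(entity, payload):
    """Return {field: (current, requested)} for every payload field
    whose value differs from the current entity (hypothetical helper)."""
    changes = {}
    for key, requested in payload.items():
        current = entity.get(key, _MISSING)
        if current is _MISSING or current != requested:
            changes[key] = (None if current is _MISSING else current, requested)
    return changes


# Subset of the "entity" object from the GET response, copied by hand.
entity = {
    "name": "staticfile-test",
    "space_guid": "b79d0289-75dc-47bd-9460-7283508db450",
    "stack_guid": "dca76a04-c0a0-4887-a7da-e2c9f659cbc7",
    "buildpack": "https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29",
    "memory": 256,
    "instances": 1,
    "disk_quota": 1024,
    "enable_ssh": True,
    "ports": [8080],
    "health_check_type": "port",
    "health_check_timeout": None,
    "health_check_http_endpoint": "",
}

# The PUT body from the command above; it repeats the same values.
payload = dict(entity)

print(changed_fields(entity, payload))                     # {}
print(changed_fields(entity, dict(payload, memory=260)))   # {'memory': (256, 260)}
```

An empty result means the request repeats the current configuration, which is the situation described here.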

Expected result

Ideally, a 200 without any side effects, because the PUT doesn't change anything in the app configuration. In the worst case, a 200 with an application restart.

In any case, the application should (eventually) be running.

Current result

The PUT call is executed successfully:

stephan@WDFN34095835A:~$ cf curl "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988" -X PUT -d '{"name":"staticfile-test","space_guid":"b79d0289-75dc-47bd-9460-7283508db450","stack_guid":"dca76a04-c0a0-4887-a7da-e2c9f659cbc7","ports": [8080],"instances":1,"memory":256,"disk_quota": 1024,"buildpack":"https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29","enable_ssh":true,"health_check_type":"port","health_check_timeout":null,"health_check_http_endpoint":""}'
{
   "metadata": {
      "guid": "61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988",
      "url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988",
      "created_at": "2018-07-25T09:34:25Z",
      "updated_at": "2018-11-09T08:25:07Z"
   },
   "entity": {
      "name": "staticfile-test",
      "production": false,
      "space_guid": "b79d0289-75dc-47bd-9460-7283508db450",
      "stack_guid": "dca76a04-c0a0-4887-a7da-e2c9f659cbc7",
      "buildpack": "https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29",
      "detected_buildpack": null,
      "detected_buildpack_guid": null,
      "environment_json": {},
      "memory": 256,
      "instances": 1,
      "disk_quota": 1024,
      "state": "STARTED",
      "version": "a6a4d8d2-1ec8-4852-a133-bf7036aecf6c",
      "command": null,
      "console": false,
      "debug": null,
      "staging_task_id": "74cefcdc-b019-405e-9de7-dacc8b58bf9e",
      "package_state": "PENDING",
      "health_check_type": "port",
      "health_check_timeout": null,
      "health_check_http_endpoint": "",
      "staging_failed_reason": null,
      "staging_failed_description": null,
      "diego": true,
      "docker_image": null,
      "docker_credentials": {
         "username": null,
         "password": null
      },
      "package_updated_at": "2018-10-31T09:52:15Z",
      "detected_start_command": "",
      "enable_ssh": true,
      "ports": [
         8080
      ],
      "space_url": "/v2/spaces/b79d0289-75dc-47bd-9460-7283508db450",
      "stack_url": "/v2/stacks/dca76a04-c0a0-4887-a7da-e2c9f659cbc7",
      "routes_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/routes",
      "events_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/events",
      "service_bindings_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/service_bindings",
      "route_mappings_url": "/v2/apps/61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988/route_mappings"
   }
}

However, the application instance is stopped:

stephan@WDFN34095835A:~$ cf logs staticfile-test
Retrieving logs for app staticfile-test in org uptime / space uptimeci as d047883...

   2018-11-09T09:25:07.12+0100 [API/17] OUT Updated app with guid 61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988 ({"name"=>"staticfile-test", "space_guid"=>"b79d0289-75dc-47bd-9460-7283508db450", "stack_guid"=>"dca76a04-c0a0-4887-a7da-e2c9f659cbc7", "ports"=>[8080], "instances"=>1, "memory"=>256, "disk_quota"=>1024, "buildpack"=>"https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29", "enable_ssh"=>true, "health_check_type"=>"port", "health_check_http_endpoint"=>""})
   2018-11-09T09:27:00.01+0100 [CELL/0] OUT Cell 2ba94ee9-aa3f-4c92-95fa-9e162da28c34 stopping instance fe60f493-e953-4cf6-5687-8a48
   2018-11-09T09:27:00.86+0100 [CELL/SSHD/0] OUT Exit status 0
   2018-11-09T09:27:11.92+0100 [CELL/0] OUT Cell 2ba94ee9-aa3f-4c92-95fa-9e162da28c34 destroying container for instance fe60f493-e953-4cf6-5687-8a48
   2018-11-09T09:27:12.19+0100 [CELL/0] OUT Cell 2ba94ee9-aa3f-4c92-95fa-9e162da28c34 successfully destroyed container for instance fe60f493-e953-4cf6-5687-8a48

The application is then in a strange state:

stephan@WDFN34095835A:~$ cf a
Getting apps in org uptime / space uptimeci as d047883...
OK

name              requested state   instances   memory   disk   urls
staticfile-test   started           0/1         256M     1G     d047883demo.cfapps.sap.hana.ondemand.com
...

stephan@WDFN34095835A:~$ cf app staticfile-test
Showing health and status for app staticfile-test in org uptime / space uptimeci as d047883...

Application instances '61a0c24d-cb6d-42f4-b4a3-2e2c7ef4b988' not found.
FAILED

stephan@WDFN34095835A:~$ cf start staticfile-test
Starting app staticfile-test in org uptime / space uptimeci as d047883...
App staticfile-test is already started

stephan@WDFN34095835A:~$ cf restart staticfile-test
Restarting app staticfile-test in org uptime / space uptimeci as d047883...

Stopping app...

Staging app and tracing logs...
   Cell 83d08a58-dd38-4902-ab7d-d77db4d3ca90 creating container for instance c0c7b929-150b-4f34-bf2e-a820c19beda7
   Cell 83d08a58-dd38-4902-ab7d-d77db4d3ca90 successfully created container for instance c0c7b929-150b-4f34-bf2e-a820c19beda7
   Downloading app package...
   Downloading build artifacts cache...
   Downloaded app package (622B)
   Downloaded build artifacts cache (2.7M)
   -----> Download go 1.9.1
   -----> Running go build supply
   -----> Staticfile Buildpack version 1.4.29
   -----> Installing nginx
          Using nginx version 1.15.0
   -----> Installing nginx 1.15.0
          Copy [/tmp/cache/final/dependencies/5044afed76c4040c15dc61516aa5ccb348ba369d535cc8b8a2b9bb082dcba592/nginx-1.15.0-linux-x64-64919fa9.tgz]
   -----> Running go build finalize
   -----> Root folder /tmp/app
   -----> Copying project files into public
   -----> Configuring nginx
   Exit status 0
   Uploading droplet, build artifacts cache...
   Uploading droplet...
   Uploading build artifacts cache...
   Uploaded build artifacts cache (2.7M)
   Uploaded droplet (2.7M)
   Uploading complete
   Cell 83d08a58-dd38-4902-ab7d-d77db4d3ca90 stopping instance c0c7b929-150b-4f34-bf2e-a820c19beda7
   Cell 83d08a58-dd38-4902-ab7d-d77db4d3ca90 destroying container for instance c0c7b929-150b-4f34-bf2e-a820c19beda7
   Cell 83d08a58-dd38-4902-ab7d-d77db4d3ca90 successfully destroyed container for instance c0c7b929-150b-4f34-bf2e-a820c19beda7

Waiting for app to start...

name:              staticfile-test
requested state:   started
instances:         1/1
usage:             256M x 1 instances
routes:            d047883demo.cfapps.sap.hana.ondemand.com
last uploaded:     Wed 31 Oct 10:52:15 STD 2018
stack:             cflinuxfs2
buildpack:         https://github.com/cloudfoundry/staticfile-buildpack.git#v1.4.29
start command:     $HOME/boot.sh

     state     since                  cpu    memory      disk      details
#0   running   2018-11-09T08:31:45Z   0.0%   0 of 256M   0 of 1G

The application seems to be in STARTED state even though there is no instance running anymore. A "cf restart" results in restaging the application.
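Comparing the two responses above, `state` stays `STARTED` and `version` is unchanged, while `package_state` goes from `STAGED` to `PENDING` and `detected_buildpack` and `detected_start_command` are cleared. A client could flag this condition with a check like the sketch below. The function name `looks_stuck` is hypothetical, and the rule is only a heuristic derived from the output shown in this issue, not Cloud Controller logic.

```python
def looks_stuck(entity, running_instances):
    """Heuristic (assumption): the app is requested to be STARTED, no
    instance is running, and the package is no longer marked STAGED."""
    return (
        entity.get("state") == "STARTED"
        and running_instances == 0
        and entity.get("package_state") != "STAGED"
    )


# Values taken from the GET response (before) and the PUT response (after);
# the instance counts come from the "cf a" output (0/1 after the PUT).
before = {"state": "STARTED", "package_state": "STAGED"}
after = {"state": "STARTED", "package_state": "PENDING"}

print(looks_stuck(before, running_instances=1))  # False
print(looks_stuck(after, running_instances=0))   # True
```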

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/161835286

The labels on this github issue will be updated when the story is started.

ericpromislow commented 5 years ago

Hi @stephanme

Does this happen if you repeat the exact same curl -X PUT... command? If the cloud controller detects that one or more sensitive fields on the process object have changed, it will assign a new version to the process and you'll see a momentary outage, but the app should start up again on its own.

BTW, if this is the case, we are working on a way to prevent that type of outage for the v3 API.

Regards, @ericpromislow CAPI Community Pair (soloing)

stephanme commented 5 years ago

Hi @ericpromislow,

the application goes down after a single curl -X PUT .... There is no need to repeat the command.

The application does not recover from this outage. It remains stopped, and cf apps shows it in requested state 'started' with '0/1' instances (see the 'Current result' section above). I have to run cf restart staticfile-test manually to get it running again (and this restart actually results in a re-stage).

Best regards, Stephan

cwlbraa commented 4 years ago

Changing the ports of a v2 app will cause it to synchronously restart. This is improved with v3 deployments. Closing due to age.