cloudfoundry / cloud_controller_ng

Cloud Foundry Cloud Controller
Apache License 2.0

app update fails when app package is in a failed state #903

Closed dkoper closed 6 years ago

dkoper commented 7 years ago

Issue

Update of an app (V2 API) with a failed package fails.

PUT /v2/apps/411589dc-6592-4428-94af-5ff82ec61123 HTTP/1.1
...
{
  "disk_quota": 1024,
  "health_check_type": "port",
  "instances": 1,
  "memory": 32,
  "name": "bigapp"
}
RESPONSE: [2017-08-30T14:19:27-07:00]
HTTP/1.1 400 Bad Request
...
{
  "description": "The app package is invalid: bits have not been uploaded",
  "error_code": "CF-AppPackageInvalid",
  "code": 150001
}

When only including the "name" attribute, the API call does not return an error.

Context

Full details are here: https://github.com/cloudfoundry/cli/issues/1216

  1. User tries to push a big app. The app is created, but the upload fails with "The app package is invalid: Package may not be larger than 1073741824 bytes".
  2. User reduces the size of the app and pushes again. The CLI makes the above PUT call before uploading the new app bits, but gets the above error response.

Steps to Reproduce

  1. cf push bigapp -p smallerfile.zip (smallerfile.zip < 1GB) -> App should start successfully
  2. cf push bigapp -p bigfile.zip (bigfile.zip > 1GB) -> Deploy fails with the following message, but app remains in STARTED state:
    Error processing app files: Error uploading application.
    The app package is invalid: Package may not be larger than 1073741824 bytes
  3. cf push bigapp -p smallerfile.zip -i 1 -m 32M -s cflinuxfs2 (smallerfile.zip < 1GB) -> Update fails with:
    diesk@cloud-cf:~$ cf push bigfile -p smallfile.zip -i 1 -m 32M -s cflinuxfs2
    Using stack cflinuxfs2...
    OK
    Updating app bigfile in org dies-test / space dev as admin...
    FAILED
    Server error, status code: 400, error code: 150001, message: The app package is invalid: bits have not been uploaded

Expected result

cf push bigapp -p smallerfile.zip -i 1 -m 32M -s cflinuxfs2 should not fail. I expected the PUT request to do nothing and just return success as this app's state was already STARTED (even though package state FAILED), and the specified attributes already had those values.

Current result

After failing to upload my app bits and fixing the issue locally, I cannot simply try again.

cf-gitbot commented 7 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/151239418

The labels on this github issue will be updated when the story is started.

matt-royal commented 7 years ago

Hi @dkoper,

We tried to reproduce this, but were unable to. Pushing the smaller zip after the too-big zip succeeds for us on the newest capi-release. We also tried to reproduce the issue from https://github.com/cloudfoundry/cli/issues/1216 by first pushing a small, valid app before trying to push the too-big zip over it, etc. We couldn't reproduce that issue either. Are there additional reproduction steps we should be trying? Have you confirmed this behavior recently?

Thanks, Matt & @lisamcho

dkoper commented 7 years ago

Hmm... I was able to reproduce this 2 weeks ago (see date in trace) on what was then the latest CF deployment release.

Just to check about your too-big zip file, did you confirm the file size when uploaded was bigger than the size mentioned in the error message? Note that cf push rezips any archive specified to cf push -p so if you created a big zip file without compression, the uploaded file may be much smaller.

Also, I assume the 1073741824 bytes size in the CC error message is configurable on the server side. Was your zip file bigger than what's configured in your environment? Are you saying you don't see this error message with the latest capi-release when pushing a big zip file? (That sounds like an issue by itself if the upper limit is no longer checked!)
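
As a rough way to check an archive against the limit before pushing, the sketch below reads the sizes recorded inside the zip. It is only an approximation: the CLI rezips the archive, so the uploaded size can differ from what this reports. The constant is taken from the error message in this thread and is assumed, not confirmed, to match the server's configured limit; the function names are hypothetical.

```python
import os
import zipfile

# Assumption: limit copied from the error message in this thread. The real
# value is presumably configurable on the server.
ASSUMED_MAX_PACKAGE_BYTES = 1073741824


def archive_sizes(path):
    """Return (bytes on disk, total compressed bytes, total uncompressed bytes)."""
    with zipfile.ZipFile(path) as archive:
        entries = archive.infolist()
        compressed = sum(entry.compress_size for entry in entries)
        uncompressed = sum(entry.file_size for entry in entries)
    return os.path.getsize(path), compressed, uncompressed


def may_exceed_limit(path, limit=ASSUMED_MAX_PACKAGE_BYTES):
    """Heuristic only: True when the compressed payload alone is over the limit."""
    _, compressed, _ = archive_sizes(path)
    return compressed > limit
```

An archive created without compression can have a large uncompressed total but a much smaller compressed total once rezipped, which is the case described above.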

anniesing commented 7 years ago

Hey @dkoper,

We tried to reproduce this issue today, and couldn't. The error message we receive when trying to push a large app is one that we would expect to see, as follows:

RESPONSE: [2017-10-09T14:20:55-07:00]
HTTP/1.1 413 Request Entity Too Large
Connection: close
Content-Length: 192
Content-Type: text/html
Date: Mon, 09 Oct 2017 21:20:55 GMT
Server: nginx
X-Vcap-Request-Id: dc4dcb54-50bb-4ea2-4423-698619a79c79

<html>
<head><title>413 Request Entity Too Large</title></head>
<body bgcolor="white">
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

FAILED
Error processing app files: Error uploading application.
Server error, status code: 413, error code: 0, message:

Unfortunately, we weren't able to verify the size of the app after pushing since it did not get uploaded. The zipped file was 1.8G when we tried to push it. Do you have any other tips for reproducing -- perhaps sending us your app files?

dkoper commented 6 years ago

From your message it looks like you're hitting a max upload size configured on the web server fronting CC, while I'm hitting a limit on CC? Can you remove/bump the limit on your web server?

I tried again and got the same error as before (which is different from yours). I created big files using this script: https://unix.stackexchange.com/a/202090 and zipped up several of the generated files to create one >1GB zip file.

tabana commented 6 years ago

We are having the same issue. Everything builds and deploys but won't restart:

Server error, status code: 400, error code: 150001, message: The app package is invalid: bits have not been uploaded

Our error state is not caused by a large file.

tabana commented 6 years ago

Our manifest looks like this:


applications:

tabana commented 6 years ago

We are deploying with cf version 6.29.2+c66d0f3.2017-08-25 to PCF v1.12.

Gerg commented 6 years ago

It looks like this issue was caused because the app has state: "STARTED", even though its package failed to upload. When changing attributes on the app in this state, validations fail because an app cannot have state: "STARTED" if its package_state is FAILED.

The question here is how an app was able to get stuck in this invalid state in the first place.
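
The failure mode described above can be illustrated with a toy model. This is not Cloud Controller code: the class, function names, and the exact rule are hypothetical, written only to show how an app that is already STARTED with a FAILED package would fail validation on any attribute update, even one that changes nothing. It does not capture why a name-only update passes.

```python
from dataclasses import dataclass, replace


class AppPackageInvalid(Exception):
    """Stands in for the CF-AppPackageInvalid error (code 150001) in this thread."""


@dataclass
class App:
    state: str          # "STARTED" or "STOPPED"
    package_state: str  # e.g. "STAGED", "PENDING", "FAILED"
    instances: int = 1


def validate(app):
    # Hypothetical rule modelled on the explanation above: a STARTED app
    # must not have a FAILED package.
    if app.state == "STARTED" and app.package_state == "FAILED":
        raise AppPackageInvalid("bits have not been uploaded")


def update(app, **changes):
    """Apply the changes to a copy of the app, then validate the result."""
    updated = replace(app, **changes)
    validate(updated)
    return updated
```

In this model, `update(App("STARTED", "FAILED"), instances=1)` raises even though `instances` already equals 1, which mirrors the behaviour reported in the issue.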

Gerg commented 6 years ago

@dkoper We still haven't been able to reproduce this in our environments. We pushed an app with a large package, and we also pushed a different app first with a small package and then updated it with a large package.

Each time, we were able to see the "The app package is invalid: Package may not be larger than 1073741824 bytes" error, but did not see the "The app package is invalid: bits have not been uploaded" error unless we tried to modify the state to STARTED.

Can you provide the following: What version of CC are you hitting? What version of the CLI are you using? What command did you use to create the large app (so we are uploading the same package)? What CLI commands are you running to reproduce the issue?

Thanks, Greg && @ericpromislow

giner commented 6 years ago

This happens to us almost every time the blobstore is being deployed and somebody pushes an app while it is not accessible.

dkoper commented 6 years ago

@Gerg I think I figured out what you & I may be doing differently: In the issue in the CLI tracker, I mentioned the error does not occur if you do cf push bigapp -p smallfile.zip (i.e. without other arguments that update the app (e.g. -i, -m, -s)). Yet, in my issue description here those arguments were not reflected. I have updated the Steps to Reproduce.

Reproduced again using cf version 6.33.0+a345ea34d.2017-11-20, CC 2.99.0, steps to create big file:

diesk@cloud-cf:~/workspace$ cat ./bigfile.sh
#!/bin/bash
filecount=0
while [ $filecount -lt 10 ]
do
filesize=$((RANDOM%9+1))
filesize=$(($filesize*104857600))
</dev/urandom head -c "$filesize" | gzip > /tmp/file${filecount}.$RANDOM.gz
((filecount++))
done
diesk@cloud-cf:~/workspace$ ./bigfile.sh
diesk@cloud-cf:~/workspace$ cd /tmp
diesk@cloud-cf:/tmp$ ls -la file*
-rw-rw-r-- 1 diesk diesk 629247237 Nov 27 10:54 file0.23605.gz
-rw-rw-r-- 1 diesk diesk 419498523 Nov 27 10:54 file1.22783.gz
-rw-rw-r-- 1 diesk diesk 524372480 Nov 27 10:55 file2.18533.gz
-rw-rw-r-- 1 diesk diesk 209749028 Nov 27 10:55 file3.3284.gz
-rw-rw-r-- 1 diesk diesk 209749096 Nov 27 10:55 file4.11397.gz
-rw-rw-r-- 1 diesk diesk 943871114 Nov 27 10:57 file5.23628.gz
-rw-rw-r-- 1 diesk diesk 943870410 Nov 27 10:58 file6.2077.gz
-rw-rw-r-- 1 diesk diesk 943870598 Nov 27 10:59 file7.32221.gz
-rw-rw-r-- 1 diesk diesk 524372951 Nov 27 11:00 file8.2143.gz
-rw-rw-r-- 1 diesk diesk 314623387 Nov 27 11:00 file9.8236.gz
diesk@cloud-cf:/tmp$ jar cvf bigfile.zip file0.23605.gz file2.18533.gz
added manifest
adding: file0.23605.gz(in = 629247237) (out= 629439182)(deflated 0%)
adding: file2.18533.gz(in = 524372480) (out= 524532435)(deflated 0%)
diesk@cloud-cf:/tmp$ ls -la bigfile.zip
-rw-rw-r-- 1 diesk diesk 1153972199 Nov 27 11:19 bigfile.zip

anniesing commented 6 years ago

Hi @dkoper,

Thanks for the updated reproduction steps. We were able to reproduce your error today.

We looked into the issue, but haven't found the root cause or a solution yet. At a cursory glance, it seems like there might be an issue with the way we handle pushing an app while using the -i flag, but we're not sure yet.

We need time to focus on this issue, so we're surfacing to @zrob so he can prioritize this in our backlog accordingly.

Thanks, CAPI Community Pair (Annie and @ericpromislow)

ericpromislow commented 6 years ago

And now I can't reproduce this. Details at https://www.pivotaltracker.com/story/show/151239418/comments/184331342

lisamburns commented 6 years ago

I can reproduce this. I did not have to change the nginx.conf or set any quotas. It seems specifically to be a result of the '-s cflinuxfs2' flag.

± lc |master ?:5 ✗| → cf push bigapp -p smallerfile
...
OK

± lc |master ?:5 ✗| → cf push bigapp -p bigfile.zip
Updating app bigapp in org org / space space as admin...
OK

Uploading bigapp...
Uploading app files from: /var/folders/qx/dfk6jyx17y54jk5cw7x177gw0000gn/T/unzipped-app470788235
Uploading 1.1G, 2 files
Done uploading
FAILED
Error processing app files: Error uploading application.
The app package is invalid: Package may not be larger than 1073741824 bytes

Checking at this point, the app's package_state is FAILED and the state is STARTED, but the app is still running.

± lc |master ?:3 ✗| → cf apps
Getting apps in org org / space space as admin...
OK

name     requested state   instances   memory   disk   urls
bigapp   started           1/1         32M      1G     bigapp.clear-crystal.capi.land
± lc |master ?:3 ✗| → curl bigapp.clear-crystal.capi.land
hello
 2018-01-08 11:46:09 ⛅️  ruby 2.4.2p198 hazelwood in ~/workspace/capi-env-pool/clear-crystal
± lc |master ?:5 ✗| → cf push bigapp -p smallerfile -i 1 -m 32M -s cflinuxfs2
Using stack cflinuxfs2...
OK
Updating app bigapp in org org / space space as admin...
FAILED
Server error, status code: 400, error code: 150001, message: The app package is invalid: bits have not been uploaded

± lc |master ?:5 ✗| → cf push bigapp -p smallerfile -s cflinuxfs2
Using stack cflinuxfs2...
OK
Updating app bigapp in org org / space space as admin...
FAILED
Server error, status code: 400, error code: 150001, message: The app package is invalid: bits have not been uploaded

± lc |master ?:5 ✗| → cf push bigapp -p smallerfile
Updating app bigapp in org org / space space as admin...
OK

Note: My bigfile.zip size is:

-rw-r--r--   1 pivotal  wheel  1153972452 Jan  8 11:30 bigfile.zip

lisamburns commented 6 years ago

The desired behavior per @zrob is as follows:

When the package state is FAILED:

  1. An update that does not change the app's state (stopped or started) should succeed, so users can change their stack, but it should not trigger staging.
  2. An update where the requested state is started should fail with the existing error message, regardless of whether the stack is being changed.
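
The two rules above can be written down as a small decision function. This is a sketch of the stated behaviour only, not the actual fix: the function name, return values, and the convention that `requested_state` is None when the update request does not include a state are all assumptions. Cases the comment does not cover are reported as "unspecified" instead of being guessed.

```python
def desired_update_outcome(package_state, requested_state):
    """Map an update request to the outcome described in the comment above.

    package_state: the app's current package state, e.g. "FAILED".
    requested_state: "STARTED", "STOPPED", or None if the request leaves
        the app's state unchanged (assumed convention).
    """
    if package_state != "FAILED":
        # The comment only describes apps whose package state is FAILED.
        return "unspecified"
    if requested_state is None:
        # No state change: apply the update (e.g. a new stack), do not stage.
        return "apply_without_staging"
    if requested_state == "STARTED":
        # Fail with the existing error, whether or not the stack changes.
        return "reject"
    # Explicitly requesting another state is not covered by the comment.
    return "unspecified"
```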

anniesing commented 6 years ago

Hi @dkoper,

@lisamcho and I pushed a fix for this issue today. Look for it in a future CAPI release.

Thanks for surfacing this issue!

Annie and @lisamcho