cloudfoundry-attic / bosh-lite

A lite development env for BOSH
Apache License 2.0

Response exceeded maximum allowed length #239

Open styeung opened 9 years ago

styeung commented 9 years ago

Hi,

We tried deploying a trusty branch of CF-Release (https://github.com/cloudfoundry/cf-release/tree/trusty64-rootfs) and got the following error:

Started updating job runner_z1 > runner_z1/0. Failed: Response exceeded maximum allowed length (00:00:39)
Error 450001: Response exceeded maximum allowed length

The full error log can be found here.

We've been able to successfully deploy before, and this is the first time we've seen this error message. What's causing this?

Thanks,

Sai To

styeung commented 9 years ago

Here is our output from bosh task 3 --debug

bosh-ci-push-pull commented 9 years ago

That's probably the stdout/stderr of the failure being larger than the size NATS allows per message. At some point in the past we added code on the agent side to catch that and send only the last 100 lines of the message, but it's possible that the output had some really long lines (megabytes in size). I thought we had a guard around that, but maybe not. I'm also not sure when bosh-lite's stemcell was last updated with the latest agent, but I assume it has this feature, since it was added several months ago.
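To illustrate why a line-count guard alone may not be enough, here is a minimal shell sketch of a two-stage truncation: keep the last N lines, then cap the result by bytes. This is not the bosh-agent's actual code (the agent is written in Go); the function name `truncate_output` and the 1 MiB default cap are assumptions made for the example, and the real NATS limit on a given Director may differ.

```bash
#!/bin/bash
# Hypothetical helper, NOT the bosh-agent implementation.
# Stage 1 keeps the last max_lines lines of stdin.
# Stage 2 caps the result at max_bytes bytes, so a single very long
# line cannot push the payload past the message size limit.
truncate_output() {
    local max_lines="${1:-100}"
    local max_bytes="${2:-1048576}"   # assumed 1 MiB cap; check your NATS config
    tail -n "$max_lines" | tail -c "$max_bytes"
}
```

For example, `some_failing_command 2>&1 | truncate_output 100 1048576` would forward at most 100 lines and at most 1 MiB, whichever bound is hit first.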

It could also be some other message response being too long...

@cppforlife got any other ideas?

On Tue, Feb 24, 2015 at 1:55 PM, Sai To Yeung notifications@github.com wrote:

Here https://gist.github.com/styeung/4e8b4d17057e4817e8df is our output from bosh task 3 --debug

jtarchie commented 9 years ago

With that release, on this particular box, we are able to reproduce this error.

Our next steps are to remove .blobs and .bosh/cache and try again.

bosh-ci-push-pull commented 9 years ago

You said bosh cli plugin? I'm a dummy.

cppforlife commented 9 years ago

This is definitely stdout/stderr going over the limit because of how we use tar (verbose mode) in the Agent. The real problem here is that it fails to untar. That could be due to an invalid package cache, or the compilation stage may have failed to tar up the package for some reason. Since this is bosh-lite, the best way to go about it is to blow away that deployment and cf-release from the Director.
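A sketch of what that cleanup might look like, assuming the Ruby-based bosh CLI of that era and its `delete deployment` and `delete release` commands. The deployment name `cf-warden` and the release name `cf` are placeholders for this example; check `bosh deployments` and `bosh releases` for the actual names. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences so the commands can be previewed first.

```bash
#!/bin/bash
# Assumed names; replace with the values shown by `bosh deployments`
# and `bosh releases` on your Director.
DEPLOYMENT="${DEPLOYMENT:-cf-warden}"
RELEASE="${RELEASE:-cf}"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the deployment first, then the release it was using.
cleanup() {
    run bosh delete deployment "$DEPLOYMENT"
    run bosh delete release "$RELEASE"
}
```

Running `DRY_RUN=1 cleanup` prints the two commands without touching the Director; running `cleanup` on its own executes them.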

cppforlife commented 9 years ago

We'll eventually adjust bosh-agent to not log everything from the tar command.

jtarchie commented 9 years ago

We are able to reproduce this error again. The strange part is that we can reproduce it on our CI machine but not on our dev machine, where the deployment of Bosh Lite and CF worked perfectly.

benmoss commented 8 years ago

We were able to reproduce this bug by running this errand:

#!/bin/bash
#

for (( i = 0; i < 1024 * 1024 * 2; i++ )); do
    echo "Hello!"
done

This was on a bosh-init deployed vSphere director, so we're not sure this is necessarily a bosh-lite problem.
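For a sense of scale, the errand's stdout size can be worked out directly from the loop bounds. The sketch below does that arithmetic and compares it with a 1 MiB message cap; that cap is an assumption for illustration, not a value taken from this Director's configuration.

```bash
#!/bin/bash
# Back-of-the-envelope size of the errand's stdout.
lines=$(( 1024 * 1024 * 2 ))                 # loop iterations in the errand
bytes_per_line=$(printf 'Hello!\n' | wc -c)  # 6 characters plus a newline
total=$(( lines * bytes_per_line ))
assumed_limit=$(( 1024 * 1024 ))             # assumption: 1 MiB message cap
echo "stdout: ${total} bytes, roughly $(( total / assumed_limit ))x the assumed limit"
```

That is about 14 MiB of output spread over two million short lines, so the total volume is far above such a cap even though no single line is long.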

It looks like the NATS handler does not publish any of the message if it gets an error from the PerformHandler: https://github.com/cloudfoundry/bosh-agent/blob/fcb52b4f1aeae2c0c48e76c374b6f80354cbece5/mbus/nats_handler.go#L161-L164