For the new Java buildpack, we considered calculating the memory heuristics at droplet runtime (before we had access to $MEMORY_LIMIT at staging time), but this is pretty nasty because of the lack of reasonable maths precision in shell scripts. It would be necessary to add Ruby to the droplet, which seems like bloat.
A better solution would be to somehow get the buildpack to recalculate the memory limits after cf scale, but I understand that's a larger design change.
We could consider whether there are ways to precalculate a guard that could run at droplet runtime to avoid excessive memory consumption, although I have no idea whether that's feasible.
On 5 Sep 2013, at 17:22, Guillaume Berche notifications@github.com wrote:
The java-buildpack statically applies memory heuristics at staging time, i.e. it includes the -Xmx option in the start command. As a result, if a user uses the cf scale -memory command to decrease the memory, the JVM will still try to allocate up to the initial memory amount. The JVM will then exceed the memory limit and be killed by the Warden container.
Possible workaround: apply the memory heuristics within the droplet using the limits entry of VCAP_APPLICATION (http://docs.cloudfoundry.com/docs/using/deploying-apps/environment-variable.html#VCAP_APPLICATION).
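For illustration, here is a minimal sketch of the workaround quoted above: a small program run from the start command that reads the container memory limit from the limits entry of VCAP_APPLICATION and derives an -Xmx value at droplet start. Go is used only to keep the examples in this thread in one language, and the vcapApplication struct and the 75% heap split are assumptions for the example, not the buildpack's actual heuristic.

```go
// Minimal sketch (not the buildpack's actual heuristic): read the container
// memory limit from the "limits" entry of VCAP_APPLICATION at droplet start
// and derive an -Xmx value from it, instead of baking the value in at staging.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type vcapApplication struct {
	Limits struct {
		Mem int `json:"mem"` // container memory limit, in MB
	} `json:"limits"`
}

func main() {
	var app vcapApplication
	if err := json.Unmarshal([]byte(os.Getenv("VCAP_APPLICATION")), &app); err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse VCAP_APPLICATION:", err)
		os.Exit(1)
	}
	// Illustrative split only: give the heap 75% of the container limit and
	// leave the rest for permgen/metaspace, stacks and native overhead.
	heap := app.Limits.Mem * 3 / 4
	fmt.Printf("-Xmx%dM\n", heap) // to be interpolated into the JVM start command by a wrapper
}
```

A wrapper in the start command could then interpolate the printed flag into the JVM invocation, so the setting tracks whatever limit is in force when the instance actually starts.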
I understand from http://docs.cloudfoundry.com/docs/running/architecture/stacks.html that Ruby 1.9.3 is indeed available at runtime in the default stack. I guess then that this would be a matter of running a small Ruby program (a module/part of the buildpack) in the start command, prior to executing the JVM, that would re-apply the memory heuristics. Does this seem feasible?
Yes, that seems feasible. We'd still want to run the heuristics at staging time because they emit useful warnings in addition to setting the memory sizes. But we could re-run the latter part of the calculation at droplet execution time. Alternatively, we could generate a shell script mapping feasible $MEMORY_LIMIT values to memory settings if we want to avoid the Ruby dependency.
I'll raise a feature for gracefully supporting scaling down. Thanks for raising this.
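To make the second alternative concrete, here is a rough sketch of the precomputed-mapping idea. It is written in Go purely to keep the examples in this thread in one language (the generation would really happen inside the Ruby buildpack), and the limit list, output file name and 75% heap ratio are assumptions for illustration: at staging time, emit a small POSIX case statement covering a set of feasible memory limits, so the droplet can pick its settings at start without any shell arithmetic.

```go
// Sketch only: generate, at staging time, a shell snippet that maps a fixed
// set of feasible $MEMORY_LIMIT values to JVM memory settings. The limits,
// file name and 75% ratio are invented for the example.
package main

import (
	"fmt"
	"os"
)

func main() {
	feasibleLimitsMB := []int{128, 256, 512, 1024, 2048, 4096}

	f, err := os.Create("memory_settings.sh")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot write memory_settings.sh:", err)
		os.Exit(1)
	}
	defer f.Close()

	fmt.Fprintln(f, `case "$MEMORY_LIMIT" in`)
	for _, mb := range feasibleLimitsMB {
		heap := mb * 3 / 4 // illustrative ratio only
		fmt.Fprintf(f, "  %dm) JAVA_MEMORY_OPTS=\"-Xmx%dM\" ;;\n", mb, heap)
	}
	fmt.Fprintln(f, `  *) echo "Unsupported memory limit: $MEMORY_LIMIT" >&2; exit 1 ;;`)
	fmt.Fprintln(f, "esac")
}
```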
@gberche-orange Just to let you know that we haven't forgotten about this: it's on our backlog, but the push to the PivotalCF release has de-prioritized it since you raised it. We're still committed to looking at it, but the introduction of memory bounding raises some issues around a reasonable implementation. When we've had a chance to fully dig into the issue, we'll let you know what we find.
I have prototyped running the memory calculation during droplet start and it works in the "golden path" cases (scaling memory up or down when there is enough memory) but produces extremely poor diagnostics in one error case (scaling down so that there is insufficient memory to cover the configured lower bounds) and has some new, worrying failure modes and fragility. We are currently considering our options.
Here's my latest brain dump on the subject in case @gberche-orange or anyone else would care to comment...
The only clean solution I am aware of is to have Cloud Foundry re-drive staging when memory is scaled up or down. This behaviour may need to be made conditional, e.g. by adding a setting to the hash returned by the release script, so that the Java buildpack could request to be re-driven when scaling occurs. It is not clear that the Cloud Foundry team would agree to provide such function as it would (currently) only be useful to the Java buildpack.
The current function (on master) is really unacceptable, as scaling memory down too far will provoke Warden to kill the application, which currently produces very poor diagnostics. Although we hope the Warden diagnostics will be improved, it is questionable whether they will ever be sufficient for this use case. Apart from poor diagnostics, a major downside of the current behaviour is that the application will not necessarily fail when it is scaled down too far, but may be killed by Warden at an essentially arbitrary point in the life of the application.
The prototype on branch 56495606-graceful-memory-scaledown supports scaling memory up and down and causes the application to fail to start when memory is scaled down too far. If insufficient memory is available when the application is pushed, staging fails with suitable diagnostics much as on master.
The downsides of this implementation include:
This prototype might become a little more attractive if Cloud Foundry gathered logs during application start and delivered them back to the user of (g)cf push.
Thanks @glyn for this prototype!
I agree with you that it would be best to have the buildpack contract refined, e.g. so that the release command provides additional information on whether it supports scaling up/down dynamically or requires restaging. Let's open the conversation on the vcap-dev mailing list.
Other buildpacks might suffer from the same issue (PHP, Python, ...). Some runtimes, in particular garbage-collected ones, support limiting the maximum memory in order to provide clear diagnostics when memory is low.
We've come to the end of the prototyping effort here and determined that improvements need to be made to Cloud Foundry core. For posterity the prototype can be found at 8353fe21f6b0df6090fdea62db8d10cc81add83e.
@nebhale What are these improvements to core CF and when are they expected to be made?
@kelapure The improvements would be around better diagnostics when failures happen during startup of a droplet, better diagnostics when applications are killed by Warden, and an enhanced buildpack API allowing either a second tier of staging execution (that can occur without a "re-stage") or the ability for a buildpack to opt into re-stages for certain lifecycle events.
To my knowledge there is no plan to make any of the changes described herein. Discussion about what changes might be made (not when) is ongoing in the vcap-dev list.
Understood. Will the upcoming Diego project help?
Some of our pain points are addressed in the Diego presentation, although it sometimes makes them worse. My understanding is that the staging changes in Diego are quite soft at the moment and need lots of input. I'm local to the Diego team next week and will talk with them then about what improvements can be made.
Thank you. Any guidance to buildpack developers, and implications for buildpacks particularly around staging, will be appreciated!
@nebhale do you think the work on restaging has addressed the issue, i.e. should the CLI offer to call the new restage endpoint upon a scale up/down?
https://www.pivotaltracker.com/s/projects/966314/stories/65335102 https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/Oq-7XXwT7fs/0rq096cUWacJ
To be honest, I don't have a lot of hands-on experience with the new restage behavior, so I can't say for sure. Restaging should be sufficient to handle changes to the memory allocated to an application, so I'd tentatively say yes, this issue should be addressed via restage.
@nebhale We're implementing some auto-scale behaviour and ran into this restage issue again. It is one thing to know you must restage when scaling a Java app via the CLI. But when a user is setting up auto-scale rules, asking them to restage or restart depending on the buildpack used is yet another annoying side effect of staging-time memory heuristics.
I've never been a fan of this behaviour and the complexities it propagates to the user. Auto-scale rules are yet another example of this complexity being pushed to the user. I hope that you guys haven't given up completely on a startup-time memory heuristics solution sometime in the future.
@nebhale I read through @glyn's explanation of why this doesn't work. I don't recall: did we have a functional loggregator when this was last attempted? I don't see why memcalc cannot simply log to stdout that there is not enough memory to run the app, then give the JVM 0 heap and let it fail.
Seems to me this would make the Java buildpack no different from any other buildpack. If I scale a Node app down too small, it will fail to start and I must look at loggregator to see why.
Thoughts?
@youngm So if we take a look at the automatic memory-sizing feature, we see that its very existence is an artifact of the JVM. The reason that the Java Buildpack (err, JVM buildpack) is different from all the rest is that it's the only setup where the process doesn't use all allocated memory by default. If you look at Ruby, Node, etc., the process will happily use as much memory as it can get hold of. The JVM, on the other hand, will only use its default (64M, I think?) unless otherwise specified.
So, if we assume that a calculation must be made to ensure that people paying for a 2G instance have a JVM that uses 2G, the question is when do we calculate it. The inputs for that calculation (i.e. $MEMORY_LIMIT) are available both at staging and at startup, so both are options. Staging was deemed to be the best choice because the code to calculate the memory configuration is sizable and because we cannot depend on Ruby being available at runtime unless we add it to the droplet.
The memory calculation consists of 7 files with 443 SLOC. That's a sizable amount of Ruby to be putting into a Java/JVM application's startup routine. Certainly any concerns about errors being reported are lessened these days, although the failure mode where staging results in Down, Down, Down, followed by cf logs, isn't as nice as a directly printed failure at staging and a refusal to start.
The stronger argument against moving the functionality relates to the Ruby runtime. At the moment, there is no guarantee (by which I mean contractual documentation) that a Ruby runtime is available to all applications at runtime. Even if we accept that there is in practice, it's problematic to use it because we have no idea what version it is, what the upgrade cycle for that version is, or whether it will be around tomorrow. So that leads us into the situation of adding a Ruby runtime to a Java/JVM droplet just so that we can make the memory calculation. At the moment the Ruby runtime is running at 32.1M (for comparison, the JVM is running at 39.2M), so that seems like a pretty hard pill to swallow.
I'm open to arguments on this, and I totally understand your complaints about knowing whether to restart or restage (note that we also require restage for service changes, which other buildpacks don't necessarily need). Keep in mind that the JVM and the functionality of the Java Buildpack are both quite different from the other runtimes and buildpacks, which is what leads to these differences in behavior.
One option would be to write a Go program to do the calculation at runtime and include that program in the droplet. Then there would be no dependencies to worry about, unlike with Ruby. The inputs to the program could be written into a file by the buildpack and then read by the Go program at runtime. (This is equivalent to the hare-brained scheme we discussed months ago of including a shell script in the droplet to do the calculation, which we rejected because of the lack of precision in shell arithmetic. Thankfully Go has suitable arithmetic support.)
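To make this concrete, here is a rough sketch of what such a program might look like. The .memory_weights file name, its key=value format and the single heap weight are invented for the example; the buildpack's real heuristics are considerably more involved. The buildpack would write its heuristic inputs to the file at staging, and the program re-reads them together with $MEMORY_LIMIT when the droplet starts:

```go
// Sketch only: recompute the JVM memory flags at droplet start from
// $MEMORY_LIMIT and inputs written by the buildpack at staging time.
// The .memory_weights file and its key=value format are assumptions.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// $MEMORY_LIMIT arrives in the form "512m".
	limitStr := strings.TrimSuffix(strings.ToLower(os.Getenv("MEMORY_LIMIT")), "m")
	limitMB, err := strconv.Atoi(limitStr)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse MEMORY_LIMIT:", err)
		os.Exit(1)
	}

	// Read the staged heuristic inputs, e.g. a line such as "heap=0.75".
	f, err := os.Open(".memory_weights")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read staged memory weights:", err)
		os.Exit(1)
	}
	defer f.Close()

	weights := map[string]float64{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), "=", 2)
		if len(parts) != 2 {
			continue
		}
		if w, err := strconv.ParseFloat(strings.TrimSpace(parts[1]), 64); err == nil {
			weights[strings.TrimSpace(parts[0])] = w
		}
	}

	heapMB := int(float64(limitMB) * weights["heap"])
	fmt.Printf("-Xmx%dM\n", heapMB)
}
```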
@nebhale Thanks for being willing to open the topic up again. I understand how the JVM makes this difficult. However, I think that wherever possible, making the Java buildpack function like the other buildpacks would be a win for the platform.
I've been doing some thinking and, combined with @glyn's idea, I'd like to propose a candidate solution to discuss:

- Continue to do the memory heuristics calculations during staging. If we can tell that a memory setting is going to fail at runtime, this allows us to inform the user more directly. Plus, other calculations and validations would still need to happen at staging anyway, like ensuring illegal JVM args are not used.
- Create a Go or other small native binary application to do the memory heuristics calculation. It appears that loggregator app startup logging was improved sometime after this spike was performed previously: https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/bgCvU04rrwc/2R0c8hZYXRUJ
- The memory heuristics app should print the detected memory settings to loggregator if successful.
- If not enough memory is detected, then set the heap to 0 or find some other consistent way to crash the JVM, and print diagnostics and reasons for the failure.
- It may be necessary, until Diego is common, to have the memory app sleep for a second or two on failure to ensure that the DEA logging agent is attached and will get the failure report (a rough sketch of this failure path appears below): https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/2acNlom2MHc/PSERnltc-rkJ

I think with what is proposed above:

- We no longer have the Ruby-on-DEA-agents issue.
- Apps will know at staging time if they set an invalid memory setting.
- Apps will be able to scale memory up and down without restaging.
- If memory is set too low, the app will fail on restart just like all the other buildpacks today; that said, the point of scale-down failure will be much higher than on most other platforms.

Thoughts?
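A rough sketch of the failure path in the proposal above (the function name, the placeholder values and the two-second sleep are illustrative assumptions, not a tested implementation): when the recomputed heap falls below the configured lower bound, the memory app prints a readable diagnostic, pauses briefly so the DEA logging agent can pick it up, and exits non-zero so the instance fails immediately instead of being killed by Warden later.

```go
// Sketch of the proposed fail-fast behaviour; names and values are illustrative.
package main

import (
	"fmt"
	"os"
	"time"
)

// failStartup reports why the memory settings cannot be satisfied and exits
// non-zero so the instance crashes immediately with a readable reason.
func failStartup(limitMB, heapMB, heapLowerBoundMB int) {
	fmt.Printf("Cannot start: memory limit %dM leaves only %dM for the heap, below the configured lower bound of %dM.\n",
		limitMB, heapMB, heapLowerBoundMB)
	fmt.Println("Increase the application's memory limit (cf scale -m) or lower the configured bounds.")
	// Until Diego, sleep briefly so the logging agent is attached and the
	// message reaches loggregator before the process exits.
	time.Sleep(2 * time.Second)
	os.Exit(1)
}

func main() {
	// Placeholder values; in practice they would come from $MEMORY_LIMIT and
	// the staged heuristic inputs, as sketched earlier in the thread.
	limitMB, heapMB, heapLowerBoundMB := 128, 96, 256
	if heapMB < heapLowerBoundMB {
		failStartup(limitMB, heapMB, heapLowerBoundMB)
	}
	fmt.Printf("-Xmx%dM\n", heapMB)
}
```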
I think that solution might work, although it would depend very much on whether diagnostics are captured properly during application startup. I don't particularly like the idea of the memory heuristics application having to sleep on failure because that seems flaky. How long would it need to sleep to guarantee that the logs get reported? This seems like an unpredictable value.
BTW, if this is implemented, it might make sense to call the memory heuristics application at staging time as well, rather than maintain the existing duplicate Ruby code.
@glyn The sleep time is unfortunate, but it's a known issue and should be fixed in Diego. In my experience 1 second has always worked, but some testing will be necessary. Even if we set the delay to 5 seconds, I don't think having to wait that long before loggregator gives you a good failure message would be too egregious, nor should it be a reason to give up on runtime memory scaling.
I agree about using the same binary during staging and running.