GoogleCloudPlatform / appengine-java-vm-runtime

Apache License 2.0
67 stars · 34 forks

Remove dependence on Maven to build the openjdk8 image #314

Closed meltsufin closed 8 years ago

meltsufin commented 8 years ago

@aslo

gregw commented 8 years ago

I'm still dubious about moving away from maven. If anything, I think we should perhaps embrace maven more and start storing the generated images as durable maven artefacts rather than just being stored in ephemeral docker repositories. I can imagine that to support apps that may run for years/decades it may be valuable to always be able to retrieve past images.

The ability to parametrize the Dockerfile is also a valuable aspect of using maven that needs an alternative.

aslo commented 8 years ago

@gregw Are docker repositories really more ephemeral than maven central? I was under the impression that we would be able to store images on gcr.io indefinitely.

Regarding parameterized builds, the same could be accomplished using bash - I would argue that maven might be overkill if that's all we're using it for.
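For example, a bash-driven parameterized build could look roughly like this. This is a sketch only - the `Dockerfile.in` template and the `@VERSION@` placeholder are made-up names for illustration, not something from this repo:

```shell
#!/usr/bin/env bash
# Sketch only: substitute build parameters into a Dockerfile template,
# roughly what Maven resource filtering does for us today.
# Dockerfile.in and @VERSION@ are hypothetical names.
set -eu

VERSION="${1:-1.9.40-SNAPSHOT}"

# A tiny template, just for demonstration.
cat > Dockerfile.in <<'EOF'
FROM openjdk:8-jre
ENV GAE_IMAGE_VERSION=@VERSION@
EOF

# Fill in the placeholder to produce the concrete Dockerfile.
sed "s/@VERSION@/${VERSION}/" Dockerfile.in > Dockerfile

cat Dockerfile
```

The generated `Dockerfile` could then be fed to an ordinary `docker build`.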

meltsufin commented 8 years ago

As much as I'm a fan of Maven myself, I'm wary of using it to build non-Java code. I'm also not sure that uploading images of 500MB and larger to Maven Central is an acceptable practice. It kind of defeats the whole storage efficiency of Docker anyway.

If the ephemeral nature of the GCR is an actual issue, it's a problem that would have to be solved for all languages anyway, and I think it would make even less sense for other language runtimes to be stored in Maven Central.

I definitely see your point about parametrizing the Dockerfile, but like Alex mentioned, there might be other ways of doing so. However, we should keep in mind that whatever we choose should run on other operating systems. So, bash may not necessarily be the solution.

In any case, we're just exploring this at the moment. As far as the OpenJDK8 image is concerned, currently I don't see much value with using Maven, but maybe it would make more sense for building the Jetty image. We'll explore that and see what makes sense.

ludoch commented 8 years ago

1/ Regarding Maven, one cannot change an artifact ID after a push; it is strongly immutable. In a Docker registry, owners (including me) can retag images. We do that when we push a new latest default image, or for ongoing tags like STAGE or githubheadasync.

2/ I still prefer the highly portable way of building images via Maven:

  • Maven is a requirement for Java anyway
  • Maven works on Windows.
  • Instructions are dead simple with 1 single command.

3/ The env variables were introduced because of customer issues and support needs. I would highly recommend we keep them for support sanity. If there is a unified, better way to produce this information, I am all for it, but the new way should be added in the same PR that removes the old way.

One can ssh via the admin console to the GCE VM and execute this to see the env variables:

sudo docker exec -i -t gaeapp /bin/bash
# now you can run all container commands, including env
env

APPENGINE_LOADBALANCER=
MEMCACHE_PORT_11211_TCP_PROTO=tcp
HOSTNAME=0be4bfeb47d9
TMPDIR=/tmp/jetty
GAE_IMAGE_VERSION=1.9.40-SNAPSHOT
MEMCACHE_NAME=/gaeapp/memcache
GAE_MODULE_NAME=default
MEMCACHE_PORT_11211_TCP_ADDR=172.17.0.3
GAE_AFFINITY=true
GAE_LONG_APP_ID=jetty9-work
JETTY_BASE=/var/lib/jetty
GAE_IMAGE_LABEL=async-1.9.40-SNAPSHOT
GAE_MODULE_VERSION=javanew7
GAE_MODULE_INSTANCE=0
MEMCACHE_PORT_11211_TCP_PORT=11211
APPENGINE_LOADBALANCER_IP=
PATH=/usr/local/jetty/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
GAE_MINOR_VERSION=394553564976877078
RUNTIME_DIR=/var/lib/jetty
PWD=/var/lib/jetty/webapps/root
GAE_IMAGE_NAME=jetty9-compat
MEMCACHE_PORT=tcp://172.17.0.3:11211

This will display all the env variables this docker image is exposing.

I suggest we spend more time trying to understand the support requirements and come up with a unified plan. Meanwhile, please keep what is working instead of removing it; otherwise the support organization and the customers will have fewer debugging tools to analyze what is running.

meltsufin commented 8 years ago

@ludoch We certainly don't want to break anything by removing Maven. Do you have a reference to more details about the customer issue that prompted the introduction of the environment variables based on the maven artifact?

jmcc0nn3ll commented 8 years ago

Maven is also a known entity: the vast majority of developers who have touched Java have at least some exposure to it, and with any exposure at all you have the basic ability to build the software. It would be unfortunate to lose both build portability and simplicity for the user.

I am not a huge fan of using Maven to build things like C++ but it works solidly for assembling stuff together. Ultimately, I am a fan of any approach that follows Convention over Configuration.

To steal the tagline of the Mutt mail client, 'All build systems suck, Maven (and Gradle I suppose) just suck less.'

cheers, Jesse


ludoch commented 8 years ago

@meltsufin: Working on 2 supported branches with 3 runtimes each means 6 runtimes. Add 3+ engineers doing tests (me, Greg, Jane, ...) on the same cloud project, and it becomes very painful to determine which app/version/module/service is present in GCE. So the initial customer is us.

I also gave the instructions above to a few customers who are actively trying to deploy on vm:true, so they could see exactly which image they were customizing and get a clear status of what is running, instead of mapping Dockerfile content to a deployment date (which is not easy to track when you look at a GCE VM list in the admin console). I think the OnCall Flex engineers also rely on the output of env to investigate initial customer escalations.

All I am asking is that the Silver runtime team starts spending time with the SREs, the support org, and the Flex on-call rotation to determine the best way to introduce metadata into the images that can be used as-is or customized, both to reduce the trouble Greg, Jane, and I got into initially and to give support the information they need to troubleshoot issues.

meltsufin commented 8 years ago

@ludoch I think it would be great if we could get the needed image information for debugging using docker inspect, but I agree that the environment variables are more user-friendly.

I think we can hard-code the GAE_IMAGE_NAME to openjdk here.

The GAE_IMAGE_LABEL will have to become a plural, and I'm not sure how useful it is, since some tags will be reused for different image releases.

We can use semver for GAE_IMAGE_VERSION.

Also, I think it makes sense to rename the prefix to GCP_.
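
One possible shape for that - a sketch under assumptions, not the actual image definition (the label names, the GCP_ prefix, and the base image tag are all illustrative):

```dockerfile
# Hypothetical sketch: bake the debugging metadata into the image as
# both labels (machine-readable) and env variables (shell-friendly).
FROM openjdk:8-jre

# Labels are visible via `docker inspect` without starting a container.
LABEL gcp.image.name="openjdk" \
      gcp.image.version="1.0.0"

# Env variables stay available for `docker exec ... env` style debugging.
ENV GCP_IMAGE_NAME=openjdk \
    GCP_IMAGE_VERSION=1.0.0
```

The labels could then be read with `docker inspect --format '{{json .Config.Labels}}' <image>`, without exec'ing into a running container.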

gregw commented 8 years ago

Note also that the openjdk image currently does not contain any java code, but the apphosting-logging module should really be a part of it, as it is not jetty-specific. So mvn will be needed to compile and release that.

meltsufin commented 8 years ago

@gregw we already have the logging module with a pom under the openjdk "repo". Is this what you're referring to?

gregw commented 8 years ago

@meltsufin yes and no. Yes, there is a pom.xml in appengine-java-logging, but you are proposing removing the top-level pom.xml. Thus, in order to build the openjdk-runtime, you will have to cd into each of the subdirectories and build them with whatever build tool each one uses.

I don't think this is a nice way to build anything!

So while I do think it is worthwhile to discuss whether mvn is the best tool to build everything, and whether there are some images that can be built directly with docker, I think that is a separate question from how the repos are restructured. I'd really like to see the restructure take place with the existing build mechanisms and release structures. Once the separate repos exist, we can then consider changing build tools etc.

meltsufin commented 8 years ago

@gregw Currently, the openjdk image does not depend on the logging module. So, it makes sense to have the ability to release logging without releasing the openjdk image. In general, we might want to have the ability to release the support modules on their own schedule independent of the images.

Note that even if we continue to use Maven to build everything, we still need to do something with the very top-level pom.xml because we'll no longer have everything in one repo. What do you think is the best way to handle that?

gregw commented 8 years ago

I'm not a big fan of multiple release versions under the one repository. With Git, tags apply to the entire repo, so the entire repo should be released in a single cycle - multiple artifacts are OK, but they should all be released together. If the logging is to be released separately to the openjdk image, then it should be in a separate repository.

Perhaps we should have an entirely different structure - a single java repository that has submodules for all the different levels and all the base-modules, but does not generate any images. Then we can have separate repos for each set of images we wish to build, consuming the java artefacts that are separately built.

Also, once we split into separate repositories, there will of course not be a single top level build. But I think it is very standard to be able to checkout a repo, go to the top level and then build the entire contents. Everything in the same repo should be able to be built together and released together.

aslo commented 8 years ago

I agree, we shouldn't have multiple separate releases in a single repository. Here's a suggestion - why not just separate the logging module into an additional separate repository?

We can use maven to release a single artifact from that repository and it will necessarily be completely decoupled from any other release cycle.

meltsufin commented 8 years ago

I also like having each module with its own release schedule in a separate repo. However, when the modules are tiny, like the logging one, I'm not sure it's worth the additional overhead of maintaining a separate GitHub project for each. It may get especially unwieldy if the number of these modules grows significantly.

@gregw I think it's worth considering a separate repo for Java code that helps integrate with GCP. We could put modules like logging into it and call it "java-runtimes-support"? We would still have to decide whether to release all of the modules in that repo together. Also, we probably would not want to put Jetty image-specific integration Java code in there. What do you think?

aslo commented 8 years ago

@meltsufin What overhead are you concerned about? The same amount of code is being maintained, but in a more modular, decomposed way. (Granted, given our constraints, the initial repository creation is a bit annoying, but that takes place only once).

In my opinion, the clarity that we'd get from decomposing this code into a one-module-per-repo setup would be worth some additional setup cost.

gregw commented 8 years ago

@meltsufin I think it probably is the ultimate decomposition to have all the java code in its own set of repositories with its own build mechanism, kept separate from any building of images for flex and/or gke etc.

The java repos/builds would be split up by release cycles and dependencies and would provide a suite of artefacts in maven central that can be consumed by the docker builds.

It means that for every change to a java container, (say a new jetty version) we will have to build/release the java project and then build/release the docker images - so that's 2 steps. But it also means that if we just change something in the image scripts/layout or other dependencies, we can just build the images without doing a fake release of the java container.

Note that we were kind of going in that direction with our xxx-base modules that build the java and the xxx modules that built the images.

No matter what we do, we won't have a single step to build everything, but CI can help with that.

meltsufin commented 8 years ago

Per the meeting, I'm closing this in favor of making minimal changes as part of the restructuring.