tomcbe commented 2 years ago

JIRA Ticket: https://fedora-repository.atlassian.net/browse/FCREPO-3746, https://fedora-repository.atlassian.net/browse/FCREPO-3747

This PR is based on the PR from @mikejritter: https://github.com/fcrepo-exts/fcrepo-camel-toolbox/pull/163

What does this Pull Request do?

This PR

adds improvements to the Dockerfile and the docker-compose.yml, and
configures a GitHub Actions pipeline to build and push Docker images

What's new?

Dockerfile improvements:

Remove the hardcoded path to the configuration file, which lets users decide where they want to load the configuration from
Add the possibility to pass arguments to the JVM via the environment variable "JAVA_OPTIONS"
Only build with Maven inside Docker if the artifacts don't already exist in the build context
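The "only build if the artifacts are missing" rule can be sketched roughly as follows. This is not the logic from the actual Dockerfile: the function names, the directory argument and the `*.jar` pattern are assumptions made for illustration, and the Maven call is only printed so the sketch has no side effects.

```shell
#!/usr/bin/env bash
# Sketch only: the artifact location and file pattern are assumptions,
# not taken from the actual Dockerfile.

# Return 0 (true) when no prebuilt jar is found directly under the given
# directory, meaning a Maven build would still be needed.
needs_maven_build() {
  local target_dir="$1"
  # find prints nothing when the directory holds no jar or does not exist
  [ -z "$(find "$target_dir" -maxdepth 1 -name '*.jar' 2>/dev/null)" ]
}

# Print the decision instead of building, so the sketch is a dry run.
plan_build() {
  local target_dir="$1"
  if needs_maven_build "$target_dir"; then
    echo "no artifact in $target_dir, would run: mvn clean package"
  else
    echo "reusing existing artifact in $target_dir"
  fi
}
```

Calling `plan_build` with the module's output directory (for example a `target` directory, which is an assumption about the project layout) shows which branch would be taken.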

docker-compose improvements:

Use fcrepo as collection name in Solr and database name in Fuseki

GitHub Actions pipeline improvements:

Add pipeline instructions to build and push the multi-architecture Docker images with every build on branch 'main'

How should this be tested?

To test locally:

Run the maven build locally:

mvn clean package

Build the docker image:

docker buildx create --use
docker buildx build --platform=linux/amd64 --load --tag="fcrepo/fcrepo-camel-toolbox" .

Start fcrepo, solr, fuseki and the camel-toolbox in docker containers with docker-compose:

cd docker-compose
docker-compose up

In a separate terminal, run the following command to follow the logs of the camel-toolbox:

docker logs -f docker-compose_camel-toolbox_1

Create some resources in fedora (accessible on http://localhost:8080/fcrepo/rest)

Query Solr:

curl 'http://localhost:8983/solr/fcrepo/select?_=1633362335425&q=*:*&q.op=OR' -H 'Accept: application/json, text/plain, */*'

--> You should see the newly created resources in the response.

Query Fuseki:

curl 'http://localhost:3030/fcrepo/query' -X POST -H 'Accept: application/sparql-results+json,*/*;q=0.9' --compressed -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' --data-raw 'query=%0A%0ASELECT+%3Fsubject+%3Fpredicate+%3Fobject%0AWHERE+%7B%0A++%3Fsubject+%3Fpredicate+%3Fobject%0A%7D%0ALIMIT+25'

--> You should see the newly created resources in the response.
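The Fuseki request body above is hard to read in its URL-encoded form. A small bash helper like the one below (the name `urldecode` is my own, not part of any tool used here) decodes it, so it is easier to see which SPARQL query is actually sent.

```shell
#!/usr/bin/env bash
# Decode an application/x-www-form-urlencoded string:
# '+' becomes a space and '%XX' becomes the byte with hex value XX.
urldecode() {
  local s="${1//+/ }"          # form encoding uses '+' for spaces
  printf '%b\n' "${s//%/\\x}"  # turn %XX into \xXX and let printf expand it
}

# The --data-raw payload from the Fuseki curl command above.
encoded='query=%0A%0ASELECT+%3Fsubject+%3Fpredicate+%3Fobject%0AWHERE+%7B%0A++%3Fsubject+%3Fpredicate+%3Fobject%0A%7D%0ALIMIT+25'

urldecode "$encoded"
```

The decoded text is a plain "select any 25 triples" query. If you prefer to keep the query readable in the command itself, curl's `--data-urlencode` option should let you pass it unencoded.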

Interested parties

@mikejritter @dbernstein @fcrepo/committers

tomcbe commented 2 years ago

@andyundso Regarding your question about building inside or outside of Docker: when a release is done, the artifacts are built and signed with GPG in the deploy job of the GitHub Actions pipeline. It makes sense to use these artifacts in the Docker container as well.

What we could do instead: download the artifacts from the Maven Central repository instead of copying them into the container. This would allow building the Docker container without a working Maven toolchain.

@mikejritter @dbernstein What are your thoughts on this?

tomcbe commented 2 years ago

Links to download the artifacts:

Snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/fcrepo/camel/fcrepo-camel-toolbox-app/

Releases: https://repo1.maven.org/maven2/org/fcrepo/camel/fcrepo-camel-toolbox-app/ (not yet working, as it has never been released)
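For illustration, a download URL for a released jar could be derived from the Releases link above roughly like this. It relies on the standard Maven repository layout and on the assumption that the runnable jar is published without a classifier, which I have not verified since nothing has been released yet. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Base URL taken from the Releases link above.
RELEASE_BASE='https://repo1.maven.org/maven2/org/fcrepo/camel/fcrepo-camel-toolbox-app'

# Print the URL of the release jar for a given version, following the
# standard Maven layout <base>/<version>/<artifactId>-<version>.jar.
# Assumption: the jar is published without a classifier.
release_jar_url() {
  local version="$1"
  printf '%s/%s/fcrepo-camel-toolbox-app-%s.jar\n' \
    "$RELEASE_BASE" "$version" "$version"
}
```

A Dockerfile step could then fetch the file with something like `curl -fsSL -o app.jar "$(release_jar_url "$VERSION")"`, where `VERSION` is a placeholder for a released version. Snapshot artifacts use timestamped file names, so this simple pattern would not apply to them.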

mikejritter commented 2 years ago

> @andyundso Regarding your question about building inside or outside of Docker: when a release is done, the artifacts are built and signed with GPG in the deploy job of the GitHub Actions pipeline. It makes sense to use these artifacts in the Docker container as well.
>
> What we could do instead: download the artifacts from the Maven Central repository instead of copying them into the container. This would allow building the Docker container without a working Maven toolchain.
>
> @mikejritter @dbernstein What are your thoughts on this?

Something I realized I hadn't mentioned yet in the discussion about building inside or outside of Docker: initially I put the build stage inside Docker because one of the integration tests was failing on my machine and I wanted to test whether it would fail inside Docker as well. Since it built without issue inside Docker, I left it in the Dockerfile. I hadn't really considered that someone might not have Maven or a JDK installation available.

tomcbe commented 2 years ago

> > @andyundso Regarding your question about building inside or outside of Docker: when a release is done, the artifacts are built and signed with GPG in the deploy job of the GitHub Actions pipeline. It makes sense to use these artifacts in the Docker container as well.
> >
> > What we could do instead: download the artifacts from the Maven Central repository instead of copying them into the container. This would allow building the Docker container without a working Maven toolchain.
> >
> > @mikejritter @dbernstein What are your thoughts on this?
>
> Something I realized I hadn't mentioned yet in the discussion about building inside or outside of Docker: initially I put the build stage inside Docker because one of the integration tests was failing on my machine and I wanted to test whether it would fail inside Docker as well. Since it built without issue inside Docker, I left it in the Dockerfile. I hadn't really considered that someone might not have Maven or a JDK installation available.

@mikejritter I actually found a way to build in docker when necessary, but otherwise reuse the existing artifacts.

tomcbe commented 2 years ago

Hi @dbernstein

I switched the GitHub Actions pipeline to use buildx and enabled the platforms linux/amd64 and linux/arm64.

If you want to test the build locally:

Set environment variables:

DOCKER_PLATFORMS=linux/amd64
FCREPO_CAMEL_TOOLBOX_VERSION=$(mvn org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate -Dexpression=project.version -q -DforceStdout)

Run the following commands to build with docker buildx:
```
docker buildx create --use
docker buildx build --platform=${DOCKER_PLATFORMS} --load --tag="fcrepo/fcrepo-camel-toolbox" --tag="fcrepo/fcrepo-camel-toolbox:${FCREPO_CAMEL_TOOLBOX_VERSION}" .
```
Note: You can copy the docker buildx build command from the GitHub Actions pipeline definition, but replace the flag --push with --load (otherwise the images will be pushed directly to Docker Hub). The flag --load only supports one platform at a time, so I changed the value of DOCKER_PLATFORMS to linux/amd64 in these instructions.
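To avoid pushing by accident or passing several platforms to --load, a local wrapper could check the platform list first. The helper below is only a sketch with a made-up name; it is not part of the pipeline definition.

```shell
#!/usr/bin/env bash
# Print the buildx output flag for a local build, given a comma-separated
# platform list. A single platform can be loaded into the local Docker
# daemon with --load. Several platforms are rejected, because --load
# handles only one platform at a time and --push would publish the images.
local_output_flag() {
  local platforms="$1"
  case "$platforms" in
    "")  echo "no platform given" >&2; return 1 ;;
    *,*) echo "--load supports one platform, got: $platforms" >&2; return 1 ;;
    *)   echo "--load" ;;
  esac
}

# Example use (not executed here):
#   flag="$(local_output_flag "$DOCKER_PLATFORMS")" &&
#     docker buildx build --platform="$DOCKER_PLATFORMS" "$flag" \
#       --tag="fcrepo/fcrepo-camel-toolbox" .
```

With DOCKER_PLATFORMS set to linux/amd64 the helper prints --load; with linux/amd64,linux/arm64 it fails before docker is invoked.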

tomcbe commented 2 years ago

Hi @dbernstein

I finished my work on this PR and updated the PR description with the correct steps to test this locally.

dbernstein commented 2 years ago

@tomcbe and @mikejritter : I'm not loving having the environment variables use a naming scheme different from what is in the documentation - I think that could be confusing for users. This I think is key.

Having the configuration in a property file and passing that file to the application seems to me the cleanest way of configuring the app. The way @mikejritter had the Dockerfile set up, the camel toolbox will come up with the defaults that are set in the application (@mikejritter correct me if I am wrong). So we can launch docker separately without a problem. Then with compose, we can simply override the default values in the file and even add any config that isn't currently in there (such as error.maxRedeliveries). We don't have to duplicate the entire set of configuration options in docker-compose.yml in order to expose all of our toolbox properties.

It is entirely possible that I've missed something important here @tomcbe. If that is the case, help me understand. Even if I have understood, I would still like to better understand the rationale for configuring compose using the environment construct.

tomcbe commented 2 years ago

> @tomcbe and @mikejritter : I'm not loving having the environment variables use a naming scheme different from what is in the documentation - I think that could be confusing for users. This I think is key.
>
> Having the configuration in a property file and passing that file to the application seems to me the cleanest way of configuring the app. The way @mikejritter had the Dockerfile set up, the camel toolbox will come up with the defaults that are set in the application (@mikejritter correct me if I am wrong). So we can launch docker separately without a problem. Then with compose, we can simply override the default values in the file and even add any config that isn't currently in there (such as error.maxRedeliveries). We don't have to duplicate the entire set of configuration options in docker-compose.yml in order to expose all of our toolbox properties.
>
> It is entirely possible that I've missed something important here @tomcbe. If that is the case, help me understand. Even if I have understood, I would still like to better understand the rationale for configuring compose using the environment construct.

@dbernstein: I view the docker-compose.yml more as an example to help people get started. In the end, they would need to adapt it to their needs anyway. So I'm not against using a configuration file which is mounted into the container to provide configuration.

Configuring Docker containers via environment variables makes deployment to a server more straightforward: when you want to use a configuration file like @mikejritter did, you need to have that file present on the target machine, so you would need some other tool (e.g. Ansible) to get the config file there. But whatever we use in our docker-compose.yml, people can still decide to use environment variables to configure the camel-toolbox if they prefer that solution. So as already said, I'm happy to go with either solution.

If someone is using Docker Stack/Swarm or Kubernetes, they get other possibilities to make config files available to containers (like Docker Configs/Secrets in the case of Docker Stack/Swarm, or ConfigMaps/Secrets in the case of Kubernetes).
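To make the file-based option concrete, here is a rough sketch of writing a small override file and mounting it into the container. Only error.maxRedeliveries, JAVA_OPTIONS and the image name come from this thread. The function names, the in-container path and the -Xmx value are assumptions, and the image would still have to be told to read its configuration from that path.

```shell
#!/usr/bin/env bash
# Write a properties file that overrides a single default.
# error.maxRedeliveries is the property named in the discussion; the value
# is whatever the caller passes and is purely illustrative.
write_override_properties() {
  local file="$1" max_redeliveries="$2"
  printf 'error.maxRedeliveries=%s\n' "$max_redeliveries" > "$file"
}

# Start the toolbox with the file mounted read-only and JVM flags passed
# via JAVA_OPTIONS. Assumptions: "$1" is an absolute host path, and
# /config/fcrepo-camel-toolbox.properties is a hypothetical location that
# the image would need to be pointed at separately.
run_toolbox() {
  local file="$1"
  docker run --rm \
    -v "$file:/config/fcrepo-camel-toolbox.properties:ro" \
    -e JAVA_OPTIONS="-Xmx512m" \
    fcrepo/fcrepo-camel-toolbox
}
```

On a remote server the same file would have to be copied over first, which is the extra step mentioned above that environment variables avoid.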

tomcbe commented 2 years ago

@dbernstein I implemented the changes as we just discussed.

fcrepo-exts / fcrepo-camel-toolbox

FCREPO-3746: Docker improvements #169
