Open ggalmazor opened 5 years ago
So, I'll start the discussion :)
I'll comment only on the ideas I have an opinion about. The rest are fine by me.
k8s, managed db vs PostgreSQL container
I'm worried that the managed db will depend heavily on the provider users choose for their deployment. @brettneese, you have experience in k8s. Do you think we could offer instructions that apply universally across providers?
In any case, we can totally start with a managed db solution and then add more options later.
Publish Docker image in Docker hub
I think we're almost ready to do it. There's an opendatakit account and I think we would just need to hook everything up. It would be best to wait until @brettneese can review the image file structure and do some sanity checks, and then @yanokwa can share (with me?) the Docker Hub credentials so that I can configure the autobuild feature.
Wrap Gradle commands in Makefile
This is a bit controversial. We've invested a lot of effort in improving the tooling around Aggregate, and having Gradle as the one-stop source to deal with the project's build workflow has made the project much more approachable for new contributors (we put a high value on that). I agree that there are other tools that are better suited for some parts of the build workflow, but that's a tradeoff we're willing to accept so far.
I'm not against having a Makefile as a wrapper for pre-defined Gradle tasks, but I'd rather start by having better documentation of the build workflow, with more examples, etc.
Ensure PostgreSQL data folder belongs to host user, not container user
I don't know if this one is really an issue. I've put it because I know the Docker Compose setup we use for development (at /db) has this issue. Once you spin up the containers, a pgdata folder owned by root is created, which makes it a bit of a nuisance.
Not super important for a development environment, but I'd figure that final users would find this super annoying.
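For illustration, here is a minimal sketch of one possible workaround: create the data folder as the host user first, then run the database container under the host user's UID and GID. It is not the project's setup. The helper name run_dev_db, the postgres:12 tag, the placeholder password, and the DOCKER override are all assumptions made for this example; it also assumes the official postgres image, which stores its data in /var/lib/postgresql/data for that tag.

```shell
#!/usr/bin/env bash
# Sketch: start PostgreSQL so that the files in ./pgdata belong to the
# invoking host user instead of root.
# Assumptions: official postgres image, pinned tag, placeholder password.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
run_dev_db() {
  # Create the folder as the host user before Docker gets the chance
  # to create it as root.
  mkdir -p pgdata
  # --user makes the container process (and the files it writes) use
  # the host user's numeric UID and GID.
  "${DOCKER:-docker}" run --rm \
    --user "$(id -u):$(id -g)" \
    -e POSTGRES_PASSWORD=changeme \
    -v "$PWD/pgdata:/var/lib/postgresql/data" \
    postgres:12
}
```

Sourcing the file and running `DOCKER=echo run_dev_db` prints the docker arguments without starting a container, which is a cheap way to check the UID and volume path first.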
Hey @ggalmazor, thanks for getting this going!
It's true that the instructions will vary widely across providers. I tend to strongly believe in managed DBs, particularly for a project like this, so I'd suggest just providing links to the provider's help docs on "how to set up a postgresql." The configuration will be the same - it's how you get the DB that's different. We can work on automating this a bit, maybe, but I don't see a whole lot of value there.
Similarly, how you get yourself a Kubernetes in the first place can vary widely across providers. GKE, EKS, and AKS are all a bit unique in their own way and you always have the option of spinning a Kubernetes up for yourself on bare instances. Again, we should provide recommendations, but I suggest pointing people to first-party help docs (or blog posts, such as my series).
I don't see any issue with the current Dockerfile, now that my entrypoint tweaks are in. As I recall, it's much lighter than the Docker image I'm currently using in production.
My only remaining thought with the Docker build process is that it's not currently tagging itself as aggregate:latest, which would be a lot simpler than copying around the weird tag that is autogenerated. It probably should tag itself as both. This would also make it easier to keep the docs up to date as they would apply whether someone pulled the image from Docker Hub or built it locally.
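As a rough sketch of what "tag itself as both" could look like from the command line, `docker tag` can add a second name to an image that has already been built. The helper name tag_as_latest and the DOCKER override are made up for this example, and the autogenerated reference is passed in as an argument because its exact form depends on the build.

```shell
#!/usr/bin/env bash
# Sketch: give a locally built image the stable name aggregate:latest
# in addition to its autogenerated tag.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
tag_as_latest() {
  # Full reference of the image produced by the build, autogenerated
  # tag included (supplied by the caller).
  local source_ref="$1"
  # docker tag adds a second name to the same image; nothing is rebuilt.
  "${DOCKER:-docker}" tag "$source_ref" "aggregate:latest"
}
```

After sourcing the file, `tag_as_latest <autogenerated-reference>` leaves both names pointing at the same image, so docs written against aggregate:latest would apply to local builds too.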
We can ignore this for now. It mostly came out of my annoyance with typing:
./gradlew clean dockerBuild -xtest -PwarMode=complete
And not really knowing what that did.
That being said, my background isn't Java, so that tooling is probably already familiar to your base (in the same way that npm is familiar to me; you're lucky I didn't suggest "add a package.json", which is my default plan of attack for build commands ;-))
Absolutely agree that a lot of this pain would also be solved with better docs (although I personally believe that if you have to write docs, it's too late.) This could also be baked into a CLI as well, though - the only advantage of a Makefile is that it essentially creates a CLI in a standard way.
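To make the wrapper idea concrete, here is a minimal sketch of a shell function around the Gradle command quoted above, with each argument annotated. The function name build_image and the GRADLE override are assumptions for this example; the comments describe general Gradle behaviour, and what warMode=complete means is specific to the project's build script.

```shell
#!/usr/bin/env bash
# Sketch: a small wrapper around the Gradle invocation quoted above.
# GRADLE can be overridden (e.g. GRADLE=echo) to print the arguments
# instead of running a build.
build_image() {
  # clean              : delete the output of previous builds
  # dockerBuild        : the project's task that builds the Docker image
  # -xtest             : short for --exclude-task test (skip the tests)
  # -PwarMode=complete : set the project property warMode to "complete"
  "${GRADLE:-./gradlew}" clean dockerBuild -xtest -PwarMode=complete
}
```

After sourcing the file from the project root, `build_image` runs the build and `GRADLE=echo build_image` only prints what would be passed to Gradle.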
We can double check this once we add native DB support. That being said, as you've already mentioned, persistent volumes in container environments are a bit tricky, so I'm going to suggest that we start by recommending a managed DB and leave the DB-in-a-container trickiness to advanced users (it's def possible! but most users are much better off using a hosted DB).
Thanks for your comments. I think we're in agreement overall. Some of the stuff we're talking about will have to be addressed further down the road but I think we're composing a fine list of actionable tasks nevertheless.
My only remaining thought with the Docker build process is that it's not currently tagging itself as aggregate:latest, which would be a lot simpler than copying around the weird tag that is autogenerated. It probably should tag itself as both. This would also make it easier to keep the docs up to date as they would apply whether someone pulled the image from Docker Hub or built it locally.
Thanks! That makes sense and I've added it to the main list.
I'm not sure if this is still an active concern, but I recently tried automating docker builds of aggregate for one of my own projects. I ended up creating a github action to build and deploy, although I think it probably can also be done directly through docker hub / github integration with an additional build hook.
The action can be found here: https://github.com/chrismclarke/aggregate/blob/gh-actions/docker-build-deploy/.github/workflows/docker-build-deploy.yml
The corresponding docker file is published here: https://hub.docker.com/repository/docker/chrismclarke/odkaggregate
I'm also planning to add a separate build for use with docker-compose (using the docker-compose.gradle build), although I'm still not sure the best way to tag these things.
In any case, let me know if it's of any use to you.
(update - realised docker-compose works fine with existing build, I just needed to pass a DB_HOST environment variable)
Thanks for the update, @chrismclarke! One thing you may want to do is share your actions and docker file in a thread at https://forum.opendatakit.org/c/development/5 so it's a little more visible for folks who may benefit from it in the short term.
This is a follow-up issue to @brettneese's recent work. We can use this issue's description as a backlog of things we agree on doing.
Ready-ish:
/docs describing how to deploy in k8s using a managed db
latest
Ideas:
Let's discuss each point. We can cross things we won't be doing or add new things.