cholla-hydro / cholla

A GPU-based hydro code
https://github.com/cholla-hydro/cholla/wiki
MIT License
66 stars 32 forks source link

new build-tests.yml for Docker container #200

Closed alwinm closed 1 year ago

alwinm commented 1 year ago

I set up this pull request by cherry-picking the dev commit, so down the road if/when dev gets merged into main, this won't generate a merge conflict!

I set up the build script to look for the docker containers:

docker://alwinm/cholla:hip_github docker://alwinm/cholla:cuda_github

When we create new versions, we can give them tags with other names, and then alias the tagname to hip_github and cuda_github. This way, we can update the Docker container while keeping a version history and easily being able to switch back to previous versions if necessary, all without requiring pull requests on github.

I can set maintainers as collaborators on the Docker Hub repo so any of us can push images!

Example:

Setting the github container version to 0.0, assuming you've already pushed 0.0.

docker tag alwinm/cholla:hip_v0.0 alwinm/cholla:hip_github
docker push alwinm/cholla:hip_github

Setting the github container version to 0.1, assuming you've already pushed 0.1.

docker tag alwinm/cholla:hip_v0.1 alwinm/cholla:hip_github
docker push alwinm/cholla:hip_github
bcaddy commented 1 year ago

Triggering a workflow to rebuild the containers would be easy to do with labels as described here

evaneschneider commented 1 year ago

@bcaddy - does the documentation on the wiki seem sufficient, or are you suggesting that we need more?

bcaddy commented 1 year ago

I didn't see that, sorry. It looks good to me.

I still think that changing the container that the builds use should require a PR

alwinm commented 1 year ago

Currently the documentation I have so far is here: https://github.com/cholla-hydro/cholla/wiki/Docker

It takes 1 command to build a Docker image, 1 command to run it (I always do this to make sure Cholla compiles on it before pushing) 1 command to tag it (naming it whatever you want), and 1 command to push it. We would need to automate the step that verifies that Cholla will compile.

I could definitely do a short tutorial for the group on Docker.

I have some example Dockerfiles here: https://github.com/alwinm/cholladocker

evaneschneider commented 1 year ago

Doesn't changing the docker container require a PR by default? Since it involves changing the version number in a file?

bcaddy commented 1 year ago

If my current understanding is correct then if doesn't since the Cholla workflow doesn't have a version number. Rather someone builds a new container and then aliases it to the name that the workflow file is looking for.

alwinm commented 1 year ago

Working under the assumption that you preferred fewer PRs required, I actually stripped out version numbers in build-tests.yml. Someone who changes the Docker container would have to change version numbers in that Dockerfile, but currently there are no Dockerfiles being tracked in the cholla git repo. And any collaborators of the alwinm/cholla Docker hub repo can push / change the container.

To make PRs required again, I'd just put back version numbers into build-tests.yml and we wouldn't do the aliasing thing to point hip_github and cuda_github to the latest versions.

bcaddy commented 1 year ago

My suggestIon is that the docker file be tracked and a GHA is written to rebuild the container when needed and that we don't test the container in that GHA since the builds are run on every PR so the testing will happen immediately after always.

So the workflow looks like:

  1. Contributor forks and branches
  2. Modifies dockerfile and build-tests.yml
  3. Makes a PR with an appopriate label to tell github to launch the docker rebuild
  4. Once it's done we either manually or automatically run the cholla build to test the containers

It might be easier to have a section of build-tests.yml that checks if the container exists and if not it starts the build and waits for it to complete

bcaddy commented 1 year ago

Also, it might be better to store the containers in a "cholla" account rather than @alwinm's personal one long term. That's easy to change later though

alwinm commented 1 year ago

I can see some of the virtues of that approach, but I'm not sure they outweigh the labor cost.

100% agree we should have a cholla account one day.

bcaddy commented 1 year ago

I can see some of the virtues of that approach, but I'm not sure they outweigh the labor cost.

I agree, that's why I think we should automate it. I'm still looking but I think we can just have build-tests.yml check if the container exists and if not the build it. This should be done in a pre-req job so every single build type doesn't try to make a new container.

These links should do it:

So two jobs at the beginning to check that the two containers exist then the Build job can wait on them

Edit: the two container checking jobs should probably be one matrix job

evaneschneider commented 1 year ago

Given that this version compiles, passes all the build checks, and is significantly faster, and the impetus was to get it in ASAP so that @ojwg can move ahead with PR 199, I'd like to merge it as-is, and we can work out the details of how to implement the docker container with version reference later. @alwinm, do these docker containers have the necessary libraries for Orlando's new code?

alwinm commented 1 year ago

Yes, both containers contain the random number generation that Orlando's code requires.

A nice feature of the container is that I can pull Orlando's branch into it and try compiling so I can know ahead of time that it will work. I've tested TYPE=disk with both hip and cuda.