ESMF 8 was released yesterday, and its Spack build worked for me, so now we can add ESMF 8 to our build matrix images. This means we can set up the Azure pipeline for gchp_ctm anytime now.
When should we set up the pipeline for gchp_ctm? Also, what should the triggers be (e.g. release candidate tags, commits to the master branch, weekly, biweekly, etc.)?
From the Azure docs:
Each organization starts out with the free tier of Microsoft-hosted CI/CD. This tier provides the ability to run one parallel build or release job, for up to 30 hours per month. If you need to run more than 30 hours per month, or you need to run more than one job at a time, you can switch to paid Microsoft-hosted CI/CD.
So if we stick with Azure, we get 30 hours per month of run time (for the GEOS-Chem organization). The build matrix images (repo, build pipeline, dockerhub...the name is a placeholder) include GCHP's dependencies so our gchp_ctm pipeline just needs to compile GCHP.
Personally, I like the idea of triggering the pipeline on release candidate tags, because that gives us manual control over when the tests actually run.
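For concreteness, here's roughly what I'm picturing for the azure-pipelines.yml. This is just a sketch: the dependency image tag and the tag pattern are placeholders, and the real gchp_ctm build will likely need extra CMake options.

```yaml
# Sketch only: the image tag, RC tag pattern, and build commands are placeholders
trigger:
  tags:
    include:
    - '*-rc*'          # run when a release-candidate tag (e.g. X.Y.Z-rc.1) is pushed

jobs:
- job: build_gchp
  pool:
    vmImage: 'ubuntu-16.04'
  container: 'liambindle/penelope:ubuntu16.04-gcc7-openmpi3'   # hypothetical build-matrix image tag
  steps:
  - checkout: self
    submodules: recursive
  - script: |
      mkdir build && cd build
      cmake ..
      make -j2
    displayName: 'Build GCHP'
```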
How many library combos are we starting with and how long would the sum of all builds be for that combo? Ideally the test would be for every commit to the primary dev branch, and also master. Master will eventually only be updated upon version release, same as GEOS-Chem, but the primary dev would need testing along the way. If there are a lot of commits, and we eventually add runs to the tests, I can see surpassing 30 hrs per month with this setup. But we should collect some numbers on this.
I timed GEOS-Chem Classic's build (dev/12.6.0 with CMake) and GCHP's build (gchp_ctm) and here is what I found:
Which | Build CPU time (hh:mm:ss) | Builds per 30 hours | Notes about what I timed |
---|---|---|---|
GC-Classic | 00:04:26 | ~360 | dev/12.6.0's CMake build |
GCHP | 00:22:57 | ~60 | gchp_ctm's build |
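If anyone wants to reproduce these numbers, a timing step could look roughly like this (a sketch, not exactly what I ran; the model-specific CMake options are omitted):

```yaml
# Sketch: time a clean serial build inside the dependency container
steps:
- script: |
    mkdir build && cd build
    cmake ..              # model-specific CMake options omitted
    time make -j1         # bash's time: user + sys gives the build's CPU time
  displayName: 'Time the build'
```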
How many library combos are we starting with and how long would the sum of all builds be for that combo?
For GCHP I was thinking something like this:
Line name | Target | OS | Compiler | MPI | NetCDF |
---|---|---|---|---|---|
default | general | Ubuntu 16.04 | GCC 7 | OpenMPI 3 | >=4.2 |
CentOS | OS | CentOS 7 | GCC 7 | MPICH | >=4.2 |
GCC 8 | Compiler | Ubuntu 16.04 | GCC 8 | OpenMPI 3 | >=4.2 |
GCC 9 | Compiler | Ubuntu 16.04 | GCC 9 | OpenMPI 3 | >=4.2 |
Intel 18 | Compiler | Ubuntu 16.04 | Intel 2018 | OpenMPI 3 | >=4.2 |
Intel 19 | Compiler | Ubuntu 16.04 | Intel 2019 | OpenMPI 3 | >=4.2 |
OpenMPI 4 | MPI | Ubuntu 16.04 | GCC 7 | OpenMPI 4 | >=4.2 |
MVAPICH2 | MPI | Ubuntu 16.04 | GCC 7 | MVAPICH2 | >=4.2 |
MPICH | MPI | Ubuntu 16.04 | GCC 7 | MPICH | >=4.2 |
Intel MPI | MPI | Ubuntu 16.04 | Intel 2018 | Intel MPI | >=4.2 |
Old NetCDF | NetCDF | Ubuntu 16.04 | GCC 7 | MPICH | 4.1 |
That would be a total of 11 lines which would take ~5.5 CPU hours per build matrix test. Any thoughts? Initially we could start with just a couple lines and add more over time.
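In Azure Pipelines the table above would map onto a matrix strategy, with each line selecting a different dependency image. Here's a sketch with three of the lines (the image tags are placeholders for whatever the build-matrix images end up being called):

```yaml
# Sketch: three of the lines above as an Azure matrix (image tags are placeholders)
jobs:
- job: gchp_build_matrix
  pool:
    vmImage: 'ubuntu-16.04'
  strategy:
    matrix:
      default:
        depImage: 'liambindle/penelope:ubuntu16.04-gcc7-openmpi3'
      gcc9:
        depImage: 'liambindle/penelope:ubuntu16.04-gcc9-openmpi3'
      mpich:
        depImage: 'liambindle/penelope:ubuntu16.04-gcc7-mpich3'
  container: $[ variables['depImage'] ]
  steps:
  - checkout: self
    submodules: recursive
  - script: |
      mkdir build && cd build
      cmake ..
      make -j2
    displayName: 'Build GCHP'
```

Since the jobs run inside containers, the OS axis (the CentOS 7 line) would just be another image rather than a different vmImage.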
For GC-Classic I was thinking something like:
Line name | Target | OS | Compiler | NetCDF |
---|---|---|---|---|
default | general | Ubuntu 16.04 | GCC 7 | >=4.2 |
CentOS | OS | CentOS 7 | GCC 7 | >=4.2 |
GCC 5 | Compiler | Ubuntu 16.04 | GCC 5 | >=4.2 |
GCC 6 | Compiler | Ubuntu 16.04 | GCC 6 | >=4.2 |
GCC 8 | Compiler | Ubuntu 16.04 | GCC 8 | >=4.2 |
GCC 9 | Compiler | Ubuntu 16.04 | GCC 9 | >=4.2 |
Intel 17 | Compiler | Ubuntu 16.04 | Intel 17 | >=4.2 |
Intel 18 | Compiler | Ubuntu 16.04 | Intel 18 | >=4.2 |
Intel 19 | Compiler | Ubuntu 16.04 | Intel 19 | >=4.2 |
Old NetCDF | NetCDF | Ubuntu 16.04 | GCC 7 | 4.1 |
That would be a total of 10 lines which would take ~50 CPU minutes per build matrix test. Any thoughts? Again, initially we could start with just a few lines.
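The same matrix pattern would cover GC-Classic. Starting with just a few lines could look like this (again, placeholder image tags):

```yaml
# Sketch: a minimal starting matrix for GC-Classic (placeholder image tags)
strategy:
  matrix:
    default:
      depImage: 'liambindle/penelope:ubuntu16.04-gcc7-netcdf4'
    gcc9:
      depImage: 'liambindle/penelope:ubuntu16.04-gcc9-netcdf4'
container: $[ variables['depImage'] ]
```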
Ideally the test would be for every commit to the primary dev branch, and also master.
I think we could do this by having two pipelines: one pipeline that's triggered on each commit to master and dev/* and builds only the default line, and a second pipeline that runs the entire build matrix for each tagged release candidate.
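The commit-triggered pipeline's triggers would be something like this (a sketch):

```yaml
# Sketch: triggers for the lightweight "default line only" pipeline
trigger:
  branches:
    include:
    - master
    - dev/*
```

The second pipeline would use a tag trigger like the RC-tag sketch above, but with the full matrix.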
If there are a lot of commits, and we eventually add runs to the tests, I can see surpassing 30 hrs per month with this setup. But we should collect some numbers on this.
I think so too...here's a quick estimate of how much time the CI tests would have taken over the last year if this had been implemented as described above.
Repo | Trigger | Estimated CPU Time | Notes |
---|---|---|---|
GC-Classic | Commits | ~2.2 hours/month | 316 commits to master in the last year |
GC-Classic | RCs | ~0.7 hours/month | 10 releases, assuming 5 RCs per release |
GCHP | Commits | ~13.8 hours/month | 333 commits to master in the last year |
GCHP | RCs | ~22.9 hours/month | 10 releases, assuming 5 RCs per release |
That puts us at ~39.6 hours per month (or ~1/20th of a core year). The bulk of that comes from GCHP. One thing we could do is look into setting up our own self-hosted Azure agent.
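If we did go the self-hosted route, pointing the heavy GCHP jobs at our own agent pool is just a pool setting in the YAML (the pool name below is hypothetical; the agent itself would be registered under Project Settings > Agent pools):

```yaml
# Sketch: run this job on a self-hosted agent pool instead of a Microsoft-hosted image
pool:
  name: 'GEOS-Chem-SelfHosted'   # hypothetical pool name
```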
This is an excellent analysis, thanks! My concern about reserving the full build matrix for tagged release candidates only is that if there are issues then the version is already released. Is it possible to do manual triggers of the full suite of tests, such as shortly before a merge to master? Otherwise I agree that a reduced set of tests could be applied every commit for dev/*.
To be clear, if we go above the max # of hours/month, will the tests simply not run? Also, is the number of hours for a given month readily available, such that further tests could be temporarily suspended if need be, e.g. if we have an unusually high number of commits?
My concern about reserving the full build matrix for tagged release candidates only is that if there are issues then the version is already released. Is it possible to do manual triggers of the full suite of tests, such as shortly before a merge to master?
I see what you mean. Has there been any talk about adopting pre-release alpha, beta, and rc stage tags? I haven't really paid attention to their specific meanings in the past, but I just read up on it and I think they would be useful for communicating the stage of a dev/* branch.
From here:
Alpha: The alpha phase of the release life cycle is the first phase to begin software testing (alpha is the first letter of the Greek alphabet, used as the number 1). ...
Beta: Beta, named after the second letter of the Greek alphabet, is the software development phase following alpha. ... Beta phase generally begins when the software is feature complete but likely to contain a number of known or unknown bugs. ...
Release candidate: A release candidate (RC), also known as "going silver", is a beta version with potential to be a final product, which is ready to release unless significant bugs emerge.
What if alpha tags were for build testing (i.e. we trigger the build matrix on X.Y.Z-alpha* tags), beta tags were for versions that get benchmarked, and RC tags were for benchmarks that get sent to the GCSC for approval? A scheme like this would communicate to the community where in the development lifecycle a dev/X.Y.Z branch is. The alpha tags would also give us fine-grained control over when we want to run our more involved CI tests (build matrix for now, but maybe timestepping tests in the future?).
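Concretely, the build-matrix pipeline would just trigger on those pre-release tags, something like this (sketch; the exact tag format is TBD):

```yaml
# Sketch: run the full build matrix only on alpha tags (exact tag format TBD)
trigger:
  tags:
    include:
    - '*-alpha*'      # e.g. X.Y.Z-alpha.1
```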
To be clear, if we go above the max # of hours/month will the tests simply not run?
I think that's right. I think the build will show up as "canceled".
Also, is the number of hours for a given month readily available such that further tests could be temporarily suspended if need be, e.g. if we have an unusual high number of commits.
If you go to "Project Settings > Parallel jobs" it tells you how many minutes you've consumed. I wouldn't say it's readily available, but it's there.
@JiaweiZhuang, @yantosca, @msulprizio: Sorry, my comments in this thread have drifted beyond the scope of just gchp_ctm. I figured this might be of interest to you as well.
I think the 30-hour limit is only for private projects? Public & open-source projects should have unlimited build time, from Azure docs:
Public project: 10 free Microsoft-hosted parallel jobs that can run for up to 360 minutes (6 hours) each time, with no overall time limit per month. Private project: One free parallel job that can run for up to 60 minutes each time, until you've used 1,800 minutes (30 hours) per month.
For GCHP I was thinking something like this:
Line name | Target | OS | Compiler | MPI | NetCDF |
---|---|---|---|---|---|
default | general | Ubuntu 16.04 | GCC 7 | OpenMPI 3 | >=4.2 |
CentOS | OS | CentOS 7 | GCC 7 | MPICH | >=4.2 |
GCC 8 | Compiler | Ubuntu 16.04 | GCC 8 | OpenMPI 3 | >=4.2 |
GCC 9 | Compiler | Ubuntu 16.04 | GCC 9 | OpenMPI 3 | >=4.2 |
Intel 18 | Compiler | Ubuntu 16.04 | Intel 2018 | OpenMPI 3 | >=4.2 |
Intel 19 | Compiler | Ubuntu 16.04 | Intel 2019 | OpenMPI 3 | >=4.2 |
It would be difficult to set up Intel compilers on CI due to licensing issues (travis-ci/travis-ci#4604), although it seems doable if you really want to (e.g. https://github.com/nemequ/icc-travis). Intel MKL is fine though, since it's free and relatively easy to install (https://github.com/travis-ci/travis-ci/issues/5381#issuecomment-281983826)
My suggestion is to only use GNU compilers, and test more MPI variants. You can use Intel MPI + GNU compiler.
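For what it's worth, the Intel MPI + GNU combination is straightforward because Intel MPI's mpicc/mpif90 wrappers already default to GCC/gfortran (mpiicc/mpiifort are the Intel-compiler ones), and the I_MPI_* variables can force the backing compiler explicitly. Something like the following sketch; the CMake options are illustrative, not necessarily what gchp_ctm expects:

```yaml
# Sketch: build with GNU compilers against Intel MPI (CMake options are illustrative)
steps:
- script: |
    export I_MPI_CC=gcc          # point Intel MPI's wrappers at the GNU compilers
    export I_MPI_FC=gfortran
    export I_MPI_F90=gfortran
    mkdir build && cd build
    cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90
    make -j2
  displayName: 'GNU compilers + Intel MPI'
```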
I think the 30-hour limit is only for private projects? Public & open-source projects should have unlimited build time...
Oh yeah, I think you're right! Well, that simplifies things. I was looking at this but it must be talking about private projects. I just checked my penelope project and it's at 0/1800 minutes so you must be right.
It would be difficult to set up Intel compilers on CI due to licensing issues (travis-ci/travis-ci#4604), although it seems doable if you really want to (e.g. https://github.com/nemequ/icc-travis). Intel MKL is fine though, since it's free and relatively easy to install (travis-ci/travis-ci#5381 (comment))
My suggestion is to only use GNU compilers, and test more MPI variants. You can use Intel MPI + GNU compiler.
+1 from me
Most of the build-matrix images are built/almost ready (here), with the major exception being those for the Intel compilers. I was going to put those off as long as I could, because I have no clue whether they would work. I'm all for skipping them.
edit: fixed private link
Most of the build-matrix images are built/almost ready (here)
This link doesn't seem to be public: https://cloud.docker.com/repository/registry-1.docker.io/liambindle/penelope/tags
I don't want to give up on Intel compilers just yet. Ifort is the preferred compiler as long as GCC causes a performance hit.
I don't want to give up on Intel compilers just yet.
Users are free to use ifort if they have access to it. As for CI, gfortran seems like the higher bar. Do we have ifort-only issues that do not happen with gfortran?
I have run into at least one compiler error that was caught by ifort and not gfortran.
Re: https://github.com/geoschem/gchp/issues/36