Test case using global snapshot

alejandrobodas commented 4 years ago

In the PMC of 29 May 2020, we agreed that it would be good to have a more robust set of cases for the test suite. This included adding more points than the standard ~150 points in the current test, for example by using a global snapshot from the UM or any other model.

RobertPincus commented 4 years ago

This generalizes issue #35.

dustinswales commented 4 years ago

@alejandrobodas I created a branch (use_CESM2_snapshot) with a new driver that uses a snapshot of COSP inputs from CESM2. I haven't fully vetted the outputs, but at first glance everything seems to be working fine. There is a data size issue that needs to be addressed. The input file is ~50M and the output file is about 500M (4X for CI tests on 4 compilers). So 5 files totalling about 2G are needed, which is far too large for GitHub. One option is to add these files to GitHub Large File Storage (LFS), https://git-lfs.github.com/. This replaces the files in the GitHub repo with a pointer into another repo where the files are stored. This would make the input data and KGO files transparent to the user. Another option would be for us to host these files somewhere else where they can be accessed with OPeNDAP (I would have done this but ran into some admin issues hosting unsupported data...).
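As a rough check of the figures in the comment above, the sketch below adds up one input file plus one KGO file per compiler and compares a single KGO file against GitHub's per-file limit of about 100 MB. The sizes are the approximate values quoted in the thread; the function and constant names are illustrative only and are not part of COSP.

```python
# Back-of-the-envelope storage budget for the snapshot test data.
# Sizes are the approximate figures quoted in the comment above.
INPUT_MB = 50                # CESM2 snapshot of COSP inputs (~50M)
OUTPUT_MB = 500              # one known-good output (KGO) file (~500M)
N_COMPILERS = 4              # CI runs the tests with 4 compilers
GITHUB_FILE_LIMIT_MB = 100   # approximate per-file limit on GitHub


def storage_budget(input_mb, output_mb, n_compilers):
    """Return (number of files, total MB): one input plus one KGO per compiler."""
    n_files = 1 + n_compilers
    total_mb = input_mb + output_mb * n_compilers
    return n_files, total_mb


n_files, total_mb = storage_budget(INPUT_MB, OUTPUT_MB, N_COMPILERS)
oversized = OUTPUT_MB > GITHUB_FILE_LIMIT_MB

print(f"{n_files} files, about {total_mb} MB in total")
print(f"a single KGO file exceeds the per-file limit: {oversized}")
```

With these inputs the total is 5 files and roughly 2G, matching the estimate above, and each KGO file on its own is already well over the per-file limit.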

alejandrobodas commented 4 years ago

Hi @dustinswales, the file size limit is 100M, isn't it? What's the horizontal and vertical grid of the input file? I was planning to create an input file from a low-res version of HadGEM3, N48 ~3500 gridpoints, and ~50 vertical levels. If I've done the calculations correctly, the output files will still be over the 100M limit, so I think we'll have to use one of the solutions you propose. It looks like GitHub LFS would be more transparent, both to the users and the developers. A couple of questions about the CESM branch: 1) Would it be possible to produce a low-res version? If you can't run the model at lower resolution, subsampling one in four points should still produce a snapshot good enough for our purposes. Even if we still need to use LFS, I think it would be good to try and minimise the data volumes and the extra time taken to run the tests. 2) Would it be possible to use the same driver for all the tests? The two drivers differ in a few places, but they mostly share the same code.

dustinswales commented 4 years ago

> Hi @dustinswales, the file size limit is 100M, isn't it? What's the horizontal and vertical grid of the input file? I was planning to create an input file from a low-res version of HadGEM3, N48 ~3500 gridpoints, and ~50 vertical levels. If I've done the calculations correctly, the output files will still be over the 100M limit, so I think we'll have to use one of the solutions you propose. It looks like GitHub LFS would be more transparent, both to the users and the developers. A couple of questions about the CESM branch:
>
> Would it be possible to produce a low-res version? If you can't run the model at lower resolution, subsampling one in four points should still produce a snapshot good enough for our purposes. Even if we still need to use LFS, I think it would be good to try and minimise the data volumes and the extra time taken to run the tests.

For the CESM2 run I used the default grid distributed with the code, f09_g17, which is 144x92x32 (lon,lat,level). Many other supported grids are available that we could use instead (http://www.cesm.ucar.edu/models/cesm2/cesm/grids.html). For the output file sizes quoted above I was using 20 subcolumns. This took about 1 minute to run using the Intel compiler on a local machine. I didn't do any tuning/testing to speed this up (e.g. different chunk sizes, reducing subcolumns).
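To put the grid numbers above in context, here is a minimal sketch of the arithmetic: how many columns the quoted 144x92 grid has, how many remain if one point in four is kept, and how large a single field dimensioned (column, subcolumn, level) would be. It assumes 4-byte values and reads "one in four points" as one point in four overall rather than along each axis. Both are assumptions, and the helper names are illustrative only.

```python
# Rough size of the CESM2 snapshot grid, using the numbers quoted above.
N_LON, N_LAT, N_LEV = 144, 92, 32   # grid as (lon, lat, level)
N_SUBCOLUMNS = 20                   # subcolumns used for the quoted file sizes
BYTES_PER_VALUE = 4                 # assumption: single-precision output


def n_columns(n_lon, n_lat, keep_one_in=1):
    """Number of horizontal points when keeping one point in `keep_one_in`."""
    return (n_lon * n_lat) // keep_one_in


def field_size_mb(n_cols, n_lev, n_subcols, bytes_per_value=BYTES_PER_VALUE):
    """Size in MB (10**6 bytes) of one (column, subcolumn, level) field."""
    return n_cols * n_lev * n_subcols * bytes_per_value / 1e6


full = n_columns(N_LON, N_LAT)
quarter = n_columns(N_LON, N_LAT, keep_one_in=4)

print(f"full grid: {full} columns, "
      f"{field_size_mb(full, N_LEV, N_SUBCOLUMNS):.1f} MB per subcolumn field")
print(f"one in four: {quarter} columns, "
      f"{field_size_mb(quarter, N_LEV, N_SUBCOLUMNS):.1f} MB per subcolumn field")
```

Under these assumptions the full grid has 13248 columns and the subsampled one 3312, which is close to the ~3500 gridpoints mentioned for the low-res HadGEM3 configuration.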

> Would it be possible to use the same driver for all the tests? The two drivers differ in a few places, but they mostly share the same code.

It absolutely could (and should) all be put into one driver and controlled via namelist. There is one main difference, other than the input/output routines of course. CAM provides the 11-micron emissivity and 0.67-micron optical depth for snow. This requires changes to cosp_optics.F90 for the passive simulators, and subsequently a different subsample_and_optics() routine. So the unified driver would call different input/output/subsample_and_optics routines when using the CESM2 snapshot versus UKMO.
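The unified-driver idea above can be sketched as a lookup from a namelist-style setting to the routines to call. This is only an illustration of the design in Python. The real driver is Fortran, and every name below (`input_type`, `input_file`, `read_ukmo`, `read_cesm2`, `optics_default`, `optics_with_snow`) is hypothetical rather than taken from the COSP code.

```python
# Hypothetical sketch: one driver, with input and optics routines chosen
# from a namelist-style setting. All names are invented for illustration.

def read_ukmo(path):
    """Stand-in for the UKMO input reader."""
    return {"source": "UKMO", "path": path, "has_snow_optics": False}


def read_cesm2(path):
    """Stand-in for the CESM2 reader; CAM also supplies snow optics."""
    return {"source": "CESM2", "path": path, "has_snow_optics": True}


def optics_default(inputs):
    """Stand-in for the standard subsample_and_optics path."""
    return f"{inputs['source']}: default optics"


def optics_with_snow(inputs):
    """Stand-in for a variant that uses model-provided snow optics."""
    return f"{inputs['source']}: optics with model-provided snow fields"


# Map each supported input type to its (reader, optics) pair.
ROUTINES = {
    "ukmo": (read_ukmo, optics_default),
    "cesm2": (read_cesm2, optics_with_snow),
}


def run_driver(namelist):
    """Select the routines for the requested input type and run them."""
    input_type = namelist["input_type"]
    if input_type not in ROUTINES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    reader, optics = ROUTINES[input_type]
    return optics(reader(namelist["input_file"]))


print(run_driver({"input_type": "ukmo", "input_file": "ukmo_inputs.nc"}))
print(run_driver({"input_type": "cesm2", "input_file": "cesm2_snapshot.nc"}))
```

Keeping the per-model differences behind one table means the shared code in the two drivers is written once, and adding another model snapshot would only require a new entry.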

alejandrobodas commented 4 years ago

Hi @dustinswales, that sounds good. If it's not too difficult to set up a different grid, I would favour a lower-resolution grid. After all, we are currently working with a very small set of points, so any global model field will give us a huge increase in the amount of testing. If we use smaller fields, then I don't think the speed of the tests will be an issue. Also, we could choose not to run all the tests with all the compilers if the CI testing takes too long.

CFMIP / COSPv2.0

Test case using global snapshot #49