GEOS-ESM / GEOSgcm

GEOS Earth System Model GEOSgcm Fixture
Apache License 2.0

Issue with remap_restarts.py, the catchment restart, and merra2 in older tags #728

Open bena-nasa opened 9 months ago

bena-nasa commented 9 months ago

Note: I'm putting this here because it's not 100% clear which constituent repo has the problem without a little more investigation.

A user on v11.2.0 of the fixture reported a problem with remap_restarts.py, which I was able to replicate. He had regridded restarts from merra2 and was using them for an experiment, and the model was failing when it got to the catchment restart. I did some digging and tried to regrid my own set of restarts from merra2 with this version of the fixture, and there is a problem. The merra2 restarts are binary, and each variable gets written to a record of the Fortran binary. Using this code, /home/atrayano/bin/tell_rec, you can see the number of records in a file. The merra2 catchment restarts in the archive have 57 records, so one would expect an output file with probably 57 records. But when I regrid from merra2 to, say, c90 with a 360x180 data ocean, using "newland", here is what my catchment restart has:

catch_internal_rst.20180101_21z.nc4: Record 1: 294720034 words

one record, not 57! Something has just gone wrong.
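For anyone without access to tell_rec, the record count can be sketched in a few lines of Python. This is not the tell_rec source, just an illustration of the same idea. It assumes the usual Fortran sequential unformatted layout (each record wrapped in a leading and trailing 4-byte length marker), little-endian markers, and no records split into subrecords. The function name `fortran_record_sizes` is made up for this example.

```python
import struct


def fortran_record_sizes(path, marker_fmt="<i"):
    """Return the byte length of each record in a Fortran sequential
    unformatted file.

    Assumption: every record is framed by identical leading and trailing
    length markers (4-byte little-endian by default). Files written with
    8-byte markers or split subrecords are not handled.
    """
    marker_size = struct.calcsize(marker_fmt)
    sizes = []
    with open(path, "rb") as f:
        while True:
            head = f.read(marker_size)
            if not head:
                break  # clean end of file
            if len(head) < marker_size:
                raise ValueError("truncated leading record marker")
            (nbytes,) = struct.unpack(marker_fmt, head)
            if nbytes < 0:
                raise ValueError("negative record length (subrecords?)")
            f.seek(nbytes, 1)  # skip the payload without reading it
            tail = f.read(marker_size)
            if len(tail) < marker_size:
                raise ValueError("truncated trailing record marker")
            if struct.unpack(marker_fmt, tail)[0] != nbytes:
                raise ValueError("leading and trailing markers disagree")
            sizes.append(nbytes)
    return sizes
```

`len(fortran_record_sizes(path))` gives the number of records, and dividing each size by 4 gives the length in 4-byte words. A healthy binary catchment restart from the merra2 archive should come back with 57 entries rather than one.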

Now it looks like remap_restarts.py was changed so that it converts the restart to netcdf before regridding. I tried v11.4.0 of the fixture, which now produces netcdf restarts even from merra2, and the catchment restart you get is fine. The model is happy; it has the right number of variables.

So something is wrong with regridding the catchment restart when it is binary.

The question is whether it is worth doing anything, or whether we just say: if you want merra2, get a recent enough version of the model that you get the netcdf restarts even for merra2.

I leave it for others. I'm just reporting what I see, because if this user hit this, maybe others will too.

biljanaorescanin commented 9 months ago

@weiyuan-jiang or @gmao-rreichle can chime in but I think we wanted to abandon binary catch version as far as support goes. People can still use regrid.pl for those older model restarts.

bena-nasa commented 9 months ago

@biljanaorescanin we do want to abandon binary, couldn't agree more. However, users are still using older tags with this bug. The simplest solution is to just tell users to use a new enough model version to regrid restarts from merra2 (the netcdf restarts from merra2 inputs came in with v11.3.3 of the GEOSgcm fixture). Even if this were fixed, it's not like Scott can change an older release tag anyway, so I'm not sure what can realistically be done. In newer tags this is just not an issue, since the binary path is no longer exercised.

I made this more for reference in case other people hit this so we are aware.

I don't think that there's any real way to fix this that would get into a tag that would matter. Any newer tag just would not exercise the binary code path, so it's a non-issue in those.

Or users could use regrid.pl, if it still works. But if the bug is in the actual catchment regridding code, it might still be there, since both are running the same executables underneath to do the work.

gmao-rreichle commented 9 months ago

We can confirm that starting with GEOSgcm v11.3.3, the associated remap_restarts.py package (from GEOS_Util v2.0.4) should work.
The remap_restarts.py package is buggy for older versions of GEOSgcm (v11.2.0). Users are advised to use GEOSgcm v11.3.3 or later when using remap_restarts.py.