GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0
26 stars 18 forks source link

call to ESMF_ConfigSetAttribute overwrites the next line in config (ESMF_Config) #2385

Open metdyn opened 9 months ago

metdyn commented 9 months ago

I met with an unexpected outcome when calling ESMF_ConfigSetAttribute to modify an entry in config. The code below with the input shows it overwrites the next line in the config.

 MAPL_EpochSwathMod.F90
        function create_grid()
          188 │ │ call ESMF_ConfigSetAttribute( config_grid, trim(time_string), label=trim(key)//'.Epoch_init:', _RC)

config:

  SwathGrid.Epoch_init:  '20121113T000000'
  SwathGrid.nc_Time:       'time'
  SwathGrid.nc_Longitude:   'cell_across_swath'
  call ESMF_ConfigFindLabel(config, 'SwathGrid.nc_Time:', isPresent=ispresent, rc=status)
  print*, 'SwathGrid.nc_Time: ispresent=', ispresent

shows: SwathGrid.nc_Time: ispresent= F

Moving SwathGrid.Epoch_init: '20121113T000000' to the end of the segment solves the problem. I am documenting this as a future reference.

mathomp4 commented 9 months ago

Hmm. Maybe something we want to bring to ESMF's attention?

metdyn commented 9 months ago

Hi @darianboggs,

I find this bug when in ExtDataDriver.x using nag and gcc.
However I suspect this can be reproduced using ESMF only. (I suspect it is directly related to ESMF)

For example, given a three-line config file: A: string1 B: string2 C: string3

We read the config into memory from ESMF modify the entry A: string1 to another value by calling ESMF_ConfigSetAttribute( config, trim(string), label='A', _RC)

Then we check if the nextline in the config (B: string2) is still there. I am not sure what you will get.
Thank you for your help.

darianboggs commented 8 months ago

DO NOT MERGE bugfix/wdboggs/#2385-overwrite_ESMF_ConfigSetAttribute.

Branch bugfix/wdboggs/#2385-overwrite_ESMF_ConfigSetAttribute implements the standalone app, OverwriteConfig.x, to test this bug.

OverwriteConfig.x recreates the error with the Intel compiler. @mathomp4 or @tclune may want to try compiling with NAG. make install will create the standalone recreator from Apps/OverwriteConfig.F90. OverwriteConfig.x requires no arguments, but you can supply a temp file name as the first argument. The temp file will be deleted on completion.

DO NOT MERGE bugfix/wdboggs/#2385-overwrite_ESMF_ConfigSetAttribute.

The branch not been tested for any other purpose.

mathomp4 commented 8 months ago

@darianboggs So how did you call your code and what should we look for when running it (aka, what is the "wrong" behavior)?

mathomp4 commented 8 months ago

Well, with GNU on Alderaan I get:

❯ ./OverwriteConfig.x
 temporary file: overwrite_20231017135938.710_rc.tmp
 Error closing overwrite_20231017135938.710_rc.tmp. Manually delete if necessary.
Result - Failure: test_overwrite failed: :: ESMF_ConfigCreate successful :: write_attributes successful :: ESMF_ConfigLoadFile successful

Is there a "verbose" way to see what's happening? Lots of successfuls, but result is a Failure...

darianboggs commented 8 months ago

I pushed up simpler tests, and SimpleOverwriteConfig.x is now passing, finding the second attribute label. I am looking at the difference between SimpleOverwriteConfig.F90 and OverwriteConfig_FullTest.F90 to see why the simpler version passes, but the original version fails like Yonggang's code.

tclune commented 8 months ago

Often one must start with the large failing case and whittle tiny bits and verify it still fails. And if your not careful, it is a good idea to save intermediates so that if you trip you don't have to start over.

darianboggs commented 8 months ago

After stripping down the test to the basic test setup necessary to recreate Yonggang's context:

ESMF_Initialize
ESMF_ConfigCreate
ESMF_ConfigLoadFromFile

and then calling ESMF_ConfigSetAttribute for the first attribute, the test found the second attribute by calling ESMF_ConfigFindLabel. This does not recreate the problem that Yonggang found.

Yonggang ran my basic test on his Mac, compiling with NAG. There are a few ideas I have:

  1. Yonggang customized the end of line character in MAPL_Config.F90. He thinks that it could be causing problems.
  2. Yonggang is using MAPL_ConfigCreate, while my test uses ESMF_ConfigCreate.
  3. My test uses ESMF_ConfigSetAttribute to change the first attribute value. We normally use MAPL_ConfigSetAttribute to set attributes with MAPL.
  4. Finally, the ESMF_Config instance is passed into the procedure where the problem occurs. While the values originate from a config file, they may have already been changed before the procedure receives the instance.
metdyn commented 8 months ago

@darianboggs, Thanks. I agree with most of what you find, except 4, which I have not seen before. I didnot know about MAPL_ConfigSetAttribute before. Maybe if I replace ESMF_ConfigSetAttribute with MAPL_ConfigSetAttribute, this problem will not occur.

tclune commented 8 months ago

At this point we are not trying to find a workaround. We want to demonstrate a clear error so that it can be repaired. So if changing from ESMF set to MAPL set appears to solve the problem, please use the former to create a reproducer. Again, at this point, even if it involves an entire build/run of GEOS that is acceptable. We can whittle it down from there.

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

mathomp4 commented 6 months ago

I'll comment to keep open. I think this is in ESMF's hands now...right?

metdyn commented 6 months ago

Thank you for keeping this thread alive, @mathomp4. That is right.

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

mathomp4 commented 4 months ago

I'm not sure the status of this. I want to say ESMF got a reproducer? But I'm not sure.