JCSDA-internal / soca

Sea-ice Ocean Coupled Assimilation
Apache License 2.0
12 stars 4 forks source link

Generic diffusion operator #1038

Closed travissluka closed 2 months ago

travissluka commented 5 months ago

Description

Deletes the soca-specific diffusion saber central block and uses the generic diffusion operator added in saber. There are small differences in answers because 1) randomization is using different random numbers 2) The grid properties are slightly different. Previously I used dx, dy directly from MOM6, but with the generic diffusion the properties of the dual mesh are calculated by atlas, and are slightly different.

I've also added a python tool to convert the output of the saber::Diffusion calibration into the same grid used by SOCA.

By generalizing the code and moving to saber, there goes another 1700 lines of soca code :bomb: !

[!IMPORTANT] users (@guillaumevernieres, @Dooruk) will need to change their yaml file for the saber central block, see the changes in the testinput/3dvar.yaml in this PR. Also, unfortunately, the calibration will need to be redone, as the file format is now different. The precalculated horizontal and vertical diffusion parameter files will be available on the HPCs in $EWOK_STATIC_DATA/skylab-9.0.0/soca/ once the following issue is closed:

Dependencies

waiting for skylab to be updated

hga007 commented 5 months ago

@travissluka: Thank you! It compiles, but it fails when running all my Unit Tests. For example, in the Geometry test, I get:

test_romsjedi_geometry: /opt/src/atlas-0.35.1/src/atlas/array/SVector.h:122: const T &atlas::array::SVector<T>
::operator[](int) const [with T = int]: Assertion `data_ && idx < size_' failed.
[pontus:3486125] *** Process received signal ***
[pontus:3486125] Signal: Aborted (6)
[pontus:3486125] Signal code:  (-6)
[pontus:3486125] [ 0] /lib64/libc.so.6(+0x54db0)[0x15372b654db0]
[pontus:3486125] [ 1] /lib64/libc.so.6(+0xa154c)[0x15372b6a154c]
[pontus:3486125] [ 2] /lib64/libc.so.6(raise+0x16)[0x15372b654d06]
[pontus:3486125] [ 3] /lib64/libc.so.6(abort+0xd3)[0x15372b6287f3]
[pontus:3486125] [ 4] /lib64/libc.so.6(+0x2871b)[0x15372b62871b]
[pontus:3486125] [ 5] /lib64/libc.so.6(+0x4dca6)[0x15372b64dca6]
[pontus:3486125] [ 6] test_romsjedi_geometry: /opt/src/atlas-0.35.1/src/atlas/array/SVector.h:122: const T &at
las::array::SVector<T>::operator[](int) const [with T = int]: Assertion `data_ && idx < size_' failed.

All my tests passed when I ran feature/remove_geometry_latlon today. Is your branch feature/diffusion up to date with OOPS and the SABER develop branches?

I followed your example and my parameters_diffusion.yaml has

background error:
  covariance model: SABER
  saber central block:
    saber block name: DIFFUSION
    calibration:
      normalization:
        iterations: 1000                              # Ramdomization approach
      groups:
      - horizontal:
          fixed value: 160.0e3
        write:
          filepath: Data/diffusion
          files prefix: wc13_diffusion_hcor
      - vertical:
          fixes value: 150.0
          levels: 10
        write:
          filepath: Data/diffusion
          files prefix: wc13_diffusion_vcor

I need clarification on what the token levels means. In ROMS-JEDI, levelsAreTopDown()=false. Does this imply that the vertical correlation scale is the thickness of the top 10 vertical levels?

Also, how do you specify different correlation scales for each state variable? In our applications, for example, we have different scales for temperature and salinity.

travissluka commented 5 months ago

@hga007

hga007 commented 5 months ago

@travissluka: Thank you. I'll try again then. I suspected that was implied by the levels token. For us, 150 meters is the vertical scale for temperature, and 30m is the vertical scale for salinity. Anyway, for testing, I will use many levels for the vertical scale. This can be fine-tuned later. The critical issue here is to evaluate the generic pseudo-diffusion operation as a method to spread covariance influence.

We want to compare horizontal convolutions for SSH between DIFFUSION and BUMP_NICAS operators. There is an issue in BUMP_NICAS for SSH, but I haven't been able to check in the debugger because of the Fortran Phyton Preprocessor, which is unfortunate. We get assembly code that it is impossible to understand.

hga007 commented 5 months ago

@travissluka: I tried again but still got the same error in atlas::array::SVector shown above. Since it works for others, I wonder if I am missing an ATLAS update or ahead of the updates.

travissluka commented 5 months ago

@travissluka: I tried again but still got the same error in atlas::array::SVector shown above. Since it works for others, I wonder if I am missing an ATLAS update or ahead of the updates.

ah, I see, so nothing works even if you're not using DIFFUSION ? Try to comment out this line in oops that I added https://github.com/JCSDA-internal/oops/blob/0368ea9ade2ee88d704b9be214576cb46ec69196/src/oops/base/GeometryData.cc#L334 It's the only change I made that isn't part of the DIFFUSION block, let me know if that works and we'll go from there

hga007 commented 5 months ago

@travissluka: Yes, the Geometry object fails. Thus, none of the Unit Tests work. If I comment on the line in GeometryData.cc that you suggest, I go a little further, and I get the following error:

Test "interface/Geometry/testConstructor" failed with unhandled eckit::Exception: Trying to access field `xyz' in Nodes, but no field with this name is present in Nodes. @  (/opt/src/atlas-0.35.1/src/atlas/mesh/Nodes.cc +77 field)

I suspect the issue is that ROMS has a different grid specification in ATLAS than the one used in SOCA.

travissluka commented 5 months ago

@travissluka: Yes, the Geometry object fails. Thus, none of the Unit Tests work. If I comment on the line in GeometryData.cc that you suggest, I go a little further, and I get the following error:

Test "interface/Geometry/testConstructor" failed with unhandled eckit::Exception: Trying to access field `xyz' in Nodes, but no field with this name is present in Nodes. @  (/opt/src/atlas-0.35.1/src/atlas/mesh/Nodes.cc +77 field)

I suspect the issue is that ROMS has a different grid specification in ATLAS than the one used in SOCA.

The only thing I added to oops was the build_edges(mesh_); line, so if your geometry constructor test is failing with that line commented then it is something unrelated to these diffusion PRs. Are you sure everything was working for you with the develop branches?

Can we move this discussion elsewhere, since this issue is not related to this SOCA PR.

hga007 commented 5 months ago

@travissluka: Sure, we can move the discussion somewhere else. Everything worked when I tested the changes due to feature/remove_geomety_latlon. I will test the develop branch to ensure it works.

hga007 commented 5 months ago

@travissluka: I can confirm that all the ROMS-JEDI Unit Tests run successfully with the develop branch. The behavior I am getting is likely due to the changes in feature/diffusion in OOPS. I am using atlas/0.35.1 for ifort. Do I need to use a newer ATLAS version or a different OOPS and SABER branch?

byoung-joo commented 5 months ago

Hi @travissluka , As you can see in the JEDI integration test, most (or all?) mpasjedi ctest failed. For example of mpasjedi geometry unit test, https://cdash.jcsda.org/test/10860581 Does something need to be done in mpas-jedi for atlas::mesh::actions::build_edges(mesh_); (added in oops)?

maybe this should go to OOPS pr or jcsda discussion...