MPAS-Dev / MPAS-Tools

MPAS Tools Repository
Other
37 stars 63 forks source link

Add NetCDF C version of mesh conversion tools #514

Closed xylar closed 1 year ago

xylar commented 1 year ago

This merge adds a new version of the mesh conversion tools that uses the standard NetCDF C library, rather than the legacy NetCDF C++ library.

Most of the work was done by @dengwirda, so I can only take credit for the conda-package plumbing and some minor debugging and clean-up.

We have encountered significant problems with the legacy NetCDF C++ library. Foremost is that it does not perform well on large meshes (over several million cells). The library is not being developed further and even the successor NetCDF C++ library has not had a release in about 4 years and does not appear to be in active use by many projects.

I have chosen to leave the existing mesh creation tools unchanged for now. The new tools are in their own mesh_creation_tools_netcdf_c directory. We could eventually decide to remove mesh_creation_tools and adopt mesh_creation_tools_netcdf_c in their place but I have not done that so far.

@dengwirda has ported the mesh converter and cell culler but not the mask creator. This could be done by anyone interested but our group has other tools we use for mask creation that we use instead, so it is not a priority for us. This is one of the reasons for maintaining mesh_creation_tools as it is.

I have changed the conda package to use mesh_creation_tools_netcdf_c. Since the mask creator is not available, I have change the mpas_tools.mesh.conversion.mask() method to use the python masking capabilities that are part of the conda package.

Finally, this merge includes changes to how mpas_toolsmesh.creation.build_mesh() works: It now uses subprocess calls rather than the convert() and cull() wrapper functions, since the wrapper functions have caused out-of-memory errors for large meshes (and provide no added value in this situation). Note: if desired, I could break this change into its own PR.

xylar commented 1 year ago

@mgduda, I want to make sure you're okay with this addition. I realize it could be disruptive or cause confusion for your group to have mesh_conversion_tools and mesh_conversion_tools_netcdf_c. But @dengwirda and I certainly hope the tools will be useful to everyone, not just us.

Please give me your feedback as soon as you're able because we'd like to get this merged in and into use if you're okay with it.

xylar commented 1 year ago

@mark-petersen and @matthewhoffman, I'm making sure you're aware of this change, since it could affect MPAS-Ocean and MALI. (I'm not sure who to ping for MPAS-Seaice at this point. @darincomeau, is that you?)

xylar commented 1 year ago

@dengwirda, I'd like your feedback on whether the version of the code here looks good to you. I made only minor bug fixes, so it should basically be what you gave me.

Please also let me know if it works for you. You can install it yourself into a conda environment with:

conda install -c conda-forge/label/mpas_tools_dev mpas_tools=0.22.0rc1
xylar commented 1 year ago

Testing

I tested this with the compass pr test suite using the current compass main branch on Chrysalis. All test passed and were bit-for-bit with the version of MPAS-Tools currently used in compass (0.19.0), which is very exciting!

I also plan to test this on the RRS6to18 mesh in https://github.com/MPAS-Dev/compass/pull/576, which failed miserably with the existing mesh_conversion_tools after more than 2 days of execution.

xylar commented 1 year ago

@mgduda, @mark-petersen, @matthewhoffman and @dengwirda, just a reminder that I'm looking for feedback on this PR.

dengwirda commented 1 year ago

Thanks @xylar, I've had success on pm-cpu running the following:

conda create -n netcdf_c_tester -c conda-forge/label/mpas_tools_dev mpas_tools=0.22.0rc2
conda activate netcdf_c_tester
python3 test.py

where test.py is:

import time
import xarray

from mpas_tools.mesh.conversion import convert, cull
from mpas_tools.io import write_netcdf
from mpas_tools.mesh.creation import jigsaw_to_netcdf

if (__name__ == "__main__"):

    ttic = time.time()
    print("Forming mesh_triangles.nc")
    jigsaw_to_netcdf(
        msh_filename="mesh.msh",
        on_sphere=True,
        sphere_radius=6371220.,
        output_name="mesh_triangles.nc")
    ttoc = time.time()
    print("jgsw-to-mpas:", ttoc - ttic)

    ttic = time.time()
    print("Forming base_mesh.nc")
    write_netcdf(convert(
        xarray.open_dataset("mesh_triangles.nc"),
        graphInfoFileName="graph.info"),
        format="NETCDF4",
        fileName="base_mesh.nc")
    ttoc = time.time()
    print("mpas-convert:", ttoc - ttic)

With an 18M cell mesh the output is:

Forming mesh_triangles.nc
jgsw-to-mpas: 1331.344482421875
Forming base_mesh.nc
mpas-convert: 623.2475054264069

and I suspect the non-vectorised call to build the circumcentres is a place where a lot of the step 1 time is being spent.

xylar commented 1 year ago

@mgduda, @mark-petersen, @matthewhoffman, another request for you to have a look at this please.

xylar commented 1 year ago

Thanks @mark-petersen!

xylar commented 1 year ago

It is strange that the c version of the netcdf libraries are supported and the c++ are not - I would have expected the opposite.

There are several things to say about this. First, both the NetCDF Fortran and C++ libraries are based on (wrappers around?) the NetCDF C library. So the C library gets a lot more attention than the others. The C++ library hasn't been updated in years.

To add to the confusion, there was a major rewrite of the NetCDF C++ API awhile back, and our tools use the NetCDF C++ library from before the rewrite, so they are using the "legacy" NetCDF C++ library. That has made a bad situation worse.

The move to the NetCDF C library in @dengwirda's version of the tools means we're using the library that gets the most maintenance, so I think that's unquestionably the right direction to go.

xylar commented 1 year ago

@mgduda, I'm assuming you may be away. I am going to merge this on Monday unless I hear from you.

xylar commented 1 year ago

More testing

With my recent fixes to the cell culler, I am re-testing with the following meshes. A check mark means the mesh produced by compass is BFB the same as with 0.21.0 release.