NOAA-GFDL / PyFV3

Python version of FV3 dynamical core
GNU General Public License v3.0

DISCLAIMER: Work in progress

FV3core

FV3core is a Python version, using GridTools GT4Py with CPU and GPU backend options, of the FV3 dynamical core (fv3gfs-fortran repo). The code here includes regression test data of computation units coming from serialized output from the Fortran model generated using the GridTools/serialbox framework.

As of January 10, 2021, this documentation is outdated: it was written when fv3core was its own single repository. Some functionality, such as linting, has been moved to the top level but may still be described in this document as occurring inside the fv3core folder.

WARNING This repo is under active development and relies on code and data that is not publicly available at this point.

QuickStart

  1. Ensure you have Docker installed and available for building and running, and that it has access to the VCM cloud.

Be sure to complete any required post-installation instructions (e.g. for linux). Also authorize Docker to pull from gcr. Your user will need to have read access to the us.gcr.io/vcm-ml repository.

  2. You can build the image, download the data, and run the tests using:
$ make tests savepoint_tests savepoint_tests_mpi

If you want to develop code, you should also install the linting requirements and git hooks locally

$ pip install -c constraints.txt -r requirements/requirements_lint.txt
$ pre-commit install

## Getting started, in more detail
If you want to build the main fv3core docker image, run

```shell
$ make build
```

If you want to download test data run

$ make get_test_data

And the c12_6ranks_standard data will download into the test_data directory.

If you do not have a GCP account, you can download basic test data from a public FTP server and skip the GCP authentication step above. To do so, use make USE_FTP=yes get_test_data, which avoids fetching from a GCP storage bucket. You will need a valid installation of the lftp command.

MPI parallel tests, which run in parallel to exercise halo updates in the model, can also be run with:

$ make savepoint_tests_mpi

The environment image that the fv3core container uses is prebuilt and lives in the GCR. The above commands will by default pull this image before building the fv3core image and running the tests. To build the environment from scratch (including GT4py) before running tests, either run

$ make build_environment

or

$ PULL=False make savepoint_tests

which will execute the target build_environment for you before running the tests.

There are also push_environment and rebuild_environment targets, but these should not normally be run manually. Updating the install image should only be done by Jenkins after the tests pass using a new environment.

Test data options

If you want to run different test data, discover the possible options with

$ make list_test_data_options

This will list the storage buckets in the cloud. Then to run one of them, set EXPERIMENT to the folder name of the data you'd like to use:

e.g.

$ EXPERIMENT=c48_6ranks_standard make tests

If you choose an experiment with a different number of ranks than 6, also set NUM_RANKS=<num ranks>
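
As a rough sketch of the rule above, the rank count can be read off the experiment folder name. This assumes that folder names follow the `c<resolution>_<N>ranks_<label>` pattern seen in c12_6ranks_standard and c48_6ranks_standard; the helper names are hypothetical and not part of fv3core.

```python
import re

# Assumption: experiment folders are named "c<resolution>_<N>ranks_<label>",
# as in c12_6ranks_standard. Other naming schemes are rejected.
_EXPERIMENT_PATTERN = re.compile(
    r"^c(?P<resolution>\d+)_(?P<ranks>\d+)ranks_(?P<label>.+)$"
)


def parse_experiment(name: str) -> dict:
    """Split an experiment folder name into resolution, rank count, and label."""
    match = _EXPERIMENT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized experiment name: {name!r}")
    return {
        "resolution": int(match.group("resolution")),
        "ranks": int(match.group("ranks")),
        "label": match.group("label"),
    }


def make_environment(name: str) -> dict:
    """Build the environment variables to set before calling make."""
    env = {"EXPERIMENT": name}
    ranks = parse_experiment(name)["ranks"]
    if ranks != 6:  # NUM_RANKS is only needed when the rank count is not 6
        env["NUM_RANKS"] = str(ranks)
    return env


print(make_environment("c48_6ranks_standard"))
# Hypothetical experiment name, used only to show the NUM_RANKS case.
print(make_environment("c12_54ranks_standard"))
```

The resulting dictionary can be merged into `os.environ` before invoking make from a script.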

Testing interactively outside the container

After make savepoint_tests has been run at least once (or you have data in test_data and the docker image fv3core exists because make build has been run), you can iterate on code changes using

$ DEV=y make savepoint_tests

or for the parallel or non-savepoint tests:

$ DEV=y make tests savepoint_tests_mpi

These will mount your current code into the fv3core container and run it rather than the code that was built when make build ran.

Running tests inside a container

If you prefer to work interactively inside the fv3core container, get the test data and build the docker image (see above if you do not have a GCP account and want to get test data):

$ make get_test_data
$ make build

Testing can be run with this data from /port_dev inside the container:

$ make dev

Then in the container:

$ pytest -v -s --data_path=/test_data/ /port_dev/tests --which_modules=<stencil name>

The 'stencil name' can be determined from the associated Translate class. For example, TranslateXPPM is a test class that translates data serialized from a run of the Fortran model, and 'XPPM' is the name you can use with --which_modules.
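
The naming rule can be sketched in a few lines of Python. The helpers below are hypothetical (they are not part of the test framework) and only strip the `Translate` prefix; TranslateYPPM is used purely as a second illustrative class name.

```python
def stencil_name(translate_class_name: str) -> str:
    """Return the name to pass to --which_modules for a Translate class."""
    prefix = "Translate"
    if not translate_class_name.startswith(prefix) or translate_class_name == prefix:
        raise ValueError(f"not a Translate class name: {translate_class_name!r}")
    return translate_class_name[len(prefix):]


def which_modules_arg(class_names) -> str:
    """Join several stencil names into one comma-separated pytest option."""
    return "--which_modules=" + ",".join(stencil_name(n) for n in class_names)


print(stencil_name("TranslateXPPM"))
print(which_modules_arg(["TranslateXPPM", "TranslateYPPM"]))
```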

Test options

All of the make targets involved in running tests can be prefixed with the TEST_ARGS environment variable to set test options or pytest CLI args (see below) when running inside the container.

NOTE: FV3 is currently assumed to be in a "development mode" by default, where stencils are checked each time they execute for code changes (which can trigger regeneration). This process is somewhat expensive, so there is an option to put FV3 in a performance mode by telling it that stencils should not automatically be rebuilt:

$ export FV3_STENCIL_REBUILD_FLAG=False
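
For scripts that launch runs from Python, the same switch can be set through the process environment. The sketch below is an assumption about how such a flag might be interpreted (rebuilds on by default, off for false-like values); the function name is hypothetical and the actual parsing inside fv3core may differ.

```python
import os


def stencil_rebuild_enabled(environ=None) -> bool:
    """Interpret FV3_STENCIL_REBUILD_FLAG; rebuilds stay on unless it is false-like.

    Assumption: an unset variable means development mode (rebuild checks on).
    """
    env = os.environ if environ is None else environ
    value = env.get("FV3_STENCIL_REBUILD_FLAG", "True")
    return value.strip().lower() not in ("false", "0", "no")


# Equivalent of `export FV3_STENCIL_REBUILD_FLAG=False` for the current process;
# set it before the model code is imported so the value is visible to it.
os.environ["FV3_STENCIL_REBUILD_FLAG"] = "False"
print(stencil_rebuild_enabled())
```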

Porting a new stencil

  1. Find the location in the fv3gfs-fortran repo code where the save-point is to be added, e.g. using
$ git grep <stencil_name> <checkout of fv3gfs-fortran>
  2. Create a translate class that maps the serialized save-point data to a call to the stencil, or to the function that calls the relevant stencil(s).

These are usually named tests/savepoint/translate/translate_<lowercase name>

Import this class in the tests/savepoint/translate/__init__.py file

  3. Write a Python function wrapper that the translate function (created above) calls.

By convention, we name these fv3core/stencils/<lower case stencil name>.py

  4. Run the test, either with one name or a comma-separated list
$ make dev_tests TEST_ARGS="--which_modules=<stencil name(s)>"

Please also review the Porting conventions section for additional explanation.

Installation

Docker Image

To build the us.gcr.io/vcm-ml/fv3core image with required dependencies for running the Python code, run

$ make build

Add PULL=False to build from scratch without running docker pull:

$ PULL=False make build

Relevant repositories

Some of these are submodules. While tests can work without these, it may be necessary for development to have these as well. To add these to the local repository, run

$ git submodule update --init

The submodules include:

Dockerfiles and building

There are two main docker files:

  1. docker/dependencies.Dockerfile - defines dependency images such as for mpi, serialbox, and GT4py

  2. docker/Dockerfile - uses the dependencies to define the final fv3core images.

The dependencies are separated out into their own images to expedite rebuilding the docker image without having to rebuild dependencies, especially on CI.

For the commands below using make -C docker, you can alternatively run make from within the docker directory.

These dependencies can be updated, pushed, and pulled with make -C docker build_deps, make -C docker push_deps, and make -C docker pull_deps. The tag of the dependencies is based on the tag of the current build in the Makefile, which we will expand on below.

Building from scratch requires both a deps and build command, such as make -C docker pull_deps fv3core_image.

If any example fails for "pulled dependencies", it means the dependencies have never been built. You can build them and push them to GCR with:

$ make -C docker build_deps push_deps

Building examples

fv3core image with pulled dependencies:

$ make -C docker pull_deps fv3core_image

CUDA-enabled fv3core image with pulled dependencies:

$ CUDA=y make -C docker pull_deps fv3core_image

fv3core image with locally-built dependencies:

$ make -C docker build_deps fv3core_image

Updating Serialbox

If you need to install an updated version of Serialbox, you must first install cmake into the development environment. To install an updated version of Serialbox from within the container run

$ wget https://github.com/Kitware/CMake/releases/download/v3.17.3/cmake-3.17.3.tar.gz && \
  tar xzf cmake-3.17.3.tar.gz && \
  cd cmake-3.17.3 && \
  ./bootstrap && make -j4 && make install
$ git clone -b v2.6.1 --depth 1 https://github.com/GridTools/serialbox.git /tmp/serialbox
$ cd /tmp/serialbox
$ cmake -B build -S /tmp/serialbox -DSERIALBOX_USE_NETCDF=ON -DSERIALBOX_TESTING=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/serialbox
$ cmake --build build/ -j $(nproc) --target install
$ cd -
$ rm -rf build /tmp/serialbox

Pinned dependencies

Dependencies are pinned using constraints.txt. This is auto-generated by pip-compile from the pip-tools package, which reads requirements.txt and requirements/requirements_lint.txt, determines the latest versions of all dependencies (including recursive dependencies) compatible with those files, and writes pinned versions for all dependencies. This can be updated using:

$ make constraints.txt

This file is committed to the repository, and gives more reproducible tests if an old commit of the repository is checked out in the future. The constraints are followed when creating the fv3core docker images. To ensure consistency this should ideally be run from inside a docker development environment, but you can also run it on your local system with an appropriate Python 3 environment.
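
To check what a given commit pins, the constraints file can be read with a few lines of Python. This is a minimal sketch that assumes the usual `name==version` lines with `#` comments; the helper names are hypothetical and the sample content is illustrative, not the repository's actual pins.

```python
def parse_pins(text: str) -> dict:
    """Map package name to pinned version for every 'name==version' line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments such as "# via ..."
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Drop any environment marker that follows the version.
        pins[name.strip().lower()] = version.split(";", 1)[0].strip()
    return pins


def missing_pins(required_names, pins) -> list:
    """Return the required package names that have no pinned version."""
    return [name for name in required_names if name.lower() not in pins]


# Illustrative content only, not the repository's actual pins.
SAMPLE = """\
# autogenerated header comment
numpy==1.19.5
    # via -r requirements.txt
pytest==6.2.2 ; python_version >= "3.6"
"""

pins = parse_pins(SAMPLE)
print(pins)
print(missing_pins(["numpy", "pre-commit"], pins))
```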

Development

To develop fv3core, you need to install the linting requirements in requirements/requirements_lint.txt. To install the pinned versions, use:

$ pip install -c constraints.txt -r requirements/requirements_lint.txt

This adds pre-commit, which we use to lint and enforce style on the code. The first time you install pre-commit, install its git hooks using:

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit

As a convenience, the lint target of the top-level makefile executes pre-commit run --all-files. Linting, which formats files and checks for some style conventions, is required, as the same checks are the first step in the continuous integration testing that happens when creating a pull request. Linting locally saves time and literal energy, since CI tests do not have to be launched so many times!

Please see the 'Development Guidelines' below for more information on the structure of the code to align your new code with the current conventions, as well as the CONTRIBUTING.md document for style guidelines.

GT4Py version

FV3Core does not use the GridTools/gt4py main branch; it instead uses a Vulcan Climate Modeling development branch. This version is publicly available at VCM/gt4py.

Situation: There is a new stable feature in a gt4py PR, but it is not yet merged into the GridTools/gt4py main branch. branches.cfg lists these features. Steps:

  1. Add any new branches to branches.cfg
  2. Rebuild the develop branch, either by:
     a. running make_develop gt4py-dev path/to/branches.cfg (you may have to resolve conflicts...), or
     b. adding new commits on top of the existing develop branch (e.g. merge or cherry-pick)
  3. Force push to the develop branch: git push -f upstream develop

The last step will launch Jenkins tests. If these pass:

  1. Create a git tag: git tag v-$(git rev-parse --short HEAD)
  2. Push the tag: git push upstream --tags
  3. Make a PR to VCM/gt4py that updates the version in docker/Makefile to the new tag.

License

FV3Core is provided under the terms of the GPLv3 license.

Development guidelines

File structure / conventions

The main functionality of the FV3 dynamical core, which has been ported from the Fortran version in the fv3gfs-fortran repo, is defined using GT4py stencils and python 'compute' functions in fv3core/stencils. The core is comprised of units of calculations defined for regression testing. These were initially generally separated into distinct files in fv3core/stencils with corresponding files in tests/savepoint/translate/translate_.py defining the translation of variables from Fortran to Python. Exceptions exist in cases where topical and logical grouping allowed for code reuse. As refactors optimize the model, these units may be merged to occupy the same files and even methods/stencils, but the units should still be tested separately, unless determined to be redundant.

The core has most of its calculations happening in GT4py stencils, but there are still several instances of operations happening in Python directly, which will need to be replaced with GT4py code for optimal performance.

The namelist and grid are global variables defined in fv3core/_config.py. The namelist is 'flattened' so that the grouping name of the option is not required to access the data (we may want to change this).

The grid variables are mostly 2d variables and are 'global' to the model thread per mpi rank. The grid object also contains domain and layout information relevant to the current rank being operated on.

Utility functions in fv3core/utils/ include:

The tests/ directory currently includes a framework for translating fields serialized (using Serialbox from GridTools) from a Fortran run into gt4py storages that can be inputs to fv3core unit computations, and compares the results of the ported code to serialized data following a unit computation.

The docker/ directory provides Dockerfiles for building a repeatable environment in which to run the core.

The external/ directory is for submoduled repos that provide essential functionality.

The build system uses Makefiles following the convention of other repos within VulcanClimateModeling.

Model Interface

The top-level functions fv_dynamics and fv_subgridz can currently only be run in parallel using MPI with a minimum of 6 ranks (a few other units also require this, e.g. any unit that involves a halo update). These are the interface to the rest of the model and currently follow different conventions than the rest of the model.
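
The minimum of 6 ranks follows from the cubed-sphere grid having six tiles, each needing at least one rank. The sketch below assumes the usual decomposition in which every tile is split by the same (x, y) layout; the function name is hypothetical.

```python
TILES_PER_CUBE = 6  # the cubed-sphere grid has six faces


def total_ranks(layout) -> int:
    """Number of MPI ranks needed when each tile is split into layout[0] x layout[1]."""
    nx, ny = layout
    if nx < 1 or ny < 1:
        raise ValueError(f"layout entries must be positive, got {layout!r}")
    return TILES_PER_CUBE * nx * ny


print(total_ranks((1, 1)))  # one rank per tile, the minimum
print(total_ranks((3, 3)))
```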

Porting conventions

Generation of regression data occurs in the fv3gfs-fortran repo (https://github.com/VulcanClimateModeling/fv3gfs-fortran) with serialization statements and a build procedure defined in tests/serialized_test_data_generation. The version of data this repo currently tests against is defined in FORTRAN_SERIALIZED_DATA_VERSION in this repo's docker/Makefile.image_names. Fields serialized are defined in Fortran code with serialization comment statements such as:

    !$ser savepoint C_SW-In
    !$ser data delpcd=delpc delpd=delp ptcd=ptc

where the name being assigned is the name the fv3core uses to identify the variable in the test code. When this name is not equal to the name of the variable, this was usually done to avoid conflicts with other parts of the code where the same name is used to reference a differently sized field.
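
To see which test name maps to which Fortran variable, the `data` statements can be split on `=`. The parser below is a hypothetical helper written for illustration, not part of Serialbox or fv3core, and it only handles the simple `name=variable` form shown above.

```python
def parse_ser_line(line: str):
    """Return (directive, payload) for a '!$ser' comment line, or None otherwise.

    For 'data' the payload maps the serialized name to the Fortran variable;
    for 'savepoint' it is the savepoint name.
    """
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "!$ser":
        return None
    directive, rest = parts[1], parts[2:]
    if directive == "savepoint":
        return directive, " ".join(rest)
    if directive == "data":
        # Only the simple name=variable form is handled in this sketch.
        pairs = (item.split("=", 1) for item in rest if "=" in item)
        return directive, {name: variable for name, variable in pairs}
    return directive, rest


print(parse_ser_line("    !$ser savepoint C_SW-In"))
print(parse_ser_line("    !$ser data delpcd=delpc delpd=delp ptcd=ptc"))
```

Here `delpcd` is the name the test code sees, while `delpc` is the variable in the Fortran source.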

The majority of the logic for translating from data serialized from Fortran to something that can be used by Python, and the comparison of the results, is encompassed by the main Translate class in the tests/savepoint/translate/translate.py file. Any units not involving a halo update can be run using this framework, while those that need to be run in parallel can look to the ParallelTranslate class as the parent class in tests/savepoint/translate/parallel_translate.py. These parent classes provide generally useful operations for translating serialized data between Fortran and Python specifications, and for applying regression tests.

A new unit test can be defined as a new child class of one of these, with a naming convention of Translate<Savepoint Name>, where Savepoint Name is the name used in the serialization statements in the Fortran code, without the -In or -Out part of the name. A translate class can usually just specify the input and output fields. In cases where the parent compute function is insufficient to handle the complexity of either the data translation or the compute function, the appropriate methods can be overridden.
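
The class-naming convention can be sketched as follows. The helper names are hypothetical; the savepoint name C_SW-In is taken from the serialization example above.

```python
def savepoint_base_name(savepoint: str) -> str:
    """Strip a trailing -In or -Out from a Fortran savepoint name."""
    for suffix in ("-In", "-Out"):
        if savepoint.endswith(suffix):
            return savepoint[: -len(suffix)]
    return savepoint


def translate_class_name(savepoint: str) -> str:
    """Name of the Translate child class expected for a savepoint."""
    return "Translate" + savepoint_base_name(savepoint)


print(translate_class_name("C_SW-In"))
print(translate_class_name("C_SW-Out"))
```

Both the -In and -Out savepoints resolve to the same class, which matches the idea that one translate class handles the inputs and the outputs of a unit.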

For Translate objects:

      self.in_vars["data_vars"]["cx"] = {"istart": self.is_, "iend": self.ie + 1,
                                         "jstart": self.jsd, "jend": self.jed,}

For ParallelTranslate objects:

Debugging Tests

Pytest can be configured to give you a pdb session when a test fails. To route this properly through docker, you can run:

$ TEST_ARGS="-v -s --pdb" RUN_FLAGS="--rm -it" make tests

This can be done with any pytest target, such as make savepoint_tests and make savepoint_tests_mpi.

GEOS API

The GeosDycoreWrapper class provides an API to run the dynamical core in a Python component of a GEOS model run. A GeosDycoreWrapper object is initialized with a namelist, communicator, and backend, which creates the communicators, partitioners, dycore state, and dycore object required to run the Pace dycore. A wrapper object takes numpy arrays of u, v, w, delz, pt, delp, q, ps, pe, pk, peln, pkz, phis, q_con, omga, ua, va, uc, vc, mfxd, mfyd, cxd, cyd, and diss_estd and returns a dictionary containing numpy arrays of those same variables. Wrapper objects contain a timer attribute that tracks the time spent moving input data to the dycore state, running the dynamical core, and retrieving the data from the state.
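
The three timed phases can be pictured with a small stand-in timer. This is only a sketch of the idea: `PhaseTimer`, its `clock` method, and the phase labels are hypothetical and are not the wrapper's actual timer API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical stand-in)."""

    def __init__(self):
        self.times = defaultdict(float)

    @contextmanager
    def clock(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so repeated calls to the same phase add up.
            self.times[name] += time.perf_counter() - start


timer = PhaseTimer()
# Phase labels are illustrative; the bodies stand in for the real work.
with timer.clock("move_to_state"):
    inputs = {"u": [0.0] * 10}
with timer.clock("dycore"):
    outputs = {name: list(values) for name, values in inputs.items()}
with timer.clock("move_from_state"):
    result = dict(outputs)

for phase, seconds in timer.times.items():
    print(f"{phase}: {seconds:.6f} s")
```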