A question: is e.g. PIOc_read_nc_decomp intended to be called by the user, or will it be wrapped by some existing netCDF-API function(s)?
The read/write decomp functions read/write a special netCDF file that contains the decomposition information for PIO. So you can create a decomposition, and save it to file, and use it over and over.
These functions will be wrapped to make them look more netCDF-like (like nc_put_decomp()?).
Would this work? The decomposition information can be passed to the create and open methods in the dispatch table using the "parameters" argument, and then stored in the NC.dispatchdata field. Then when the vara/vars methods in the dispatch table are called for PIO, that code can extract the decomp info and do the proper thing with it, invoking PIOc_read/write_nc_decomp as needed.
What I would like to do is come in and brief you and Ward on PIO and how it works. There are some details, but it is fitting into the netCDF API very well.
In terms of decompositions, it would not be unusual to have several in use at one time. The user initializes them in an init_decomp() call. The read/write decomp functions need never be called; they are just convenience functions to allow easy recording and communication of what decomposition was used. (It doesn't matter for the contents of the data, just for how the data are distributed over a particular hardware configuration.)
PIO introduced two new objects to the netCDF data model:
- The iosystemid is pretty easy to hide, as long as there is just one (and that is typical).
- The decomposition IDs are more plentiful and must be directly controlled by the user. A variable may have a different read and write decomposition (to manage halo effects, for example).
Well, as you know, you can create an NC_PIO_INFO structure per-open/created-file and stash whatever you need into it, including decompositions and iosystemids and other pio specific objects.
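To make that concrete, here is a minimal sketch of what such a per-file structure might hold. The struct and its field names are hypothetical, not an existing netCDF-c type:

/* Hypothetical per-file PIO info, stashed in the NC.dispatchdata field
 * at open/create time and retrieved later by the vara/vars methods. */
typedef struct NC_PIO_INFO
{
    int iosysid;      /* The PIO I/O system this file belongs to. */
    int *decompids;   /* Decomposition IDs registered for this file. */
    int ndecomps;     /* Number of entries in decompids. */
} NC_PIO_INFO;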
When you say "The decomposition IDs are more plentiful and must be directly controlled by the user. A variable may have a different read and write decomposition ...", what do you mean by directly controlled by the user? Given sufficient info at file open/create time, can the proper decomp to use be automatically determined when later needed?
Unfortunately, the decomps cannot be automatically determined. They are how PIO is able to be tuned to the specific hardware for optimum performance.
I will get my base implementation together, and if we can come up with any way to better hide the decomps, I am open to it.
Sorry, I was not clear. I am not asking to automatically determine the decomps, but rather asking whether the decomp(s) to use can be determined from information such as the complete set of available decomps, plus the specific variable to write, plus slab information. In other words, given the usual arguments to vara, can I determine which decomp to use?
It certainly would be possible to associate a default decomp with each var read and write. But note that the arguments change from vara anyway: distributed arrays only work with a "record" (which is a looser definition of record than you are used to).
So there is no need for the user to specify start/count. An entire record is read/written at one time, and each processor only gets/puts the local portion of the global array, based on the decomposition.
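As a sketch, using PIO's existing function names (PIOc_setframe() and PIOc_write_darray()), and assuming the file, variable, and decomposition (ioid) have already been set up; elements_per_pe and local_data are illustrative:

/* Each task passes only its local portion of the record; the
 * decomposition (ioid) maps local elements into the global array,
 * so no start/count arguments are needed. */
PIOc_setframe(ncid, varid, 0);  /* Select record 0 of the variable. */
PIOc_write_darray(ncid, varid, ioid, elements_per_pe, local_data, NULL);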
I have taken down my PR with this feature because I now see a better way to implement this, one that will not require taking PIO code into the netCDF library codebase. Instead, it can be treated like pnetcdf, as an optional interface that can be turned on at configure, used with NC_PIO in the mode flag, but does not involve PIO code in netCDF. Just some wrappers, as with pnetcdf.
This requires some work on PIO first. When I get that done I will circle around again and put up a PIO PR.
I think this functionality may best be brought to users via a user-dispatch library, or via HPC NetCDF-C, or through some other mechanism. Interested users should contact me directly.
OK, this capability has been added to PIO, and it works great! ;-)
Introduction
This ticket describes a plan to include the functionality of the PIO library in the netCDF library, so that it may be (optionally) available to all netCDF HPC (High Performance Computing) users. The HPC community is a core component of the Unidata community, and better support of modern supercomputers will allow netCDF to better serve these users.
The Parallel I/O library (https://github.com/) was developed at NCAR. It provides advanced HPC I/O functionality for netCDF codes, with a netCDF-like API. The PIO library allows HPC users to make the most of their computational hardware, without waiting on I/O more than is absolutely essential.
The goal of this effort is to make the PIO functionality available to netCDF users, through the NetCDF API.
Main Features
Computational Components and I/O Component
In async mode, it is possible for the user to define several computational components, and one (shared) I/O component.
Each component can have an arbitrary number of processors.
In computational components, netCDF calls actually send data to the I/O component, which buffers data and handles disk I/O.
For example, a machine of 500K processors can assign 1K processors to I/O, 250K processors to an atmospheric model, 100K processors to an ocean model, 10K processors to a space weather model, etc., until all 500K processors are used up. Each of the models can do netCDF I/O as usual, but all actual I/O will be channeled through the I/O component. Data are buffered at all stages to improve performance.
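As a rough sketch, scaled down to a toy size, here is how such a layout might be set up with PIO's async initialization call. The exact PIOc_init_async() signature has varied between PIO versions, so treat this as illustrative only:

/* Illustrative: split MPI_COMM_WORLD into 1 I/O task plus two
 * computational components of 2 tasks each (5 tasks total). */
int num_io_procs = 1;                /* Tasks dedicated to I/O. */
int num_procs_per_comp[2] = {2, 2};  /* Tasks per computational component. */
MPI_Comm comp_comm[2];               /* Returned component communicators. */
MPI_Comm io_comm;                    /* Returned I/O communicator. */
int iosysid[2];                      /* One iosystemid per component. */

PIOc_init_async(MPI_COMM_WORLD, num_io_procs, NULL, 2, num_procs_per_comp,
                NULL, &io_comm, comp_comm, PIO_REARR_BOX, iosysid);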
Configuration and Build
The PIO functionality is only included if netCDF is built with --enable-pio. It makes no sense to use PIO on systems without multiple cores, but this is not enforced by the build, and tests will build and pass even on a single core.
The PIO functionality is available through the netCDF internal dispatch system, which allows the netCDF API to be serviced by various sub-libraries (netCDF-4, HDF4, pnetcdf, etc.). The PIO functionality is accessed in the same way.
The code for the integrated PIO functionality is in subdirectory libpio.
Testing of PIO
The pio_test directory contains tests for PIO. These tests will only be run if --enable-parallel-tests is specified.
Using PIO
In order to use PIO, the user must include the NC_PIO flag in the mode of nc_create/nc_open.
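For example (a sketch; NC_PIO is the mode flag proposed in this ticket, not a flag in any released netCDF):

int ncid, ret;

/* NC_PIO selects the PIO layer at create/open time, the same way
 * NC_NETCDF4 or NC_PNETCDF select theirs. */
if ((ret = nc_create("data.nc", NC_PIO | NC_CLOBBER, &ncid)))
    return ret;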
Additions to the NetCDF Data Model
PIO introduces two new objects to the netCDF data model:
- The iosystemid, which identifies an initialized PIO I/O system (typically there is just one).
- The decomposition ID, which records how a variable's data are distributed across processors.
New Functions for the API
Several new functions need to be added to the netCDF API to support PIO functionality. However, it is not necessary to add them to the netCDF dispatch table, since other netCDF sub-libraries will never need to implement these functions.
Initialization
There are two new PIO initialization functions, one for async and one for non-async (the names of these functions will be changed to match the netCDF API). A finalize function must be called at the end of all processing.
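For reference, a minimal non-async sketch using PIO's existing names (PIOc_Init_Intracomm() and PIOc_finalize()):

int iosysid;

/* Non-async init on MPI_COMM_WORLD: 4 I/O tasks, at a stride of 4,
 * starting with task 0, using the subset rearranger. */
PIOc_Init_Intracomm(MPI_COMM_WORLD, 4, 4, 0, PIO_REARR_SUBSET, &iosysid);

/* ... create files, define decompositions, read/write data ... */

/* Finalize must be called at the end of all processing. */
PIOc_finalize(iosysid);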
Decompositions
Decompositions define how a netCDF variable is distributed across processors on the supercomputer.
For convenience there are also two functions to write/read a netCDF file that records the decomposition information. Decomposition files are helpful for debugging, but are not generally used in real processing.
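To illustrate, here is a sketch of defining a simple 1D block decomposition and recording it to a file, using PIO's existing names (PIOc_init_decomp() takes a 0-based map; exact signatures vary between PIO versions). It assumes iosysid from initialization and my_rank from MPI_Comm_rank():

#define ELEM_PER_PE 16

int gdimlen[1] = {64};            /* Global array size (4 tasks x 16). */
PIO_Offset compmap[ELEM_PER_PE];  /* Global index of each local element. */
int ioid;

/* Each task owns a contiguous block of the global array. */
for (int i = 0; i < ELEM_PER_PE; i++)
    compmap[i] = my_rank * ELEM_PER_PE + i;

PIOc_init_decomp(iosysid, PIO_INT, 1, gdimlen, ELEM_PER_PE, compmap, &ioid,
                 PIO_REARR_SUBSET, NULL, NULL);

/* Optionally record the decomposition in a netCDF file for later reuse
 * or debugging. */
PIOc_write_nc_decomp(iosysid, "decomp.nc", 0, ioid, NULL, NULL, 0);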
Record Reads/Writes on Distributed Arrays
There are read/write functions to read/write a record of a distributed array. (A record is defined as one element of the first dimension and all elements of subsequent dimensions. In a classic netCDF file with an unlimited dimension, this is a record, but with PIO, the first dimension does not have to be unlimited.)
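For example, a read sketch using PIO's existing names (PIOc_setframe() and PIOc_read_darray()); rec_num, elements_per_pe, and local_data are illustrative:

/* Read this task's local portion of one record; the decomposition
 * (ioid) supplies the mapping, so there are no start/count arguments. */
PIOc_setframe(ncid, varid, rec_num);  /* Select which record to read. */
PIOc_read_darray(ncid, varid, ioid, elements_per_pe, local_data);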
Schedule
I anticipate the first version of this integration, supporting netCDF classic files, will be ready before the end of 2017. If so, it can be included in the 4.5.1 release of netCDF.
Work Details
This work can be followed in detail on the GitHub repo: https://github.com/edhartnett/netcdf-c