NCAR / ParallelIO

A high-level Parallel I/O Library for structured grid applications
Apache License 2.0

Backward compatibility for pio_openfile #1985

anton-seaice closed 5 months ago

anton-seaice commented 6 months ago

Hi

We are updating our model (CICE) to support netcdf4, and found an issue with backward compatibility of pio_openfile. When opening a netcdf-classic (cdf) file with iotype set to PIO_iotype_netcdf4p, the openfile fails. Our netcdf build does not include parallel-netcdf, so the expected behaviour is that the PIO library retries with a serial read of the netcdf classic file after the parallel read fails.

This is the error message: Abort with message NetCDF: Attempt to use feature that was not turned on when netCDF was built. in file /scratch/tm70/as2285/tmp/spack-stage/spack-stage-parallelio-2.5.10-zz25cmdlouvcwggv7zbkdmeobvz37aja/spack-src/src/clib/pioc_support.c at line 2832

This is the logging from PE 0. The first section looks correct: a parallel read of a netcdf classic file is not supported, so error code -128 is returned.

ERROR: 0 set loglevel to 3
    0 PIOc_openfile iosysid 65536 *iotype 4 filename /g/data/ik11/inputs/CICE_data/ic/gx3/iced_gx3_v5.nc mode 0
        0 pio_get_iosystem_from_id iosysid = 65536
        0 PIOc_openfile_retry iosysid = 65536 iotype = 4 filename = /g/data/ik11/inputs/CICE_data/ic/gx3/iced_gx3_v5.nc mode = 0 retry = 1
        0 retry error code ierr = -128 io_rank 0

On the retry, iotype has not been updated: it is still 4 (NETCDF4P), but it should be 2 (NETCDF).

        0 retry nc_open(/g/data/ik11/inputs/CICE_data/ic/gx3/iced_gx3_v5.nc) : fd = -1, iotype = 4, do_io = 1, ierr = -128
        0 Bcasting error code ierr -128 ios->ioroot 0 ios->my_comm 21622800
        0 Bcast openfile_retry error code ierr = -128
    0 check_netcdf2 status = -128 fname = /scratch/tm70/as2285/tmp/spack-stage/spack-stage-parallelio-2.5.10-zz25cmdlouvcwggv7zbkdmeobvz37aja/spack-src/src/clib/pioc_support.c line = 2832
        0 check_netcdf2 chose error handler = -51
    0 PIOc_strerror pioerr = -128

Expected result is the file opens using PE 0 only.
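To make the expected behaviour concrete, here is a minimal sketch in C of the fallback I have in mind. It is not PIO's actual code: `stub_open` and `open_with_fallback` are hypothetical names, the stub stands in for the real open call, and the constants are local copies of the values seen in the log above (iotype 4 and 2, error -128).

```c
#include <string.h>

/* Local copies of values seen in the log; names are made up for this sketch. */
enum { IOTYPE_NETCDF = 2, IOTYPE_NETCDF4P = 4 };
#define SKETCH_NOERR 0
#define SKETCH_ENOTBUILT (-128)

/* Hypothetical stand-in for the real open call: in this build a
 * classic-format file can only be opened with the serial classic iotype. */
static int stub_open(const char *format, int iotype)
{
    if (strcmp(format, "classic") == 0 && iotype != IOTYPE_NETCDF)
        return SKETCH_ENOTBUILT;
    return SKETCH_NOERR;
}

/* Try the requested iotype first. On a "feature not built" error,
 * downgrade *iotype to serial classic and try once more. */
int open_with_fallback(const char *format, int *iotype)
{
    int ierr = stub_open(format, *iotype);

    if (ierr == SKETCH_ENOTBUILT && *iotype != IOTYPE_NETCDF)
    {
        *iotype = IOTYPE_NETCDF; /* the update that is missing in the log */
        ierr = stub_open(format, *iotype);
    }
    return ierr;
}
```

Calling `open_with_fallback("classic", &iotype)` with `iotype` set to 4 succeeds and leaves `iotype` at 2, which is the outcome I expected from pio_openfile.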

Here is my test code:


program pio_file_open_example
  use pio

  implicit none

  include 'mpif.h'

  integer, parameter :: nprocs = 4
  integer :: rank, ierr, file_handle, status, len 
  character(len=MPI_MAX_ERROR_STRING) :: error_string
  character(len=500) :: filename 
  type(iosystem_desc_t) :: pio_subsystem
  type(File_desc_t) :: File

  filename = '/g/data/ik11/inputs/CICE_data/ic/gx3/iced_gx3_v5.nc'

  ! Initialize MPI
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! init

  ! comp_rank must be this task's rank; run with at least nprocs (4) MPI tasks
  call pio_init(rank, MPI_COMM_WORLD, nprocs, 0, 1, &
                PIO_rearr_subset, pio_subsystem)

  ierr = pio_set_log_level(3)

  ! open

  status = pio_openfile(pio_subsystem, File, PIO_iotype_netcdf4p, trim(filename), pio_nowrite)

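  ! close the file if the open succeeded

  if (status == PIO_noerr) call pio_closefile(File)

  ! finalize

  call PIO_finalize(pio_subsystem, status)
  call MPI_Finalize(ierr)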

end program pio_file_open_example

And the test file:

$ ncdump -k '/g/data/ik11/inputs/CICE_data/ic/gx3/iced_gx3_v5.nc'
classic

jedwards4b commented 6 months ago

Thank you for opening an issue and providing a test case. I can confirm that this also happens in the latest PIO 2.6.2, and I will provide a fix as soon as I can. If you already have a fix and would like to submit a PR, it would be gratefully accepted.

jedwards4b commented 6 months ago

@anton-seaice I'm not sure that I understand what is going on. In working with the test case that you provided I found that the cpp macro _NETCDF4 was not defined in the config.h in my build, so I rebuilt and saw that it is now defined and the test works fine. I double checked and I haven't made any changes in the pio source. Can you confirm that _NETCDF4 is defined in your config.h file?

jedwards4b commented 6 months ago

To follow up, the actual failure I had yesterday is because I was setting PIO_TYPENAME=netcdf4p and PIO_NETCDF_DATA_FORMAT=64bit-data, which are incompatible options. I will add something to check that and fail with a reasonable error message.

anton-seaice commented 6 months ago

> To follow up, the actual failure I had yesterday is because I was setting PIO_TYPENAME=netcdf4p and PIO_NETCDF_DATA_FORMAT=64bit-data, which are incompatible options. I will add something to check that and fail with a reasonable error message.

I'm not sure you 'have' to do anything about that. The netcdf library already returns 'NetCDF: Invalid argument' if error handling is set to 'PIO_RETURN_ERROR'.
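As a rough illustration of the difference between the two handlers, here is a small C sketch. It is not PIO code: the function names are hypothetical, handler value -51 is the one shown in the log above, and -53 for the "return error" handler and -36 for "Invalid argument" are my recollection of pio.h and netcdf.h, so treat those two as assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* -51 is taken from the log; -53 is an assumed value for PIO_RETURN_ERROR. */
enum { HANDLER_INTERNAL_ERROR = -51, HANDLER_RETURN_ERROR = -53 };

/* Messages quoted in this thread; -36 for "Invalid argument" is an assumption. */
const char *sketch_strerror(int status)
{
    switch (status)
    {
    case 0:
        return "No error";
    case -36:
        return "NetCDF: Invalid argument";
    case -128:
        return "NetCDF: Attempt to use feature that was not turned on when netCDF was built.";
    default:
        return NULL; /* unknown to this sketch */
    }
}

/* Report what the caller would experience for a given handler and status:
 * *would_abort is set when the handler stops the program, otherwise the
 * status is handed back so the caller can print sketch_strerror(status). */
int apply_error_handler(int handler, int status, bool *would_abort)
{
    *would_abort = (status != 0 && handler == HANDLER_INTERNAL_ERROR);
    return status;
}
```

With the "return error" handler the caller gets the status back and can print the message itself, which is why an extra check inside the library may not be needed.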

anton-seaice commented 6 months ago

> @anton-seaice I'm not sure that I understand what is going on. In working with the test case that you provided I found that the cpp macro _NETCDF4 was not defined in the config.h in my build, so I rebuilt and saw that it is now defined and the test works fine. I double checked and I haven't made any changes in the pio source. Can you confirm that _NETCDF4 is defined in your config.h file?

I can't see how to check that directly, but opening netcdf4 files works fine (and in parallel) and it looks like the right flag is set in cmake.
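One way to check directly, assuming the generated config.h from the PIO build is still available somewhere (its location depends on the build and is not something I can state here), is to scan it for an active `#define _NETCDF4`. A small C sketch with a hypothetical helper `header_defines`:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the file at path has an active "#define <macro>" line,
 * 0 if it does not, and -1 if the file cannot be opened.
 * Commented-out lines such as "/* #undef X" are ignored because they
 * do not start with '#'. */
int header_defines(const char *path, const char *macro)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;

    char line[1024];
    size_t mlen = strlen(macro);
    int found = 0;

    while (!found && fgets(line, sizeof line, fp))
    {
        const char *p = line;
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '#')
            continue;
        p++;
        while (isspace((unsigned char)*p))
            p++;
        if (strncmp(p, "define", 6) != 0 || !isspace((unsigned char)p[6]))
            continue;
        p += 6;
        while (isspace((unsigned char)*p))
            p++;
        /* Require an exact name match, so _NETCDF4 does not match _NETCDF4_FOO. */
        if (strncmp(p, macro, mlen) == 0 &&
            !(isalnum((unsigned char)p[mlen]) || p[mlen] == '_'))
            found = 1;
    }
    fclose(fp);
    return found;
}
```

For example, `header_defines("path/to/config.h", "_NETCDF4")` returns 1 when the macro is defined; the path here is a placeholder.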

anton-seaice commented 6 months ago

Hi Jim

Apologies - you have already fixed this!

I updated to version 2.6.2 and my test code passes now.

See this commit:

https://github.com/NCAR/ParallelIO/commit/e437a94e72ed5cbc7f01fb0841e05f9be7f3bd1e#diff-205cd9c480611213ad871801509790fd76b6068519d6ead92e2aeb7321d82974

Adding the ierr == NC_ENOTBUILT check is probably what did it.