NOAA-EMC / NCEPLIBS-ncio

This is a NOAA library used by NCEP GSI system to read the GFS forecast files for use in data assimilation.
Other
1 stars 6 forks source link

segmentation fault call write_vardata when running enkf_chgres_recenter_nc on Hera #86

Open MingjingTong-NOAA opened 3 months ago

MingjingTong-NOAA commented 3 months ago

When I ran enkf_chgres_recenter_nc on Hera, I encountered a segmentation fault when enkf_chgres_recenter_nc writes out the interpolated model fields by calling write_vardata (see /scratch2/NCEPDEV/stmp1/Mingjing.Tong/gfs_cycled/logs/2022060918/enkfgdasechgres.log.0). In this run 'atmens_fcst' is linked to /scratch2/NCEPDEV/stmp1/Mingjing.Tong/gfs_cycled/enkfgdas.20220609/18/mem001/model_data/atmos/history/enkfgdas.t18z.atmf006.nc_shield. In enkf_chgres_recenter_nc, the output file is first created via call create_dataset(output_file, indset, nocompress=.true.), where output is 'atmens_fcst', and the variable data in output_file are overwritten with the interpolated model variables.

I didn't have this problem before Hera was transitioned to Rock8. I'm using the enkf_chgres_recenter_nc from the latest develop branch of https://github.com/NOAA-EMC/gfs-utils.

When I replace the data 'atmens_fcst' with that from GFS (/scratch2/NCEPDEV/stmp1/Mingjing.Tong/gfs_cycled/enkfgdas.20220609/18/mem001/model_data/atmos/history/enkfgdas.t18z.atmf006.nc), the problem is gone (see /scratch2/NCEPDEV/stmp1/Mingjing.Tong/gfs_cycled/logs/2022060918/enkfgdasechgres.log).

I don't know what't the problem with enkfgdas.t18z.atmf006.nc_shield. It has less vertical levels than GFS. I tried turn off compression or turn off quantize, while writing out the enkfgdas.t18z.atmf006.nc_shield, but they didn't work. I also have no problem visualize that file.

edwardhartnett commented 3 months ago

Can you give me a copy of the data file that caused the problem?

MingjingTong-NOAA commented 3 months ago

The data that caused the problem is /scratch2/GFDL/gfdlscr/Mingjing.Tong/scrub/test/enkfgdas.20220609/18/atmos/mem001/gdas.t18z.atmf006.nc on Hera. Thanks!

MingjingTong-NOAA commented 3 months ago

@edwardhartnett Do you have anything for me to test?

edwardhartnett commented 3 months ago

Not yet...

MingjingTong-NOAA commented 3 months ago

@edwardhartnett I found a bug in write_vardata_code.f90 that causes errors when model fields are written out using write_vardata in enkf_chgres_recenter_nc and a couple of other programs using the ncio library. The problem appears when the time dimension of the output file is defined to be unlimited, which is the case for our model and GFSv16 model output. The problem is caused by the 'varshape' variable, which is used when the time dimension of the output file is unlimited.

45 else if (n == nd .and. dset%dimensions(ndim)%isunlimited) then 46 start(n) = ncount 47 varshape = shape(values) 48 count(n) = varshape(n)

The dimension of 'varshape' equals the dimension of the model fields defined in the output file (see the following line), which are spatial plus time dimensions 32 allocate(varshape(dset%variables(nvar)%ndims))

However, the dimension of 'values' passed into the subroutine does not include the time dimension in enkf_chgres_recenter_nc and other programs, which only process one time level data (the time dimension equals 1). So if 'varshape' is not initialized, the last dimension of varshape (varshape(n)) is undefined and a wrong number will be given to count(n), which caused the segmentation fault.

I made the following change, which fixed the problem.

[Mingjing.Tong@hfe07 src]$ git diff write_vardata_code.f90 diff --git a/src/write_vardata_code.f90 b/src/write_vardata_code.f90 index f55e73e..b8de5f8 100644 --- a/src/write_vardata_code.f90 +++ b/src/write_vardata_code.f90 @@ -30,6 +30,7 @@ nvar = get_nvar(dset,varname) allocate(start(dset%variables(nvar)%ndims),count(dset%variables(nvar)%ndims)) allocate(varshape(dset%variables(nvar)%ndims))

edwardhartnett commented 3 months ago

Terrific. I will put that up as a PR...