HydrologicEngineeringCenter / Vortex

data processing utilities
MIT License

Importer skipping timesteps #69

Open openSourcerer9000 opened 2 years ago

openSourcerer9000 commented 2 years ago

So I'm having this issue where the Vortex Importer is skipping timesteps. When using multiple Stage IV radar GRIB files, it happens randomly: sometimes it skips a timestep or two, and sometimes it writes them all correctly, depending on its mood.

The workaround for that problem is to simply rerun it multiple times until it fills out the DSS. However, I'm also getting an issue where it consistently skips certain timesteps when writing from a NetCDF gridset. This happens both when running from Jython and from the GUI. _EDIT: the consistent skips were fixed by duplicating the time coord var to an aux coord var called valid_time, as described below_
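The rerun workaround can be scripted. Below is a minimal sketch that assumes you supply two hypothetical callables: `run_import` (whatever launches the Vortex import, e.g. from Jython) and `count_records` (whatever reports how many grids are in the output DSS). Neither is a Vortex function, and the loop relies on the behavior described above, namely that a rerun fills in records the previous run skipped. It avoids f-strings and type hints so it should also run under Jython 2.7.

```python
def import_until_complete(run_import, count_records, expected, max_attempts=5):
    """Rerun an import until the output holds at least `expected` records.

    `run_import` and `count_records` are caller-supplied callables
    (hypothetical; not part of Vortex). Returns the attempt number on
    which the expected count was reached, or raises RuntimeError if it
    never is within `max_attempts` runs.
    """
    for attempt in range(1, max_attempts + 1):
        run_import()
        if count_records() >= expected:
            return attempt
    raise RuntimeError(
        "still short of %d records after %d attempts" % (expected, max_attempts)
    )
```

Capping the attempts keeps a dataset that never completes (as with the consistent skips) from looping forever.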

The NetCDFs are CF-compliant and have no missing timesteps. I've opened them up and plotted the timesteps that are not going through Vortex properly, and everything looks fine.

To reproduce

Attached are one NetCDF file and one clipping shapefile: InputData.zip

Run the importer with the options shown in the two attached screenshots. I'm using the latest release, but the bug also occurred with an older version before I upgraded.

The following timesteps are the ones that consistently drop out; the same happens with other clip shapefiles. I'm also seeing surprising behavior where each single hourly timestamp is converted to a start time (D part) 0.5 hr earlier and an end time (E part) 0.5 hr later, so I have to timeshift the resulting DSS. I suppose this is by design; it just seems odd to me.

[Timestamp('2020-10-16 17:30:00'),
 Timestamp('2020-10-16 18:30:00'),
 Timestamp('2020-10-20 15:30:00'),
 Timestamp('2020-10-20 16:30:00'),
 Timestamp('2020-10-21 23:30:00'),
 Timestamp('2020-10-22 00:30:00'),
 Timestamp('2020-10-23 03:30:00'),
 Timestamp('2020-10-23 04:30:00'),
 Timestamp('2020-10-23 18:30:00'),
 Timestamp('2020-10-23 19:30:00')]
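The half-hour offset described above amounts to centering a one-hour interval on the timestamp. A small sketch of that arithmetic, compared with an end-of-period convention; the function names are hypothetical, and the centering reflects the behavior reported in this issue rather than documented Vortex logic.

```python
from datetime import datetime, timedelta

HALF_HOUR = timedelta(minutes=30)
ONE_HOUR = timedelta(hours=1)


def centered_interval(t):
    """(start, end) centered on t, as reported here: t - 0.5 h to t + 0.5 h."""
    return t - HALF_HOUR, t + HALF_HOUR


def end_aligned_interval(t, step=ONE_HOUR):
    """(start, end) for the convention where t marks the end of the period."""
    return t - step, t


t = datetime(2020, 10, 16, 18, 0)
start, end = centered_interval(t)
print(start.isoformat(), end.isoformat())  # 2020-10-16T17:30:00 2020-10-16T18:30:00

# Shifting t back half an hour before centering yields the end-aligned interval.
assert centered_interval(t - HALF_HOUR) == end_aligned_interval(t)
```

This shows one possible correction; which direction the DSS needs shifting depends on whether the source timestamps mark the start, middle, or end of each accumulation period.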
danhamill commented 2 years ago

I was also able to reproduce this issue. There should be 745 records, but Vortex only imported 741:

08:58:45.499      -----DSS---zclose  Handle 3;  Process: 38776;  File: C:\Temp\vortex\InputData\test.dss
08:58:45.501                         Number records:         741
08:58:45.501                         File size:              183484  64-bit words
08:58:45.502                         File size:              1433 Kb;  1 Mb
08:58:45.502                         Dead space:             0
08:58:45.503                         Hash range:             8192
08:58:45.504                         Number hash used:       533
08:58:45.505                         Max paths for hash:     3
08:58:45.505                         Corresponding hash:     40
08:58:45.506                         Number non unique hash: 0
08:58:45.507                         Number bins used:       533
08:58:45.507                         Number overflow bins:   0
08:58:45.507                         Number physical reads:  962
08:58:45.508                         Number physical writes: 5060
08:58:45.508                         Number denied locks:    0
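To see which hourly grids are absent, one option is to compare the timestamps actually written against the full expected hourly range. A stdlib-only sketch; `present` is assumed to come from wherever you list the written records (for example, the D parts parsed into datetimes), and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def missing_timesteps(present, start, end, step=timedelta(hours=1)):
    """Return every expected timestamp in [start, end] that is absent from `present`."""
    have = set(present)
    missing = []
    t = start
    while t <= end:  # inclusive of `end`
        if t not in have:
            missing.append(t)
        t += step
    return missing
```

Because the expected range is generated rather than inferred from the data, this also catches gaps at the very start or end of the series.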

Also, the D/E parts of the record just before one of the missing grids are incorrect.
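A quick consistency check for D/E parts is to confirm that each record's end time equals the next record's start time. A sketch, assuming the D and E parts have already been parsed into `(start, end)` datetime pairs and sorted by start; the function name is hypothetical.

```python
from datetime import datetime


def interval_breaks(intervals):
    """Return indices i where interval i's end differs from interval i+1's start.

    `intervals` is a list of (start, end) datetime pairs sorted by start,
    e.g. the D and E parts of consecutive grid records.
    """
    return [
        i
        for i in range(len(intervals) - 1)
        if intervals[i][1] != intervals[i + 1][0]
    ]
```

A break flags either a missing grid after record i or a record whose E part is wrong, which covers both symptoms reported here.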


tombrauer commented 2 years ago

I'm looking at Delta2020.nc in Panoply Viewer, and the time axis seems to have some discontinuities.

openSourcerer9000 commented 2 years ago

Mmmm, so valid_time is important? I was doing everything by the time coordinate variable, including interpolating some gaps, which seems to have dropped the auxiliary coordinate values for those timesteps. I'll fill out valid_time and hopefully that clears up the consistent timestep drops.

I think I'll add legit_time_yo and @therealtimeauxiliarycoordvar as well just to cover all the bases.
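Duplicating time into valid_time can be done for the whole variable, or only where gap-filling left the auxiliary values empty. Below is a stdlib sketch of the latter, treating `None` or NaN as "missing" (an assumption about what the interpolation left behind); the function name is hypothetical. With xarray, whole-variable duplication would be something like `ds = ds.assign_coords(valid_time=("time", ds["time"].values))`, though whether Vortex expects any particular attributes on valid_time is not covered here.

```python
import math


def fill_valid_time(time, valid_time):
    """Copy `time` into `valid_time` wherever the auxiliary value is missing.

    Missing means None or a float NaN. Returns the filled list and the
    indices that were filled.
    """
    if len(time) != len(valid_time):
        raise ValueError("time and valid_time must have the same length")
    filled, fixed = [], []
    for i, (t, v) in enumerate(zip(time, valid_time)):
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        filled.append(t if missing else v)
        if missing:
            fixed.append(i)
    return filled, fixed
```

Returning the filled indices makes it easy to confirm they line up with the timesteps that were being dropped.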

openSourcerer9000 commented 2 years ago

Yes, duplicating the time coordinate to valid_time cleared that up. It still randomly skips timesteps, so I will leave this open, but I can at least work around that by rerunning the importer until it fills out the DSS.

danhamill commented 2 years ago

@openSourcerer9000 are you generating these NetCDF files yourself?

danhamill commented 2 years ago

One idea I had to help define the NetCDF structure Vortex requires for a proper import to DSS is to develop an additional utility that converts a DSS file to NetCDF format. This would give users developing their own NetCDF files a straightforward reference. It might be outside the scope of Vortex, but I personally find the CF conventions hard to follow, since many datasets only partially conform to them.

Having the ability to convert both NetCDF to DSS and DSS to NetCDF would give users more options for modifying gridded data with more robust NetCDF packages (e.g., xarray).

tombrauer commented 2 years ago

I designed Vortex with this in mind. You'll notice that the option to specify pathname parts only becomes available after a file is selected. So, if you were to select something like a *.nc file for output, you might see options related to that file type. To add NetCDF write capability to Vortex, a NetcdfDataWriter implementation that extends DataWriter would need to be added.
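A rough Python analogue of that design follows: the selected output file's extension picks a writer, and the writer determines which options appear. Every name other than DataWriter and NetcdfDataWriter is hypothetical, and the method names are invented for illustration. Vortex itself is Java, and its real DataWriter interface is not shown in this thread.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class DataWriter(ABC):
    """Illustrative base class; NOT Vortex's actual Java DataWriter API."""

    def __init__(self, destination):
        self.destination = Path(destination)

    @abstractmethod
    def options(self):
        """Return the option names this output type exposes in a UI."""

    @abstractmethod
    def write(self, grids):
        """Write grids to self.destination; return how many were written."""


class DssDataWriter(DataWriter):
    def options(self):
        # DSS pathname parts A through F.
        return ["A", "B", "C", "D", "E", "F"]

    def write(self, grids):
        grids = list(grids)
        # A real writer would store each grid in the DSS file here.
        return len(grids)


class NetcdfDataWriter(DataWriter):
    def options(self):
        # Hypothetical NetCDF-specific option.
        return ["variable name"]

    def write(self, grids):
        grids = list(grids)
        # A real writer would append each grid along the time dimension here.
        return len(grids)


# Output types keyed by file extension (hypothetical registry).
WRITERS = {".dss": DssDataWriter, ".nc": NetcdfDataWriter}


def writer_for(destination):
    """Pick a writer from the output file's extension."""
    suffix = Path(destination).suffix.lower()
    cls = WRITERS.get(suffix)
    if cls is None:
        raise ValueError("no writer registered for %r" % suffix)
    return cls(destination)
```

With this layout, `writer_for("out.nc").options()` returns `["variable name"]`, and adding an output format means registering one new subclass rather than changing the UI code.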

tombrauer commented 2 years ago

@openSourcerer9000 I can't reproduce the timestep skipping issue. Do you have a dataset that reliably reproduces it?