Consider adding striding logic to F-TDS analysis operations

karlmsmith commented 6 years ago

Reported by @noaaroland on 13 Dec 2010 23:18 UTC

The best bet for getting better performance out of long time series plots that involve analysis in XY might be to subsample in XY. (There was some discussion of having the plotting script stride in T, as determined by attributes that indicate the "size and cost" of the XY analysis needed to create the virtual variable, but this was judged not to provide much savings relative to the diminished value of the resulting plot.)

The best place to put the striding might be in F-TDS.

When LAS makes the request, it can ask for striding (via a "property" in the script embedded in the URL) so that the striding is done automatically. Such a plot must be explicitly labeled to show how the striding was done in XY.

Users who want the same automatic striding in URLs they generate themselves will have to request it by including the property. This will help avoid confusion and give users confidence that they know what they're getting.

Migrated-From: http://dunkel.pmel.noaa.gov/trac/las/ticket/1000

karlmsmith commented 6 years ago

Comment by @noaaroland on 13 Dec 2010 23:20 UTC

See related #974.

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 13 Dec 2010 23:29 UTC

Adding Ansley to the CC list.

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 14 Dec 2010 00:50 UTC

Steve listed several issues that we'll need to investigate and decide how to handle.

If F-TDS imposed automatic subsampling, what are the issues raised?

* how much to subsample? By default, perhaps limit the reduction due to the transformation to no more than a factor of 10K (e.g. a 100x100 XY average)? (29 Meg)
* need to test whether, for local files, this actually speeds things up much, since netCDF subsampling IO along non-record axes still requires touching a lot of data.
* need to communicate the subsampling statistics back to the LA product script  (through attributes) and faithfully annotate this on the products
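A minimal Python sketch of the first and third bullets follows. It assumes the "10K factor" means at most 10,000 XY points (100x100) feed each averaged value, and that one common stride is applied to both X and Y. The attribute names (`xy_stride`, `xy_points_full`, `xy_points_used`) and the helper functions are hypothetical, not existing F-TDS or LAS attributes or APIs.

```python
import math

def strided_count(n: int, stride: int) -> int:
    """Points kept when sampling indices 0, stride, 2*stride, ... below n."""
    return math.ceil(n / stride)

def choose_xy_stride(nx: int, ny: int, max_points: int = 10_000) -> int:
    """Smallest common X/Y stride keeping at most max_points in the XY average."""
    stride = 1
    while strided_count(nx, stride) * strided_count(ny, stride) > max_points:
        stride += 1
    return stride

def striding_attributes(nx: int, ny: int, stride: int) -> dict:
    """Hypothetical attributes passed back so the product can be annotated."""
    return {
        "xy_stride": stride,
        "xy_points_full": nx * ny,
        "xy_points_used": strided_count(nx, stride) * strided_count(ny, stride),
    }

if __name__ == "__main__":
    stride = choose_xy_stride(1081, 540)
    print(striding_attributes(1081, 540, stride))
```

For a 1081 x 540 grid this picks a stride of 8, which keeps 136 x 68 = 9,248 of the 583,740 XY points. Applying one stride to both axes keeps the annotation simple; a per-axis stride would be a straightforward extension.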

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 14 Dec 2010 01:24 UTC

Here's a pretty big local dataset, about 4.1 GB:

/home/porter/data/ansley/las_data/big_air.nc

This is based on the long COADS AIRT set, with a monthly time axis from 1854 to 1999, regridded onto a finer XY grid, so that its dimensions are i=1:1081, j=1:540, L=1:1750.
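As a quick check of the quoted size, the arithmetic below assumes `air_fine` is stored as uncompressed 4-byte floats (an assumption; the variable's type isn't stated in this thread).

```python
nx, ny, nt = 1081, 540, 1750   # i, j, L extents of air_fine
bytes_per_value = 4            # assumption: 32-bit float, no compression

total_values = nx * ny * nt
total_bytes = total_values * bytes_per_value
print(f"{total_values:,} values, {total_bytes / 1e9:.2f} GB")
```

This gives about 1.02 billion values, or roughly 4.09 GB, consistent with the ~4.1 GB figure.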

I did some tests using Ferret from the command line on this dataset to compare the effects of striding in T, in XY, and in XYT. The test was to LOAD an area-averaged time series, using a stride of 10 along each strided axis.

The upshot is:

No striding   clock-time =  30 seconds
Stride XY     clock-time =  20 seconds
Stride  T     clock-time =   2 seconds
Stride XYT    clock-time = 0.3 seconds
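To compare these timings with how much data each case selects, the sketch below counts the values in each strided selection. It assumes a stride of 10 keeps indices 1, 11, 21, ... along each strided axis (so 109 of 1081 points in X); it is an illustration, not a model of Ferret's actual IO.

```python
import math

DIMS = {"x": 1081, "y": 540, "t": 1750}   # air_fine extents
STRIDE = 10
# Wall-clock seconds from the first round of tests above
TIMINGS = {"none": 30.0, "xy": 20.0, "t": 2.0, "xyt": 0.3}

def values_read(strided_axes: str) -> int:
    """Number of values selected when the named axes are strided by STRIDE."""
    total = 1
    for axis, n in DIMS.items():
        total *= math.ceil(n / STRIDE) if axis in strided_axes else n
    return total

full = values_read("")
for case in ("xy", "t", "xyt"):
    reduction = full / values_read(case)
    speedup = TIMINGS["none"] / TIMINGS[case]
    print(f"{case:>3}: {reduction:7.1f}x fewer values, {speedup:6.1f}x faster")
```

The XY case selects about 99x fewer values but runs only about 1.5x faster, while the T case selects 10x fewer values and runs about 15x faster. That gap is consistent with the concern above that netCDF subsampling along non-record axes still touches a lot of data.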

Presumably this carries over to OPeNDAP requests, and it suggests that we get most of the benefit from striding in the T direction. However, the further improvement from striding just in T to striding in XYT is significant. The resulting time series look quite similar. (The T striding is much more noticeable on this monthly data than the XY striding inside the area average.)

I ran each test in a fresh Ferret session, on stout, where there is less activity than on porter. The script for the XY striding test looks like this:

! Area-average time series, test of striding in XY
use "/home/porter/data/ansley/las_data/big_air.nc"
set mem/siz=300

! Subsample the X and Y axes of air_fine by a factor of 10
set axis/stride=10 `air_fine,return=xaxis`
set axis/stride=10 `air_fine,return=yaxis`

! Time the LOAD of the area-averaged time series
define symbol clock_1 = ($clock_secs)
load air_fine[i=1:108@ave,j=1:54@ave,L=1:1750]
define symbol clock_2 = ($clock_secs)

! Report the elapsed clock time in seconds
SAY Clock time for XY striding: `($clock_2) - ($clock_1)`

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 14 Dec 2010 19:33 UTC

Steve points out that whether the disk blocks are already cached by Unix makes a huge difference when timing IO operations. If the unstrided test is run first, it leaves all of the data in the cache (assuming the Unix cache is big enough, which is probably the case), and every later test then reads from the Unix disk cache.

A sure way to test the true IO speed is to copy the file to a new name before each test.

Here are some results, each from its own fresh copy of the file (times in seconds). They appear to show that T striding at full XY resolution is consistently FASTER than XYT striding.

Clock time for no striding: 118.5
Clock time for no striding: 116.124
Clock time for no striding: 114.429

Clock time for XY striding:  66.361
Clock time for XY striding: 140.754
Clock time for XY striding:  70.504
Clock time for XY striding:  62.379

Clock time for T striding:  4.645
Clock time for T striding:  4.515
Clock time for T striding: 54.54
Clock time for T striding:  4.518

Clock time for XYT striding: 8.074
Clock time for XYT striding: 5.912
Clock time for XYT striding: 7.034
Clock time for XYT striding: 7.626
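A few runs above are clear outliers (140.754 s for XY striding, 54.54 s for T striding), so a per-case median is a more robust summary than any single run or the mean. A minimal Python sketch using the standard `statistics` module:

```python
from statistics import median

# Clock times in seconds, each run from a fresh copy of the file
timings = {
    "no striding": [118.5, 116.124, 114.429],
    "XY striding": [66.361, 140.754, 70.504, 62.379],
    "T striding": [4.645, 4.515, 54.54, 4.518],
    "XYT striding": [8.074, 5.912, 7.034, 7.626],
}

# Median per case; the mean would be pulled up by the outlier runs
summary = {case: median(times) for case, times in timings.items()}
for case, m in summary.items():
    print(f"{case:>13}: median {m:7.3f} s  (min {min(timings[case]):7.3f} s)")
```

The medians come out to roughly 116.1 s (no striding), 68.4 s (XY), 4.6 s (T) and 7.3 s (XYT), which supports the observation that T striding at full XY resolution beat XYT striding in these runs.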

karlmsmith commented 6 years ago

Modified by @noaaroland on 7 Jan 2011 01:17 UTC

NOAA-PMEL / LAS

Consider adding striding logic to F-TDS analysis operations #1006