Unidata / thredds


Prefetch in StructureDataIterators to improve performance #671

Open cwardgar opened 7 years ago

cwardgar commented 7 years ago

TLDR: Reading remote CF DSG datasets via OPeNDAP and CDM Remote (and probably DAP4) is terribly slow.

This issue was originally raised in a netcdf-java mailing list message. The example dataset is a contiguous ragged array representation of profiles:

netcdf NERSC_ARC_PHYS_OBS_XBT_2012_v1 {
dimensions:
    obs = 11365 ;
    profile = 7 ;
variables:
    int profile(profile) ;
        profile:cf_role = "profile_id" ;
    double time(profile) ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:units = "days since 2011-09-11 15:30:00" ;
    float lon(profile) ;
        lon:standard_name = "longitude" ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
    float lat(profile) ;
        lat:standard_name = "latitude" ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
    int rowSize(profile) ;
        rowSize:long_name = "number of obs for this profile" ;
        rowSize:sample_dimension = "obs" ;
    float z(obs) ;
        z:standard_name = "depth" ;
        z:long_name = "Depth below sea level" ;
        z:units = "m" ;
        z:positive = "down" ;
        z:axis = "Z" ;
    float temperature(obs) ;
        temperature:standard_name = "sea_water_temperature" ;
        temperature:long_name = "Sea water temperature measured by XBT" ;
        temperature:units = "Celsius" ;
        temperature:coordinates = "time lon lat z" ;
    float pressure(obs) ;
        pressure:standard_name = "sea_water_pressure" ;
        pressure:long_name = "Sea water pressure" ;
        pressure:units = "dbar" ;
        pressure:coordinates = "time lon lat z" ;
    float svel(obs) ;
        svel:standard_name = "speed_of_sound_in_sea_water" ;
        svel:long_name = "Sound velocity" ;
        svel:units = "m s-1" ;
        svel:coordinates = "time lon lat z" ;

// global attributes:
        :featureType = "profile" ;
        :Conventions = "CF-1.6,ACDD-1.3,GCMD-DIF,NMDC-netCDF-0.1" ;
        // ...
}

I popped it on a local THREDDS server and read the first 5 observations using both OPeNDAP and CDM Remote. I used the code:

package ucar.nc2.ft.point.remote

import spock.lang.Specification
import ucar.nc2.constants.FeatureType
import ucar.nc2.ft.*

class PointFeatureDatasetSpec extends Specification {
    def "goo.gl/2OQ0t7"() {  // From netcdf-java mailing list
        setup: "feature dataset"
        String location =
            "http://localhost:8080/thredds/dodsC/local-tds-data/support/2OQ0t7/NERSC_ARC_PHYS_OBS_XBT_2012_v1.nc"
//            "http://localhost:8080/thredds/cdmremote/local-tds-data/support/2OQ0t7/NERSC_ARC_PHYS_OBS_XBT_2012_v1.nc"

        Formatter errLog = new Formatter()
        FeatureDatasetPoint fdPoint = (FeatureDatasetPoint) FeatureDatasetFactoryManager.open(
                FeatureType.ANY_POINT, location, null, errLog)

        expect:
        for (DsgFeatureCollection dsgFeatCol : fdPoint.pointFeatureCollectionList) {
            ProfileFeatureCollection profileCol = (ProfileFeatureCollection) dsgFeatCol

            for (ProfileFeature profile : profileCol) {
                PointFeatureIterator pointFeatIter = profile.pointFeatureIterator

                for (int i = 0; i < 5; ++i) {
                    printf "\nObs %s:\n", i + 1
                    pointFeatIter.hasNext()
                    pointFeatIter.next()
                }

                break;
            }

            break;
        }

        cleanup:
        fdPoint.close()
        errLog.close()
    }
}

Results for OPeNDAP, with ucar.nc2.dods.DODSNetcdfFile.debugServerCall = true:

DConnect to = <http://localhost:8080/thredds/dodsC/local-tds-data/support/2OQ0t7/NERSC_ARC_PHYS_OBS_XBT_2012_v1.nc>
DODSNetcdfFile readDDS
DODSNetcdfFile readDAS
DODSNetcdfFile.readDataDDSfromServer = <?profile,time,lon,lat,rowSize>

Obs 1:
DODSNetcdfFile.readDataDDSfromServer = <temperature[0:1:0]>
DODSNetcdfFile.readDataDDSfromServer = <pressure[0:1:0]>
DODSNetcdfFile.readDataDDSfromServer = <svel[0:1:0]>
DODSNetcdfFile.readDataDDSfromServer = <z[0:1:11364]>

Obs 2:
DODSNetcdfFile.readDataDDSfromServer = <temperature[1:1:1]>
DODSNetcdfFile.readDataDDSfromServer = <pressure[1:1:1]>
DODSNetcdfFile.readDataDDSfromServer = <svel[1:1:1]>

Obs 3:
DODSNetcdfFile.readDataDDSfromServer = <temperature[2:1:2]>
DODSNetcdfFile.readDataDDSfromServer = <pressure[2:1:2]>
DODSNetcdfFile.readDataDDSfromServer = <svel[2:1:2]>

Obs 4:
DODSNetcdfFile.readDataDDSfromServer = <temperature[3:1:3]>
DODSNetcdfFile.readDataDDSfromServer = <pressure[3:1:3]>
DODSNetcdfFile.readDataDDSfromServer = <svel[3:1:3]>

Obs 5:
DODSNetcdfFile.readDataDDSfromServer = <temperature[4:1:4]>
DODSNetcdfFile.readDataDDSfromServer = <pressure[4:1:4]>
DODSNetcdfFile.readDataDDSfromServer = <svel[4:1:4]>

Results for CDM Remote, with ucar.nc2.stream.CdmRemote.showRequest = true, are similar.

So, to read the entire example file as a FeatureDatasetPoint, we'd need to make roughly 11365 * 3 = 34095 HTTP requests! And the file is only 182 KB! As you can imagine, that's very slow: it took about 25 seconds when reading from my local server, but upwards of an hour when reading from an external server.

cwardgar commented 7 years ago

Our point stack exhibits a structure-oriented data access pattern, meaning that one value from each data variable is needed for each observation that it returns. The problem is that we're making an HTTP request for each value because the data is not truly laid out as a collection of structures (i.e. records). Instead, we're cheating by organizing the variables into a pseudo-structure: a collection of variables which all have the same outer dimension (it's not a real Structure because the values are not stored contiguously). All CF DSG layouts require this pseudo-structure interpretation, which means they'll all have the same poor read performance by our point stack.
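
To make that concrete, here is a minimal standalone sketch of the effective access pattern when each observation is assembled from such a pseudo-structure. The URL, class name, and loop bounds are made up for illustration; this is not the actual point-stack code, just the shape of the reads it ends up issuing:

import ucar.ma2.Array;
import ucar.nc2.Variable;
import ucar.nc2.dataset.NetcdfDataset;

public class PseudoStructureReadPattern {
    public static void main(String[] args) throws Exception {
        // Hypothetical OPeNDAP URL; any remote CF DSG dataset shows the same pattern.
        String location = "http://localhost:8080/thredds/dodsC/example/profiles.nc";
        try (NetcdfDataset ncd = NetcdfDataset.openDataset(location)) {
            Variable temperature = ncd.findVariable("temperature");
            Variable pressure = ncd.findVariable("pressure");
            Variable svel = ncd.findVariable("svel");

            // One tiny read from each data variable per observation. Over OPeNDAP or
            // CDM Remote, every read() below becomes its own HTTP request, so this
            // loop costs three requests per observation.
            for (int obs = 0; obs < 5; obs++) {
                String section = obs + ":" + obs;
                Array t = temperature.read(section);
                Array p = pressure.read(section);
                Array v = svel.read(section);
                System.out.printf("obs %d: T=%.2f P=%.2f svel=%.2f%n",
                        obs, t.getFloat(0), p.getFloat(0), v.getFloat(0));
            }
        }
    }
}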

To improve performance, we'd like to make fewer requests by grabbing more than one datum each time and caching the unneeded data that we get back for subsequent calls. But how? We can't just naively cache the entire variable. What if it's huge? Incidentally, we already do this for 1D coordinate axes, regardless of their size. I'm surprised we haven't been bitten by that yet.

Perhaps we can cache only some of the data? For example, if the user requests temperature[0:1:0], we actually grab temperature[0:1:100] and cache the excess, hoping that the user will read the other values within that range next. Obviously, effectiveness depends on access pattern, but this might be a reasonable strategy, especially for point data.
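
As a rough sketch of that read-ahead idea (not existing CDM code; the class name and the 100-value window are made up), a per-variable cache might look like this:

import java.io.IOException;
import ucar.ma2.Array;
import ucar.ma2.InvalidRangeException;
import ucar.nc2.Variable;

/**
 * Hypothetical read-ahead cache for a 1D variable: when index i is requested,
 * read [i, i + WINDOW) in one call and serve the following indices from the
 * cached Array instead of going back to the server.
 */
class ReadAheadCache {
    private static final int WINDOW = 100;  // speculative read-ahead length

    private final Variable var;
    private Array cached;      // holds values for [cacheStart, cacheStart + cached.getSize())
    private int cacheStart = -1;

    ReadAheadCache(Variable var) {
        this.var = var;
    }

    double get(int index) throws IOException, InvalidRangeException {
        if (cached == null || index < cacheStart || index >= cacheStart + cached.getSize()) {
            // Cache miss: fetch a whole window in one request instead of a single value.
            int last = (int) Math.min(index + WINDOW - 1, var.getSize() - 1);
            cached = var.read(index + ":" + last);
            cacheStart = index;
        }
        return cached.getDouble(index - cacheStart);
    }
}

Sequential access (the common case for point iteration) would then cost one request per 100 values instead of one per value, while random access degrades gracefully to the current behavior.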

@JohnLCaron Any thoughts on this?

JohnLCaron commented 7 years ago

> The problem is that we're making an HTTP request for each value

Through OPeNDAP? I've given up on OPeNDAP for this reason, and cdmremote was the attempt to solve it.

> To improve performance, we'd like to make fewer requests by grabbing more than one datum each time

The whole point of DSG is to use iterators instead of direct access, allowing us to efficiently cache.


dopplershift commented 7 years ago

So, to fill in the blanks: with the iterator model for access, we can prefetch however we want when making the request, without the user ever knowing about it.

Side note: it's interesting that this problem seems to correspond exactly to the way over-allocation of vectors/lists allows one to avoid O(N^2) behavior when appending. (I realize we're missing the copy that causes the N^2, but it's still interesting.)

cwardgar commented 7 years ago

>> The problem is that we're making an HTTP request for each value

> Through OPeNDAP? I've given up on OPeNDAP for this reason, and cdmremote was the attempt to solve it.

CDM Remote exhibits the same behavior:

 CdmRemote request http://localhost:8080/thredds/cdmremote/local-tds-data/support/2OQ0t7/NERSC_ARC_PHYS_OBS_XBT_2012_v1.nc?req=header
 CdmRemote request http://localhost:8080/thredds/cdmremote/local-tds-data/support/2OQ0t7/NERSC_ARC_PHYS_OBS_XBT_2012_v1.nc?req=header took 50 msecs 
Obs 1:
CdmRemote data request for variable: 'temperature' section=(0:0)
CdmRemote data request for variable: 'pressure' section=(0:0)
CdmRemote data request for variable: 'svel' section=(0:0)
CdmRemote data request for variable: 'z' section=(0:11364)
Obs 2:
CdmRemote data request for variable: 'temperature' section=(1:1)
CdmRemote data request for variable: 'pressure' section=(1:1)
CdmRemote data request for variable: 'svel' section=(1:1)
Obs 3:
CdmRemote data request for variable: 'temperature' section=(2:2)
CdmRemote data request for variable: 'pressure' section=(2:2)
CdmRemote data request for variable: 'svel' section=(2:2)
Obs 4:
CdmRemote data request for variable: 'temperature' section=(3:3)
CdmRemote data request for variable: 'pressure' section=(3:3)
CdmRemote data request for variable: 'svel' section=(3:3)
Obs 5:
CdmRemote data request for variable: 'temperature' section=(4:4)
CdmRemote data request for variable: 'pressure' section=(4:4)
CdmRemote data request for variable: 'svel' section=(4:4)

I've narrowed the performance problem to StructureDataIteratorLinked.next() (https://github.com/Unidata/thredds/blob/5.0.0/cdm/src/main/java/ucar/nc2/ft/point/StructureDataIteratorLinked.java#L70). Here we read a single StructureData (record) at a time, which causes the HTTP requests for single values. Instead, we could prefetch with a call like:

ArrayStructure records = s.readStructure(currRecno, count);

So yeah, I definitely like the idea of prefetching in the iterators rather than prefetching in the Variables, since the iterators have a known access pattern. I'll change the title of this issue.

JohnLCaron commented 7 years ago

Hmmm, I have forgotten the details, so I will have to review them at some point.

Perhaps it's the "cdm feature" API that is supposed to solve the problem by caching in the iterators.


cwardgar commented 7 years ago

It turns out that we already have this prefetch capability: StructureDataIterator.setBufferSize(int bytes). However, the only subclass that actually implements it is Structure.IteratorRank1. A better solution would be to move that functionality to a wrapper, and decorate other StructureDataIterators with it.
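
For discussion, here is one possible shape for such a wrapper. This is a sketch only: it does not implement the real StructureDataIterator interface (reset/close, the linked-list case, and error handling are omitted), and it assumes getElementSize() reports the byte size of one record of the Structure:

import java.io.IOException;
import ucar.ma2.ArrayStructure;
import ucar.ma2.InvalidRangeException;
import ucar.ma2.StructureData;
import ucar.nc2.Structure;

/**
 * Sketch of a buffering decorator: translate a byte budget into a number of
 * records to prefetch, then pull them in blocks with Structure.readStructure()
 * rather than one record (and one remote request) at a time.
 */
class BufferedStructureDataIterator {
    private final Structure s;
    private final int firstRecno;
    private final int numRecords;

    private int recordsPerRead = 1;  // until setBufferSize() is called
    private ArrayStructure block;    // prefetched records
    private int blockStart;          // recno of block.getStructureData(0)
    private int currRecno;

    BufferedStructureDataIterator(Structure s, int firstRecno, int numRecords) {
        this.s = s;
        this.firstRecno = firstRecno;
        this.numRecords = numRecords;
        this.currRecno = firstRecno;
    }

    /** Analogue of setBufferSize(int bytes): convert the byte budget to a record count. */
    void setBufferSize(int bytes) {
        int recordSize = Math.max(1, s.getElementSize());  // assumed: bytes per record
        this.recordsPerRead = Math.max(1, bytes / recordSize);
    }

    boolean hasNext() {
        return currRecno < firstRecno + numRecords;
    }

    StructureData next() throws IOException, InvalidRangeException {
        if (block == null || currRecno >= blockStart + block.getSize()) {
            int count = Math.min(recordsPerRead, firstRecno + numRecords - currRecno);
            block = s.readStructure(currRecno, count);  // one request for 'count' records
            blockStart = currRecno;
        }
        return block.getStructureData(currRecno++ - blockStart);
    }
}

The concrete iterators would then only need to supply the record range (or the link-following logic), with all of the buffering living in the wrapper.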

Something else I noticed: commit 3fec809ad6c638fcfb8c13889534e1ee4f8ec53c removes bufferSize from PointFeatureIterators (and many of the underlying StructureDataIterators). Was that a mistake if we want to enable prefetching? Should I revert it?

cwardgar commented 7 years ago

Also, for many PointFeatureCollections, there wasn't an easy way to change the bufferSize of the underlying PointFeatureIterator, even before the above commit. That needs to change.

lesserwhirls commented 7 years ago

Man, at this point, if the (Jenkins) tests pass on 5.0 with a revert and the additions to the StructureDataIterators, I'd say go for it and we'll deal with the repercussions, if any 👍

Is this needed for 4.6.x?

cwardgar commented 7 years ago

4.6 could certainly use it; my comments today were the result of an issue I'm working on with Yuan in the IDV relating to slow reading of PointFeatures. However, the point stack has changed so drastically in 5.0 that I'd have to implement 2 completely different fixes. I'm not gonna do that, so this'll be 5.0-only.