Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html

TDS does not aggregate "incomplete" CF DSG TimeSeries #1095

Open pacioos opened 6 years ago

pacioos commented 6 years ago

Hi TDS folks, One of the recommended formats for a CF Discrete Sampling Geometries (DSG) TimeSeries is to have two dimensions: one for the various stations in the file (e.g., "timeseries" or "station") and another dimension to hold the data for each station (e.g., "obs"). However, TDS does not aggregate this type of dataset using the "incomplete" (non-orthogonal multidimensional array) format; i.e., with variable "time(timeseries, obs)". Instead, given a series of NetCDF files for "joinExisting" NcML aggregation over the "obs" dimName, it only provides data for the time steps included in the last (most recent) file.

      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">           
        <aggregation dimName="obs" type="joinExisting" recheckEvery="1 minute">
          <scan location="/my/directory/path/" suffix=".nc" />
        </aggregation>
      </netcdf>

This "incomplete" format is outlined in the CF spec, in the NODC/NCEI "gold standard" (@mbiddle-nodc) templates, and in the ".ncCFMA" output from ERDDAP (@BobSimons) (example header). The same files and directory structure import and aggregate successfully in ERDDAP, but the TDS aggregation only serves the latest file, as described above.

Please advise if there are further materials I can supply to help diagnose this issue. I'm currently running the latest stable version of TDS (4.6.11) and have verified the same problem exists in the previous version of TDS (4.6.10). The two datasets that we are serving that have this problem in our TDS are located in the following catalogs, listed below. Compared to the datasetScan versions (file access, second url), these aggregations only include data from the most recent NetCDF file:

PacIOOS Water Quality Buoy 04 (WQB-04): Hilo Bay, Big Island, Hawaii
http://oos.soest.hawaii.edu/thredds/idd/wqb.html?dataset=WQB04agg
http://oos.soest.hawaii.edu/thredds/catalog/hioos/wqb/wqb04/catalog.html

PacIOOS Water Quality Buoy 05 (WQB-05): Pelekane Bay, Big Island, Hawaii
http://oos.soest.hawaii.edu/thredds/idd/wqb.html?dataset=WQB05agg
http://oos.soest.hawaii.edu/thredds/idd/wqb.html?dataset=WQB05agg

Thanks for your help,
John Maurer
Data System Engineer
Pacific Islands Ocean Observing System (PacIOOS)
University of Hawaii at Manoa

cofinoa commented 6 years ago

@pacioos only the outermost dimension of a variable can be aggregated with a joinExisting aggregation.

In your case, timeseries is the outermost dimension, but you are aggregating on obs, which is an inner dimension.

P.S.:

Because the timeseries dimension has a size of 1, you can logically reduce that dimension from your variables and then do the aggregation.

Based on your example, I have made the following template, which you should check and complete.

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Restore the timeseries dimension-->
  <aggregation type="joinNew" dimName="timeseries">
    <!-- Enumerate all the variables to be restored -->
    <variableAgg name="time"/>
    <variableAgg name="temperature"/>
    <netcdf>
      <aggregation type="joinExisting" dimName="obs">
        <!-- Enumerate all the variables to be reduced and aggregated on obs dimension-->
        <variable name="time">
          <logicalReduce dimNames="timeseries"/>
        </variable>
        <variable name="temperature">
          <logicalReduce dimNames="timeseries"/>
        </variable>
        <!-- The previous definitions will be applied to all files/netcdf before aggregation -->
        <scan location="/my/directory/path/" suffix=".nc" />
      </aggregation>
    </netcdf>
  </aggregation>
</netcdf>

It requires some knowledge of the internal data structure, but it should work for your case.

pacioos commented 6 years ago

Thanks for your reply, @cofinoa! Happy to see your workaround. Did not realize/remember the joinExisting restriction on the outer dimension. It would seem the NCEI TimeSeries "gold standard" is unattainable (at least in TDS) so perhaps we will have to settle for silver here. ;) Curious what implications this has had for others in IOOS?: @lukecampbell, @mwengren, @kwilcox, @rsignell-usgs.

kwilcox commented 6 years ago

@pacioos I feel your pain! At one point there was some work done to get the DSG classes as first-class FeatureTypes in NCJ/THREDDS but that work has stalled over the past few years. We have resorted to either aggregating the netCDF files ourselves and serving the single aggregated file through THREDDS, or exporting each individual measured parameter to its own netCDF files (so there are only time and depth dimensions) and aggregating over time.

Example single parameter: http://thredds.secoora.org/thredds/dodsC/secoora/sensors/edu_usf_marine_comps_c10/data/relative_humidity.nc.html
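
For readers following along, the second approach (one parameter per file, aggregated over time) might look roughly like this in NcML. This is a minimal sketch, not the actual SECOORA configuration: it assumes each per-parameter file stores its variable with time as the outermost dimension, for example relative_humidity(time, depth), and the scan location is a hypothetical path.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- "time" is the outermost dimension of every data variable in these
       files, so joinExisting can concatenate along it directly. -->
  <aggregation dimName="time" type="joinExisting" recheckEvery="1 minute">
    <!-- Hypothetical directory holding the per-parameter files -->
    <scan location="/hypothetical/path/relative_humidity/" suffix=".nc" />
  </aggregation>
</netcdf>
```

Because the aggregation dimension is already outermost, no logicalReduce or nested joinNew is needed with this layout.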

lesserwhirls commented 6 years ago

Hi @pacioos! Yes, the CF DSG support for aggregations in the TDS is fairly limited. Improving this is near the top of our post-5.0 release list of priorities. Some work has been done in 5.0 to address this, but it's half-baked (John started to work on this before he left).

cofinoa commented 6 years ago

@lesserwhirls, any chance to look at the "half-baked" solution?

lesserwhirls commented 6 years ago

@cofinoa - the work is done in the CDM layer. For example, there is the DsgFeatureCollection class, as well as DsgCollectionImpl.java. From what I understand, a lot of the point-related featureCollection code is built around a streaming model, and being able to efficiently read through a dataset using the time dimension is key (which is part of the reason the time dimension needs to be the first dimension). I'm not sure how far the DsgFeatureCollection code gets away from the need to stream the entire dataset, but that's what would need to be done.

pacioos commented 6 years ago

@cofinoa thanks again for your logicalReduce example. For reference, while the proposed solution appeared to produce the full aggregation (>5K obs listed), I can only subset up to the number of obs in the penultimate file (96 obs).

TDS catalog: http://oos.soest.hawaii.edu/thredds/dodsC/hioos/wqb/wqb05agg.html

OPeNDAP ASCII request for first 96 obs successful: http://oos.soest.hawaii.edu/thredds/dodsC/hioos/wqb/wqb05agg.ascii?temperature[0:1:0][0:1:95]

Anything beyond that fails: http://oos.soest.hawaii.edu/thredds/dodsC/hioos/wqb/wqb05agg.ascii?temperature[0:1:0][0:1:96]

Error {
    code = 500;
    message = "NcSDArray InvalidRangeException=Bad range ending value at index 0 == 96";
};

cofinoa commented 6 years ago

@pacioos could you please attach the complete aggregation you are using?

pacioos commented 6 years ago
    <!-- WQB-05: Pelekane Bay: aggregation: -->

    <dataset name="Water Quality Buoy 05: Pelekane Bay (aggregated)"
             ID="WQB05agg"
             urlPath="hioos/wqb/wqb05agg"
             collectionType="TimeSeries">

      <metadata inherited="true">

        <authority>org.pacioos</authority>
        <serviceName>nonGridAggServicesWithSOS</serviceName>
        <dataType>Station</dataType>
        <dataFormat>NetCDF</dataFormat>

        <contributor role="Principal Investigator">Dr. Steven Colbert</contributor>
        <contributor role="Data Manager">Dr. Jim Potemra</contributor>
        <creator>
          <name>Dr. Steven Colbert</name>
          <contact url="http://www2.hawaii.edu/~colberts" email="colberts@hawaii.edu" />
        </creator>
        <publisher>
          <name>Pacific Islands Ocean Observing System (PacIOOS)</name>
          <contact url="http://pacioos.org" email="info@pacioos.org" />
        </publisher>

        <project>Pacific Islands Ocean Observing System (PacIOOS)</project>
        <project>U.S. Integrated Ocean Observing System (IOOS)</project>

        <keyword vocabulary="GCMD Science Keywords">Earth Science &gt; Oceans &gt; Ocean Chemistry &gt; Chlorophyll</keyword>
        <keyword vocabulary="GCMD Science Keywords">Earth Science &gt; Oceans &gt; Ocean Chemistry &gt; Oxygen</keyword>
        <keyword vocabulary="GCMD Science Keywords">Earth Science &gt; Oceans &gt; Ocean Optics &gt; Turbidity</keyword>
        <keyword vocabulary="GCMD Science Keywords">Earth Science &gt; Oceans &gt; Ocean Temperature &gt; Water Temperature</keyword>
        <keyword vocabulary="GCMD Science Keywords">Earth Science &gt; Oceans &gt; Salinity/Density &gt; Salinity</keyword>
        <keyword vocabulary="GCMD Science Keywords">Earth Science &gt; Oceans &gt; Water Quality</keyword>

        <variables vocabulary="CF Standard Name Table v39">
          <variable name="chlorophyll" vocabulary_name="mass_concentration_of_chlorophyll_in_sea_water" units="kilogram meter-3">chlorophyll</variable>
          <variable name="latitude" vocabulary_name="latitude" units="degrees_north">latitude</variable>
          <variable name="longitude" vocabulary_name="longitude" units="degrees_east">longitude</variable>
          <variable name="oxygen" vocabulary_name="mass_concentration_of_oxygen_in_sea_water" units="kilogram meter-3">dissolved oxygen</variable>
          <variable name="oxygen_saturation" vocabulary_name="fractional_saturation_of_oxygen_in_sea_water" units="1">oxygen saturation</variable>
          <variable name="ph" vocabulary_name="sea_water_ph_reported_on_total_scale" units="1">pH</variable>
          <variable name="salinity" vocabulary_name="sea_water_salinity" units="1e-3">salinity</variable>
          <variable name="temperature" vocabulary_name="sea_water_temperature" units="Celsius">temperature</variable>
          <variable name="time" vocabulary_name="time" units="minutes since 2018-01-01T00:00:00Z">time</variable>
          <variable name="turbidity" vocabulary_name="sea_water_turbidity" units="NTU">turbidity</variable>
        </variables>

        <property name="institution" value="University of Hawaii at Hilo, Department of Marine Science" />

        <documentation type="summary">The water quality buoys are part of the Pacific Islands Ocean Observing System (PacIOOS) and are designed to measure a variety of ocean parameters at fixed points. WQB-05 is located in Pelekane Bay near Kawaihae Harbor on the west side of the Big Island. Continuous sampling of this outflow area provides a record of baseline conditions of the chemical and biological environment for comparison when there are pollution events such as storm runoff or a sewage spill.</documentation>

        <documentation xlink:href="http://pacioos.org/water/wqbuoy-pelekane/"
                       xlink:title="PacIOOS Water Quality Buoy WQB-05: Pelekane Bay" />

        <documentation type="rights">These data were generated as part of an academic research project, and the principal investigator, Steven Colbert (colberts@hawaii.edu), asks to be informed of intent for scientific use and appropriate acknowledgment given in any publications arising therefrom. The data are provided free of charge, without warranty of any kind.</documentation>

        <documentation type="funding">The Pacific Islands Ocean Observing System (PacIOOS), funded through the National Oceanic and Atmospheric Administration (NOAA), is a Regional Association within the U.S. Integrated Ocean Observing System (IOOS). PacIOOS is coordinated by the University of Hawaii School of Ocean and Earth Science and Technology (SOEST).</documentation>

        <property name="viewer" value="http://pacioos.org/voyager/index.html?b=19.961409%2C-155.915308%2C20.08979%2C-155.753088&amp;t=p&amp;o=qual:1::p0WQB-05p1,PacIOOS Voyager (Google Maps API)" />
        <property name="viewer2" value="http://pacioos.org/water/wqbuoy-pelekane/,PacIOOS website" />
        <property name="viewer3" value="http://oos.soest.hawaii.edu/erddap/tabledap/wqb05_agg.graph,ERDDAP" />

        <geospatialCoverage zpositive="up">
          <northsouth>
            <start>20.02415</start>
            <size>0</size>
            <units>degrees_north</units>
          </northsouth>
          <eastwest>
            <start>-155.8285</start>
            <size>0</size>
            <units>degrees_east</units>
          </eastwest>
          <updown>
            <start>-1</start>
            <size>0</size>
            <units>m</units>
          </updown>
          <name vocabulary="GCMD Location Keywords">Continent &gt; North America &gt; United States Of America &gt; Hawaii</name>
          <name vocabulary="GCMD Location Keywords">Ocean &gt; Pacific Ocean &gt; Central Pacific Ocean &gt; Hawaiian Islands &gt; Big Island</name>
          <name vocabulary="GCMD Location Keywords">Ocean &gt; Pacific Ocean &gt; Central Pacific Ocean &gt; Hawaiian Islands &gt; Hawaii Island &gt; Kawaihae</name>
          <name vocabulary="GCMD Location Keywords">Ocean &gt; Pacific Ocean &gt; Central Pacific Ocean &gt; Hawaiian Islands &gt; Hawaii Island &gt; Pelekane</name>
        </geospatialCoverage>

        <timeCoverage>
          <start>2018-03-10T02:00:00Z</start>
          <end>present</end>
          <resolution>15 minutes</resolution>
        </timeCoverage>

      </metadata>

      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <attribute name="date_created" value="2018-03-28"/>
        <attribute name="date_issued" value="2018-03-28"/>
        <attribute name="date_modified" value="2018-03-28"/>
        <attribute name="date_metadata_modified" value="2018-03-28"/>
        <attribute name="time_coverage_start" value="2018-03-10T02:00:00Z"/>
        <attribute name="time_coverage_end" value="present"/>

        <!-- This scalar is applied at file level, not aggregation: -->
        <remove type="variable" name="qc_flag" />

        <!--
        DEPRECATED: CANNOT USE joinExisting TO AGGREGATE ON INNER "obs"
        DIMENSION. CAN ONLY USE NcML AGGREGATION ON THE OUTER-MOST DIMENSION,
        WHICH IN OUR CASE IS "timeseries". SO, WE LOGICALLY REDUCE THE
        DIMENSIONS AND THEN joinNew ON "timeseries" AS WORKAROUND; SEE:
        https://github.com/Unidata/thredds/issues/1095#issuecomment-386796768

        <aggregation dimName="obs" type="joinExisting" recheckEvery="1 minute">
          <scan location="/export/lawelawe1/wqb/wqb05/agg/" suffix=".nc" />
        </aggregation>
        -->

        <!-- Outer aggregation: restore the "timeseries" dimension: -->

        <aggregation type="joinNew" dimName="timeseries">

          <!-- Enumerate all the variables to be restored: -->

          <variableAgg name="time" />
          <variableAgg name="temperature" />
          <variableAgg name="salinity" />
          <variableAgg name="oxygen" />
          <variableAgg name="oxygen_saturation" />
          <variableAgg name="turbidity" />
          <variableAgg name="chlorophyll" />
          <variableAgg name="ph" />
          <variableAgg name="temperature_raw" />
          <variableAgg name="salinity_raw" />
          <variableAgg name="oxygen_raw" />
          <variableAgg name="oxygen_saturation_raw" />
          <variableAgg name="turbidity_raw" />
          <variableAgg name="chlorophyll_raw" />
          <variableAgg name="ph_raw" />
          <variableAgg name="temperature_dm_qd" />
          <variableAgg name="salinity_dm_qd" />
          <variableAgg name="oxygen_dm_qd" />
          <variableAgg name="oxygen_saturation_dm_qd" />
          <variableAgg name="turbidity_dm_qd" />
          <variableAgg name="chlorophyll_dm_qd" />
          <variableAgg name="ph_dm_qd" />
          <variableAgg name="temperature_qc_gap" />
          <variableAgg name="temperature_qc_syn" />
          <variableAgg name="temperature_qc_loc" />
          <variableAgg name="temperature_qc_rng" />
          <variableAgg name="temperature_qc_clm" />
          <variableAgg name="temperature_qc_spk" />
          <variableAgg name="temperature_qc_rtc" />
          <variableAgg name="temperature_qc_flt" />
          <variableAgg name="temperature_qc_mvr" />
          <variableAgg name="temperature_qc_atn" />
          <variableAgg name="temperature_qc_nbr" />
          <variableAgg name="temperature_qc_crv" />
          <variableAgg name="temperature_qc_din" />
          <variableAgg name="salinity_qc_gap" />
          <variableAgg name="salinity_qc_syn" />
          <variableAgg name="salinity_qc_loc" />
          <variableAgg name="salinity_qc_rng" />
          <variableAgg name="salinity_qc_clm" />
          <variableAgg name="salinity_qc_spk" />
          <variableAgg name="salinity_qc_rtc" />
          <variableAgg name="salinity_qc_flt" />
          <variableAgg name="salinity_qc_mvr" />
          <variableAgg name="salinity_qc_atn" />
          <variableAgg name="salinity_qc_nbr" />
          <variableAgg name="salinity_qc_crv" />
          <variableAgg name="salinity_qc_din" />
          <variableAgg name="oxygen_qc_gap" />
          <variableAgg name="oxygen_qc_syn" />
          <variableAgg name="oxygen_qc_loc" />
          <variableAgg name="oxygen_qc_rng" />
          <variableAgg name="oxygen_qc_clm" />
          <variableAgg name="oxygen_qc_spk" />
          <variableAgg name="oxygen_qc_rtc" />
          <variableAgg name="oxygen_qc_flt" />
          <variableAgg name="oxygen_qc_mvr" />
          <variableAgg name="oxygen_qc_atn" />
          <variableAgg name="oxygen_qc_nbr" />
          <variableAgg name="oxygen_saturation_qc_gap" />
          <variableAgg name="oxygen_saturation_qc_syn" />
          <variableAgg name="oxygen_saturation_qc_loc" />
          <variableAgg name="oxygen_saturation_qc_rng" />
          <variableAgg name="oxygen_saturation_qc_clm" />
          <variableAgg name="oxygen_saturation_qc_spk" />
          <variableAgg name="oxygen_saturation_qc_rtc" />
          <variableAgg name="oxygen_saturation_qc_flt" />
          <variableAgg name="oxygen_saturation_qc_mvr" />
          <variableAgg name="oxygen_saturation_qc_atn" />
          <variableAgg name="oxygen_saturation_qc_nbr" />
          <variableAgg name="turbidity_qc_gap" />
          <variableAgg name="turbidity_qc_syn" />
          <variableAgg name="turbidity_qc_loc" />
          <variableAgg name="turbidity_qc_rng" />
          <variableAgg name="turbidity_qc_clm" />
          <variableAgg name="turbidity_qc_spk" />
          <variableAgg name="turbidity_qc_rtc" />
          <variableAgg name="turbidity_qc_flt" />
          <variableAgg name="turbidity_qc_mvr" />
          <variableAgg name="turbidity_qc_atn" />
          <variableAgg name="turbidity_qc_nbr" />
          <variableAgg name="chlorophyll_qc_gap" />
          <variableAgg name="chlorophyll_qc_syn" />
          <variableAgg name="chlorophyll_qc_loc" />
          <variableAgg name="chlorophyll_qc_rng" />
          <variableAgg name="chlorophyll_qc_clm" />
          <variableAgg name="chlorophyll_qc_spk" />
          <variableAgg name="chlorophyll_qc_rtc" />
          <variableAgg name="chlorophyll_qc_flt" />
          <variableAgg name="chlorophyll_qc_mvr" />
          <variableAgg name="chlorophyll_qc_atn" />
          <variableAgg name="chlorophyll_qc_nbr" />
          <variableAgg name="ph_qc_gap" />
          <variableAgg name="ph_qc_syn" />
          <variableAgg name="ph_qc_loc" />
          <variableAgg name="ph_qc_rng" />
          <variableAgg name="ph_qc_clm" />
          <variableAgg name="ph_qc_spk" />
          <variableAgg name="ph_qc_rtc" />
          <variableAgg name="ph_qc_flt" />
          <variableAgg name="ph_qc_mvr" />
          <variableAgg name="ph_qc_atn" />
          <variableAgg name="ph_qc_nbr" />

          <!-- Inner aggregation: remove the "timeseries" dimension: -->

          <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
            <aggregation type="joinExisting" dimName="obs">

              <!-- Enumerate all the variables to be reduced and aggregated on
                   "obs" dimension: -->

              <variable name="time">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_raw">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_dm_qd">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_crv">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="temperature_qc_din">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_crv">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="salinity_qc_din">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="oxygen_saturation_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="turbidity_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="chlorophyll_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_gap">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_syn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_loc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_rng">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_clm">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_spk">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_rtc">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_flt">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_mvr">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_atn">
                <logicalReduce dimNames="timeseries"/>
              </variable>
              <variable name="ph_qc_nbr">
                <logicalReduce dimNames="timeseries"/>
              </variable>

              <!-- The previous definitions will be applied to all files/netcdf
                   *before* aggregation: -->

              <scan location="/export/lawelawe1/wqb/wqb05/agg/" suffix=".nc" />

            </aggregation>
          </netcdf>
        </aggregation>

      </netcdf>

    </dataset>

cofinoa commented 6 years ago

@pacioos the joinExisting aggregation is creating a ghost obs variable associated with the obs dimension (a coordinate variable) with missing data, undefined values, and/or incomplete arrays, which produces unexpected errors in the TDS.

Remove the variable and it should work. Just add <remove type="variable" name="obs" /> to the one you already have.

@lesserwhirls I think this is a bug. The joinExisting aggregation is on an existing dimension, so it should not require an existing variable. Of course, if the variable exists, the coordValue attribute and timeUnitsChange features related to the variable's values should be considered.

pacioos commented 6 years ago

Good call, @cofinoa! Many thanks for your insightful comments. I added the following two lines prior to my joinNew aggregation to zap the ghost obs and timeseries coordinate variables (note: type="variable").

        <!-- Avoid "ghost" coordinate variables generated by aggregations below: -->
        <remove type="variable" name="timeseries" />
        <remove type="variable" name="obs" />

The ghostbusting works and I can now successfully access the full variable array, beyond just the first 96 obs:

http://oos.soest.hawaii.edu/thredds/dodsC/hioos/wqb/wqb05agg.ascii?temperature[0:1:0][0:1:5740]

Extra hoops to jump through but happy to have a workaround. The dataset now appears and performs like the original DSG TimeSeries in my NetCDF files. Gracias!
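
Pulling the pieces of this thread together, a condensed sketch of the working configuration might look like the following. It shows only two of the variables (time and temperature) and uses the placeholder scan path from the original post; a real dataset would need a variableAgg entry and a logicalReduce entry for every variable dimensioned (timeseries, obs).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">

  <!-- Drop the "ghost" coordinate variables generated by the aggregations below -->
  <remove type="variable" name="timeseries" />
  <remove type="variable" name="obs" />

  <!-- Outer aggregation: restore the "timeseries" dimension -->
  <aggregation type="joinNew" dimName="timeseries">
    <variableAgg name="time" />
    <variableAgg name="temperature" />

    <netcdf>
      <!-- Inner aggregation: concatenate the files along "obs" once the
           size-1 "timeseries" dimension has been logically reduced, which
           makes "obs" the outermost dimension of each variable -->
      <aggregation type="joinExisting" dimName="obs">
        <variable name="time">
          <logicalReduce dimNames="timeseries" />
        </variable>
        <variable name="temperature">
          <logicalReduce dimNames="timeseries" />
        </variable>
        <!-- Placeholder path; replace with the directory of NetCDF files -->
        <scan location="/my/directory/path/" suffix=".nc" />
      </aggregation>
    </netcdf>
  </aggregation>

</netcdf>
```

The two remove elements sit at the top level, before the joinNew aggregation, which is where they were placed in the configuration reported as working above.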