Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html

Odd new variable appearing in this joinExisting aggregation #451

Open rsignell-usgs opened 8 years ago

rsignell-usgs commented 8 years ago

We have a bunch of netcdf granules here: http://geoport-dev.whoi.edu/thredds/catalog/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/catalog.html

that we are aggregating with a very simple NcML that joins along the time dimension t:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
  <aggregation dimName="t" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>
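
As a rough sanity check on which granules the scan picks up, the regExp can be tried against candidate file names. This is only a sketch: the file names below are invented for illustration, Python's `re` module stands in for the Java regex engine that the TDS actually uses, and it assumes the pattern is applied to the bare file name rather than the full path.

```python
import re

# Pattern copied verbatim from the scan element above.
SCAN_REGEXP = (
    r"^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}"
    r"_800mgrid_1000mrad_20-Feb-2016\.nc$"
)

# Hypothetical file names, invented for illustration only.
candidates = [
    "WHOI_ISLE_HFR_2014_01_01_800mgrid_1000mrad_20-Feb-2016.nc",
    "WHOI_ISLE_HFR_2014_1_1_800mgrid_1000mrad_20-Feb-2016.nc",  # month/day not zero-padded
    "00_dir_HFR_agg.ncml",  # the NcML file itself should be skipped
]


def matching_names(names, pattern=SCAN_REGEXP):
    """Return the names that the scan pattern would accept."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


matched = matching_names(candidates)
print(matched)
```

Only the first name is accepted here, so the NcML file sitting in the same directory is not swept into its own aggregation.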

The resulting aggregation dataset here: http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml.html seems to work fine, but we noticed that the aggregation has acquired an odd new variable t that didn't exist before.

This new variable t has some rather strange values: http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml.ascii?t[0:1:47]

Is this because the time coordinate variable datetime has a different name than the time dimension t?

Is this expected behavior?

rsignell-usgs commented 8 years ago

Yikes, this is even stranger. There are two different dimensions, both called t:

http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml.dds

gives

Dataset {
    Float64 Longitude[lon = 39];
    Float64 Latitude[lat = 36];
    Int32 t[t = 48];
    Float64 datetime[t = 4416];
    Float64 East_vel[t = 4416][lat = 36][lon = 39];
    Float64 North_vel[t = 4416][lat = 36][lon = 39];
    Float64 East_err[t = 4416][lat = 36][lon = 39];
    Float64 North_err[t = 4416][lat = 36][lon = 39];
    Float64 err_cov[t = 4416][lat = 36][lon = 39];
    Float64 total_err[t = 4416][lat = 36][lon = 39];
} usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml;
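
The clash is easy to miss by eye, so here is a minimal sketch that scans DDS text for dimension names declared with more than one size. The DDS excerpt is copied from the output above; the helper name `conflicting_dimensions` is made up for this example and is not part of any THREDDS or OPeNDAP tool.

```python
import re
from collections import defaultdict

# Excerpt of the DDS response shown above.
DDS_TEXT = """Dataset {
    Float64 Longitude[lon = 39];
    Float64 Latitude[lat = 36];
    Int32 t[t = 48];
    Float64 datetime[t = 4416];
    Float64 East_vel[t = 4416][lat = 36][lon = 39];
} 00_dir_HFR_agg.ncml;
"""

# Matches one "[name = size]" dimension declaration.
DIM_PATTERN = re.compile(r"\[(\w+) = (\d+)\]")


def conflicting_dimensions(dds_text):
    """Map each dimension name to its sizes when more than one size appears."""
    sizes = defaultdict(set)
    for name, size in DIM_PATTERN.findall(dds_text):
        sizes[name].add(int(size))
    return {name: sorted(vals) for name, vals in sizes.items() if len(vals) > 1}


conflicts = conflicting_dimensions(DDS_TEXT)
print(conflicts)  # {'t': [48, 4416]}
```

Running the same check on a DDS where each dimension name has a single size returns an empty dict.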

rsignell-usgs commented 8 years ago

If I try renaming the dimension

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
  <dimension name="datetime" orgName="t"/>
  <aggregation dimName="t" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>

then the DDS looks better, but I still have that strange t variable with its own t dimension: http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg2.ncml.dds

Dataset {
    Float64 Longitude[lon = 39];
    Float64 Latitude[lat = 36];
    Int32 t[t = 48];
    Float64 datetime[datetime = 4416];
    Float64 East_vel[datetime = 4416][lat = 36][lon = 39];
    Float64 North_vel[datetime = 4416][lat = 36][lon = 39];
    Float64 East_err[datetime = 4416][lat = 36][lon = 39];
    Float64 North_err[datetime = 4416][lat = 36][lon = 39];
    Float64 err_cov[datetime = 4416][lat = 36][lon = 39];
    Float64 total_err[datetime = 4416][lat = 36][lon = 39];
} usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg2.ncml;

rsignell-usgs commented 8 years ago

And it looks like renaming the dimension causes a failure. Godiva2 at http://geoport-dev.whoi.edu/thredds/godiva2/godiva2.html?server=http://geoport-dev.whoi.edu/thredds/wms/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg2.ncml throws an error getting data from the server, while if I don't rename, the aggregation is okay: http://geoport-dev.whoi.edu/thredds/godiva2/godiva2.html?server=http://geoport-dev.whoi.edu/thredds/wms/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg.ncml

rsignell-usgs commented 8 years ago

@cwardgar , should I send e-mail to thredds support referencing this ticket? Not sure of the protocol anymore...

dopplershift commented 8 years ago

No. Check your files. I just dumped them all via opendap and one actually HAS a variable called 't'. (I'll get filename in a second.)

dopplershift commented 8 years ago

Might have spoken too soon... (stupid ncml files also get opened by opendap...)

lesserwhirls commented 8 years ago

@rsignell-usgs - I think github works best for potential bugs like this. Can you try renaming the dimension inside the aggregation? That worked for me using a few of the files from the server.

lesserwhirls commented 8 years ago

According to the ncml agg docs:

https://www.unidata.ucar.edu/software/thredds/v4.6/netcdf-java/ncml/Aggregation.html

"Variables of the same name (in different files) are connected along their existing, outer dimension, called the aggregation dimension. A coordinate variable must exist for the dimension."

So, in the example you have above that renames the dimension, the coordinate variable t is created for each file, and only then is the dimension renamed overall. If you rename the dimension inside the aggregation, the variable datetime is recognized as the coordinate variable and no new variable t is created.

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
  <aggregation dimName="datetime" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
    <dimension name="datetime" orgName="t" />
  </aggregation>
</netcdf>

lesserwhirls commented 8 years ago

Now here is a fun one...if I tell the joinExisting to use dimName="datetime" instead of dimName="t" and change nothing else, like so:

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" >
  <aggregation dimName="datetime" type="joinExisting">
    <scan location="." regExp="^WHOI_ISLE_HFR_[0-9]{4}_[0-9]{2}_[0-9]{2}_800mgrid_1000mrad_20-Feb-2016\.nc$"/>
  </aggregation>
</netcdf>

then things work as well. Since the dimension datetime does not exist but the variable does, the ncml agg creates the new dimension...I don't think it should be doing that!

lesserwhirls commented 8 years ago

In short, I think this is a bug.

Here is what I think might be going on: even though the variable datetime is a coordinate variable, the NcML aggregation code does not pick up datetime as the coordinate variable corresponding to the dimension t, and as such creates a new variable t to match the name of the dimension t.

@JohnLCaron - any of this ringing a bell, or bringing back memories of NcML aggregation nightmares?

JohnLCaron commented 8 years ago

get rid of its screwing things up.

doesn't need to have same name,

:coordinates = "Longitude Latitude datetime";

works fine

JohnLCaron commented 8 years ago

not sure what this "variable t that didn't exist before" is yet. so i may be wrong, we may be assuming existence of a coordinate variable.

JohnLCaron commented 8 years ago

if so, try

<variable name="t" orgName="datetime" />

not

<dimension name="datetime" orgName="t" />

rsignell-usgs commented 8 years ago

@lesserwhirls , awesome! I didn't know I could rename the dimension inside the aggregation tag! And I agree that creating a time coordinate variable with the same name as the dimension is a bug, since one already exists (it just isn't named the same as the dimension).

rsignell-usgs commented 8 years ago

Here's the resulting very nice aggregation, using @lesserwhirls's solution above (https://github.com/Unidata/thredds/issues/451#issuecomment-189032093):

http://geoport-dev.whoi.edu/thredds/dodsC/usgs/data2/rsignell/gdrive/nsf-alpha/Data/WHOI-HFRadar-Data-Sets/00_dir_HFR_agg3.ncml.html

rsignell-usgs commented 8 years ago

@lesserwhirls should we leave this open until the bug is fixed, or do you want to open another issue that more closely addresses the actual problem?

lesserwhirls commented 8 years ago

I think we should just leave this open, and I will try to summarize things. However, it looks like @JohnLCaron had something slightly different in mind (rather than renaming the dimension), but I'm not sure if there is a difference between renaming the dimension or renaming the variable.

So @JohnLCaron, here is what I understand the situation is:

Each netCDF file has a dimension t and an associated coordinate variable datetime, which is correctly picked up by the CoordSys tab in ToolsUI as a coordinate variable. When you do a joinExisting NcML agg, the aggregation creates a new variable t, with what appears to be a default value set for all values in the array. I assume this is done to match the dimension t, even though the (dimension <---> coordinate variable) pair is t and datetime. Note that the docs for the joinExisting NcML agg state that we assume a coordinate variable for the joinExisting dimension exists.

I'm thinking that the NcML agg does not pick up on the fact that the (dimension <---> coordinate variable) pair is t and datetime, and therefore creates a new variable t that it does not need. To me, this indicates a bug: the joinExisting agg actually requires that a variable with the same name as the join dimension exists, rather than that a corresponding coordinate variable exists for the join dimension (as stated in the docs). If we rename the dimension t to datetime, or rename the variable datetime to t, things work as expected.
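
To make the two lookup rules concrete, here is a toy model in Python. It is not the netCDF-Java aggregation code, only an illustration of the difference described above: the `Variable` class and both function names are invented for this sketch, and the granule layout is reduced to the variables listed in the DDS output earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    dims: tuple  # dimension names, outermost first


# Simplified per-granule structure taken from the DDS listing (sizes omitted).
granule = [
    Variable("Longitude", ("lon",)),
    Variable("Latitude", ("lat",)),
    Variable("datetime", ("t",)),
    Variable("East_vel", ("t", "lat", "lon")),
]


def same_name_lookup(variables, dim):
    """Strict rule: only a 1-D variable named exactly like the dimension counts."""
    return [v.name for v in variables if v.name == dim and v.dims == (dim,)]


def one_d_lookup(variables, dim):
    """Looser rule: any 1-D variable defined solely on the dimension counts."""
    return [v.name for v in variables if v.dims == (dim,)]


strict = same_name_lookup(granule, "t")
loose = one_d_lookup(granule, "t")
print(strict, loose)  # [] ['datetime']
```

Under the strict rule nothing is found for t, which would match the observed behavior of a new t variable being synthesized; under the looser rule datetime is found and nothing new would be needed.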

rsignell-usgs commented 8 years ago

@lesserwhirls this is exactly how I understand the situation as well. :smile_cat:

JohnLCaron commented 8 years ago

agree

On Fri, Feb 26, 2016 at 9:35 AM, Rich Signell wrote:

@lesserwhirls this is exactly how I understand the situation as well. :smile_cat: