Unidata / tds

THREDDS Data Server
https://docs.unidata.ucar.edu/tds/5.0/userguide/index.html
BSD 3-Clause "New" or "Revised" License

TDS puts wrong variable attributes #503

Open rkouznetsov opened 5 months ago

rkouznetsov commented 5 months ago
Manifest-Version: 1.0
Implementation-Title: THREDDS Data Server (TDS)
Implementation-Version: 5.5-SNAPSHOT
Implementation-Vendor-Id: edu.ucar
Implementation-Vendor: UCAR/Unidata
Implementation-URL: https://www.unidata.ucar.edu/software/tds/
Created-By: Gradle 6.9.1
Build-Jdk: 11.0.18
Built-By: ubuntu
Built-On: 2024-03-19T05:11:37+0000

When I request the following (you might need to substitute a recent date):

ncks -v 'cnc_POLLEN_GRASS.*' -d height,0 -d time,2 https://thredds.silam.fmi.fi/thredds/dodsC/silam_europe_pollen_v5_9_1/runs/silam_europe_pollen_v5_9_1_RUN_2024-06-01T00:00:00Z out.nc

I get a file with a variable that has attributes:

        float cnc_POLLEN_GRASS_m32(time, height, rlat, rlon) ;
                cnc_POLLEN_GRASS_m32:units = "number/m3" ;
                cnc_POLLEN_GRASS_m32:long_name = "Concentration in air POLLEN_GRASS_m32" ;
                cnc_POLLEN_GRASS_m32:_FillValue = -9.99999e+14f ;
                cnc_POLLEN_GRASS_m32:substance_name = "POLLEN_GRASS" ;
                cnc_POLLEN_GRASS_m32:silam_amount_unit = "number" ;
                cnc_POLLEN_GRASS_m32:mode_name = "" ;
                cnc_POLLEN_GRASS_m32:mode_distribution_type = "FIXED_DIAMETER" ;
                cnc_POLLEN_GRASS_m32:mode_nominal_diameter = "32.0000000           um" ;
                cnc_POLLEN_GRASS_m32:fix_diam_mode_min_diameter = "32.0000000           um" ;
                cnc_POLLEN_GRASS_m32:fix_diam_mode_max_diameter = "32.0000000           um" ;
                cnc_POLLEN_GRASS_m32:fix_diam_mode_mean_diameter = "32.0000000           um" ;
                cnc_POLLEN_GRASS_m32:grid_mapping = "rp" ;
                cnc_POLLEN_GRASS_m32:cell_methods = "time: mean" ;
                cnc_POLLEN_GRASS_m32:_ChunkSizes = 1, 1, 459, 549 ;
                cnc_POLLEN_GRASS_m32:coordinates = "time_run time height rlat rlon " ;

There are a couple of issues with them:

  1. _ChunkSizes must not be there: this is a netCDF-3 file.
  2. coordinates refers to a non-existent dimension, time_run.

These attributes confuse downstream processing tools such as cdo and nco, so I have to patch the file by deleting them:

ncatted -a _ChunkSizes,,d,,  -a coordinates,,d,,  out.nc out1.nc
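Deleting coordinates outright also discards the names that are valid. A gentler variant would keep only the names that exist in the file. The following is a minimal sketch of that idea, not something TDS or NCO provides: `prune_coordinates` is a hypothetical helper, and the set of names present in out.nc is assumed from the variable's dimensions shown above.

```python
def prune_coordinates(coordinates, existing_names):
    """Return the coordinates attribute with names absent from the file removed."""
    kept = [name for name in coordinates.split() if name in existing_names]
    return " ".join(kept)


# Attribute value copied from the ncdump output above.
attr = "time_run time height rlat rlon "
# Assumption: these are the only coordinate names actually present in out.nc.
present = {"time", "height", "rlat", "rlon"}

print(prune_coordinates(attr, present))
```

The cleaned string could then be written back with whatever tool is already in use, for example an ncatted overwrite instead of a delete.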

Is there any way to prevent TDS from creating these wrong attributes?

Thank you!

BR, Rostislav

ethanrd commented 5 months ago

Hi @rkouznetsov - It looks like the files that the TDS is serving are netCDF-4 files (catalog). From the OPeNDAP information (DDS)(DAS), the coordinates attribute lists "time_run" in the netCDF-4 files. So, I don't think that is a TDS issue but instead part of the data being served.

The OPeNDAP information also includes the nc-4 _ChunkSizes attribute from the netCDF-4 files being served by the TDS. Exposing that information to the requestor is, I believe, pretty standard practice. Removing the _ChunkSizes attributes when writing a new nc-3 file from the OPeNDAP response would require a client application built to handle that, and I expect many general-purpose tools currently don't. What version of NCO ncks are you using? Have you tried telling it to write a netCDF-4 file? You might also try using nccopy and see how it behaves when writing nc-3 and nc-4 files from that dataset.
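As a rough illustration of what "a client application built to handle that" could mean, the sketch below filters storage-related attributes out of an attribute dictionary before writing classic-format output. It is an assumption-laden example: `attrs_for_classic` is a hypothetical helper, and the attribute list is illustrative rather than exhaustive.

```python
# Illustrative, not exhaustive: attribute names that describe netCDF-4/HDF5
# storage layout and carry no meaning in a classic (netCDF-3) file.
NC4_ONLY_ATTRS = {"_ChunkSizes", "_DeflateLevel", "_Shuffle", "_Storage"}


def attrs_for_classic(attrs):
    """Return a copy of attrs without netCDF-4 storage attributes."""
    return {name: value for name, value in attrs.items()
            if name not in NC4_ONLY_ATTRS}


# A few attributes copied from the ncdump output earlier in this thread.
served = {
    "units": "number/m3",
    "grid_mapping": "rp",
    "_ChunkSizes": [1, 1, 459, 549],
}

print(sorted(attrs_for_classic(served)))
```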

DennisHeimbigner commented 5 months ago

Is there any chance of getting a copy of the underlying netcdf file on the server?

rkouznetsov commented 5 months ago

Thank you, @DennisHeimbigner! The files are available here: https://thredds.silam.fmi.fi/thredds/catalog/silam_europe_pollen_v5_9_1/files/catalog.html They have no time_run variable or dimension in them.

rkouznetsov commented 5 months ago

Thank you, @ethanrd! The _ChunkSizes in netCDF-3 output is indeed an NCO issue then. I use the stock ncks from Ubuntu 22.04:

$ ncks --version
NCO netCDF Operators version 5.0.6 "Alanis" built by buildd on lcy02-amd64-027 at Feb  2 2022 18:28:56
ncks version 5.0.6

I'll try to check the results with the latest NCO and report the bug there.

rkouznetsov commented 5 months ago

FYI Just filed https://github.com/nco/nco/issues/284 about _ChunkSizes.

rkouznetsov commented 5 months ago

nccopy does not seem to work at all:

$ nccopy -v https://thredds.silam.fmi.fi/thredds/dodsC/silam_europe_pollen_v5_9_1/runs/silam_europe_pollen_v5_9_1_RUN_2024-06-05T00:00:00Z out.nc
NetCDF: One or more variable sizes violate format constraints
Location: file ; line 2121

I am not sure what that error message means.
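One plausible reading, not confirmed in this thread: the netCDF library reports this error when a variable is too large for the output format, and the classic format limits most variables to roughly 2 GiB each. The sketch below is a simplified size check under that assumption. The 459 x 549 grid comes from the _ChunkSizes shown earlier, while the time and height lengths are hypothetical and the real rule has exceptions (for example for the last variable in a file).

```python
from math import prod

# Simplified classic-format limit per variable, in bytes (about 2 GiB).
CLASSIC_MAX_VAR_BYTES = 2**31 - 4


def var_bytes(shape, itemsize=4):
    """Size in bytes of a variable with the given shape (float32 by default)."""
    return prod(shape) * itemsize


def fits_classic(shape, itemsize=4):
    """True if the variable stays within the simplified classic-format limit."""
    return var_bytes(shape, itemsize) <= CLASSIC_MAX_VAR_BYTES


# Hypothetical time/height lengths; rlat/rlon sizes taken from _ChunkSizes.
for shape in [(120, 10, 459, 549), (240, 10, 459, 549)]:
    print(shape, var_bytes(shape), fits_classic(shape))
```

If the size turns out to be the cause, asking nccopy for netCDF-4 output with its -k option would be the first thing to try.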

rkouznetsov commented 5 months ago

Nope, _ChunkSizes is not an NCO issue. Although it is a good idea to expose _ChunkSizes over OPeNDAP so that a user can choose the best access pattern, the server should also convey to the client that it is an ephemeral attribute.

DennisHeimbigner commented 5 months ago

The apparent underlying file -- SILAM-POLLEN-europe_v5_9_1_2024060200.nc4 -- is a netcdf-4 file.