Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html

NCML Time axis aggregation with missing / time gaps #1361

Closed Akshay-Hegde closed 3 years ago

Akshay-Hegde commented 3 years ago

Hi, I have a few data files containing hourly data, but there are gaps between them.

(Across multiple files) Time min and max: 16-NOV-2010 06:00 to 22-OCT-2019 10:00

Depth min and max: -104 to 1044.5

  1. Is there a way to generate the time and depth axes dynamically from the minimum and maximum time and depth values across the files, and to re-grid the variables onto them?
  2. How can the gaps be filled with missing values?

I have read the few resources available online, such as those on logical views, but have had no success so far.
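For the re-gridding part of item 1, one possible approach outside NcML is to interpolate each profile linearly onto a shared depth axis. The sketch below is only an illustration under stated assumptions: `regrid_profile` is a hypothetical helper, depths are assumed to be sorted in ascending order, the source profile is assumed to contain no missing values, and target depths outside the source range receive the `_FillValue` shown in the ncdump header.

```python
from bisect import bisect_left

FILL = 1.0e35  # _FillValue taken from the ncdump header in this thread


def regrid_profile(src_depths, src_values, dst_depths, fill=FILL):
    """Linearly interpolate one profile onto dst_depths.

    Assumes src_depths is ascending and src_values has no missing entries.
    Target depths outside [src_depths[0], src_depths[-1]] are set to `fill`.
    """
    out = []
    for d in dst_depths:
        if d < src_depths[0] or d > src_depths[-1]:
            out.append(fill)  # outside the observed range: no extrapolation
            continue
        j = bisect_left(src_depths, d)  # first index with src_depths[j] >= d
        if src_depths[j] == d:
            out.append(src_values[j])  # exact hit on a source level
            continue
        d0, d1 = src_depths[j - 1], src_depths[j]
        w = (d - d0) / (d1 - d0)  # weight of the deeper neighbour
        out.append(src_values[j - 1] * (1 - w) + src_values[j] * w)
    return out


# Toy profile (made-up numbers, not taken from the sample files).
profile = regrid_profile([0, 10, 20], [0.0, 1.0, 3.0], [-5, 5, 10, 15, 25])
print(profile)
```

Filling rather than extrapolating keeps levels that were never observed clearly marked as missing.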

Contents of my NcML file:

$ cat test.ncml 
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="Time" type="joinExisting">
          <scan location="/path/to/test-ncml" regExp="f*\.nc$" subdirs="true" timeUnitsChange="true"/>
  </aggregation>
</netcdf>

Sample

$ ncdump -h sample.nc 
netcdf sample {
dimensions:
    Lon = 1 ;
    Lat = 1 ;
    Depth = 62 ;
    Time = 8214 ;
variables:
    float Lon(Lon) ;
        Lon:long_name = "Longitude" ;
        Lon:units = "Degree_East" ;
    float Lat(Lat) ;
        Lat:long_name = "Latitude" ;
        Lat:units = "Degree_Nort" ;
    float Depth(Depth) ;
        Depth:long_name = "Depth (m)" ;
        Depth:units = "meters" ;
        Depth:bin_size = 8. ;
        Depth:Center_first_bin = 17.7600002288818 ;
        Depth:blanking_distance = 7.03999996185303 ;
    float Time(Time) ;
        Time:long_name = "Time" ;
        Time:units = "hours" ;
        Time:time_origin = "16-NOV-2010 06:00:00" ;
    float u_1205(Time, Depth, Lat, Lon) ;
        u_1205:name = "u" ;
        u_1205:long_name = "Eastward Velocity" ;
        u_1205:missing_value = 99999.f ;
        u_1205:_FillValue = 1.e+35f ;
        u_1205:units = "cm/s" ;
    float v_1206(Time, Depth, Lat, Lon) ;
        v_1206:name = "v" ;
        v_1206:long_name = "Northward Velocity" ;
        v_1206:missing_value = 99999.f ;
        v_1206:_FillValue = 1.e+35f ;
        v_1206:units = "cm/s" ;
}
lesserwhirls commented 3 years ago

Unfortunately, this scenario is far too complex for NcML and will require writing some custom code to join these together using the attributes in the way you describe.
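As a rough illustration of what such custom code could look like, the sketch below places several time segments on one regular axis and fills the gaps between them. It is not a THREDDS or NcML feature: `merge_on_regular_axis` is a hypothetical helper, the data are plain Python lists standing in for values read from the files, the one-hour step is assumed from the description of the data, and segment starts that are off the grid are snapped to the nearest step.

```python
from datetime import datetime, timedelta

FILL = 1.0e35  # _FillValue taken from the ncdump header in this thread


def merge_on_regular_axis(segments, step=timedelta(hours=1), fill=FILL):
    """Merge (start, values) segments onto one regular time axis.

    Each segment is assumed to be sampled every `step`. Positions not
    covered by any segment are set to `fill`.
    """
    segments = sorted(segments, key=lambda seg: seg[0])
    t0 = segments[0][0]
    # Integer offset of each segment on the common axis (nearest step).
    placed = [((start - t0 + step / 2) // step, values)
              for start, values in segments]
    n = max(offset + len(values) for offset, values in placed)
    axis = [t0 + step * i for i in range(n)]
    merged = [fill] * n
    for offset, values in placed:
        merged[offset:offset + len(values)] = values
    return axis, merged


# Toy example (made-up values): two samples ending at 08:00, then one
# sample at 13:00, which leaves 09:00 to 12:00 to be filled.
axis, merged = merge_on_regular_axis([
    (datetime(2013, 11, 24, 7), [1.0, 2.0]),
    (datetime(2013, 11, 24, 13), [3.0]),
])
print(len(axis), merged)
```

In real use the values would be read from the files with a netCDF library and the merged result written to a new file.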

lesserwhirls commented 3 years ago

It might be worth reaching out to the general netCDF users email list to see if anyone has done a similar task in the past: netcdfgroup@unidata.ucar.edu

Akshay-Hegde commented 3 years ago

> Unfortunately, this scenario is far too complex for NcML and will require writing some custom code to join these together using the attributes in the way you describe.

Thank you. As you said, it seems impossible to do this in NcML. I can do it with Ferret, CDO, MATLAB, etc., but I wanted to explore what is possible dynamically, since the variables are the same across files and the minimum and maximum values of the time and depth axes are known.

Is there any documentation on NcML? The docs on the official website do not seem to be up to date.

cofinoa commented 3 years ago

Hi, @Akshay-Hegde

I have not understood your scenario. Do you have a set of sample input nc files? What is the expected result for those samples?

Regards

cofinoa commented 3 years ago

... what do you mean by re-gridding? Interpolation?

Akshay-Hegde commented 3 years ago

> Hi, @Akshay-Hegde
>
> I have not understood your scenario. Do you have a set of sample input nc files? What is the expected result for those samples?
>
> Regards

@cofinoa

Hi, as you can see below, I have 5 files, f1.nc to f5.nc, spanning 2012-10-13 10:00:00 to 2019-10-22 08:40:00. In between there are data gaps, because no observations were taken during those periods. All of these files belong to a single location, so the latitude and longitude dimensions have length 1.

datetime-start, datetime-end, file
2012-10-13 10:00:00, 2013-11-24 08:00:00, f1.nc  
            --- data missing here --- between 2013-11-24 08:01:00 and 2013-11-24 12:59:00
2013-11-24 13:00:00, 2014-11-15 15:00:00, f2.nc
            --- data missing here --- between 2014-11-15 15:01:00 and 2016-11-16 12:59:00
2016-11-16 13:00:00, 2017-10-08 08:00:00, f3.nc  
            --- again here
2017-10-08 12:20:00, 2018-10-10 08:20:00, f4.nc  
            --- again here
2018-10-10 14:40:00, 2019-10-22 08:40:00, f5.nc  
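The gaps in this listing can also be found programmatically. The sketch below is only an illustration: `find_gaps` is a hypothetical helper, the rows are copied from the listing above, and the one-hour step is an assumption based on the data being hourly. For each pair of consecutive files it counts how many sample times fall strictly between the end of one file and the start of the next.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (start, end, file) rows copied from the listing above.
FILES = [
    ("2012-10-13 10:00:00", "2013-11-24 08:00:00", "f1.nc"),
    ("2013-11-24 13:00:00", "2014-11-15 15:00:00", "f2.nc"),
    ("2016-11-16 13:00:00", "2017-10-08 08:00:00", "f3.nc"),
    ("2017-10-08 12:20:00", "2018-10-10 08:20:00", "f4.nc"),
    ("2018-10-10 14:40:00", "2019-10-22 08:40:00", "f5.nc"),
]


def find_gaps(rows, step=timedelta(hours=1)):
    """Return (prev_file, next_file, missing_steps) for each gap."""
    gaps = []
    for (_, end, prev_file), (start, _, next_file) in zip(rows, rows[1:]):
        delta = datetime.strptime(start, FMT) - datetime.strptime(end, FMT)
        if delta > step:
            # ceil(delta / step) - 1 sample times lie strictly inside the gap.
            missing = -(-delta // step) - 1
            gaps.append((prev_file, next_file, missing))
    return gaps


gaps = find_gaps(FILES)
for prev_file, next_file, missing in gaps:
    print(f"{prev_file} -> {next_file}: {missing} missing hourly steps")
```

Note that f4.nc and f5.nc start at 20 and 40 minutes past the hour, so their samples do not sit on the same hourly grid as f1.nc to f3.nc.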

> ... what do you mean by re-gridding? Interpolation?

Yes

Also, the time axes of these files look like this:

# for f1.nc  
float Time(Time) ;
        Time:long_name = "Time" ;
        Time:units = "hours" ;
        Time:time_origin = "13-OCT-2012 10:00:00" ;

# for f2.nc  
float Time(Time) ;
        Time:long_name = "Time" ;
        Time:units = "hours" ;
        Time:time_origin = "24-NOV-2013 13:00:00" ;

$ ncdump -h f1.nc | grep 'Time = '
    Time = 9767 ;
$ ncdump -h f2.nc | grep 'Time = '
    Time = 8547 ;
$ ncdump -h f3.nc | grep 'Time = '
    Time = 7820 ;

netcdf file:test.ncml {
  dimensions:
    Lon = 1;
    Lat = 1;
    Depth = 63;
    Time = 26134;           /* Here I need the available hours plus the missing hours */

 float Time(Time=26134);
      :long_name = "Time";
      :units = "hours";
      :time_origin = "13-OCT-2012 10:00:00"; /* Here I need hours since 01-JAN-1970 00:00:00 */

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
   <aggregation dimName="Time" type="joinExisting">

            <netcdf location="f1.nc" />
            <!-- Missing here:
                 how to define a virtual dataset filled with missing values
                 between 2013-11-24 09:00:00 and 2013-11-24 12:00:00?
            -->
            <netcdf location="f2.nc" />

            <!-- Missing here:
                 between 2014-11-15 16:00:00 and 2016-11-16 12:00:00
            -->

            <!-- Can the time axis values of these files be converted to hours since 1970-JAN-01 00:00:00? -->
            <netcdf location="f3.nc" />

   </aggregation>
</netcdf>
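Outside NcML, converting each file's time axis to hours since 1970-01-01 00:00:00 is straightforward. The sketch below is only an illustration: `parse_time_origin` and `to_hours_since_epoch` are hypothetical helpers, and it assumes that the `Time` values are hours elapsed since the file's `time_origin` attribute, which the bare `hours` units do not state explicitly.

```python
from datetime import datetime, timedelta

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}
EPOCH = datetime(1970, 1, 1)


def parse_time_origin(text):
    """Parse an origin string such as '13-OCT-2012 10:00:00'."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    hour, minute, second)


def to_hours_since_epoch(offsets_hours, time_origin):
    """Shift per-file hour offsets so they count from 1970-01-01 00:00:00.

    Assumes offsets_hours are hours elapsed since `time_origin`.
    """
    base = (parse_time_origin(time_origin) - EPOCH) / timedelta(hours=1)
    return [base + h for h in offsets_hours]


# First two time steps of f1.nc and the first step of f2.nc, using the
# time_origin attributes quoted in this thread.
f1_hours = to_hours_since_epoch([0.0, 1.0], "13-OCT-2012 10:00:00")
f2_hours = to_hours_since_epoch([0.0], "24-NOV-2013 13:00:00")
print(f1_hours, f2_hours)
```

Once every file uses the same reference time, the axes can be compared directly and the gaps show up as jumps larger than one hour.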

@cofinoa Please find the sample data here: Google Drive