I've uploaded a new example that should closely resemble the operational profile. One thing that is still likely to change is the ordering of the `datasetXX` subfolders in the hdf5 tree.

The description of the hdf5 file has also been updated.
@adokter I can't seem to find a lot of metadata in these files. Is that correct? For instance, `dataset1/data1/what` seems to be empty, so I'm not sure what the values mean.
You can use the `h5dump` command to list the structure of the hdf5 file. If you use the HDFView GUI, you will find the `what` folder to be empty, because attributes are not shown by default: you have to do a *Show properties* on the folder.
Here is what I get for the Jabbeke radar:
```
adriaan@MacAdriaan:~/git/ODIM-hdf5-test/vp$ h5dump -n 1 bejab_vp_20151009T0000Z.h5
HDF5 "bejab_vp_20151009T0000Z.h5" {
FILE_CONTENTS {
group /
attribute /Conventions
group /dataset1
group /dataset1/data1
dataset /dataset1/data1/data
group /dataset1/data1/what
attribute /dataset1/data1/what/gain
attribute /dataset1/data1/what/nodata
attribute /dataset1/data1/what/offset
attribute /dataset1/data1/what/quantity
attribute /dataset1/data1/what/undetect
group /dataset1/data10
dataset /dataset1/data10/data
group /dataset1/data10/what
attribute /dataset1/data10/what/gain
attribute /dataset1/data10/what/nodata
attribute /dataset1/data10/what/offset
attribute /dataset1/data10/what/quantity
attribute /dataset1/data10/what/undetect
group /dataset1/data11
dataset /dataset1/data11/data
group /dataset1/data11/what
attribute /dataset1/data11/what/gain
attribute /dataset1/data11/what/nodata
attribute /dataset1/data11/what/offset
attribute /dataset1/data11/what/quantity
attribute /dataset1/data11/what/undetect
group /dataset1/data12
dataset /dataset1/data12/data
group /dataset1/data12/what
attribute /dataset1/data12/what/gain
attribute /dataset1/data12/what/nodata
attribute /dataset1/data12/what/offset
attribute /dataset1/data12/what/quantity
attribute /dataset1/data12/what/undetect
group /dataset1/data13
dataset /dataset1/data13/data
group /dataset1/data13/what
attribute /dataset1/data13/what/gain
attribute /dataset1/data13/what/nodata
attribute /dataset1/data13/what/offset
attribute /dataset1/data13/what/quantity
attribute /dataset1/data13/what/undetect
group /dataset1/data14
dataset /dataset1/data14/data
group /dataset1/data14/what
attribute /dataset1/data14/what/gain
attribute /dataset1/data14/what/nodata
attribute /dataset1/data14/what/offset
attribute /dataset1/data14/what/quantity
attribute /dataset1/data14/what/undetect
group /dataset1/data15
dataset /dataset1/data15/data
group /dataset1/data15/what
attribute /dataset1/data15/what/gain
attribute /dataset1/data15/what/nodata
attribute /dataset1/data15/what/offset
attribute /dataset1/data15/what/quantity
attribute /dataset1/data15/what/undetect
group /dataset1/data2
dataset /dataset1/data2/data
group /dataset1/data2/what
attribute /dataset1/data2/what/gain
attribute /dataset1/data2/what/nodata
attribute /dataset1/data2/what/offset
attribute /dataset1/data2/what/quantity
attribute /dataset1/data2/what/undetect
group /dataset1/data3
dataset /dataset1/data3/data
group /dataset1/data3/what
attribute /dataset1/data3/what/gain
attribute /dataset1/data3/what/nodata
attribute /dataset1/data3/what/offset
attribute /dataset1/data3/what/quantity
attribute /dataset1/data3/what/undetect
group /dataset1/data4
dataset /dataset1/data4/data
group /dataset1/data4/what
attribute /dataset1/data4/what/gain
attribute /dataset1/data4/what/nodata
attribute /dataset1/data4/what/offset
attribute /dataset1/data4/what/quantity
attribute /dataset1/data4/what/undetect
group /dataset1/data5
dataset /dataset1/data5/data
group /dataset1/data5/what
attribute /dataset1/data5/what/gain
attribute /dataset1/data5/what/nodata
attribute /dataset1/data5/what/offset
attribute /dataset1/data5/what/quantity
attribute /dataset1/data5/what/undetect
group /dataset1/data6
dataset /dataset1/data6/data
group /dataset1/data6/what
attribute /dataset1/data6/what/gain
attribute /dataset1/data6/what/nodata
attribute /dataset1/data6/what/offset
attribute /dataset1/data6/what/quantity
attribute /dataset1/data6/what/undetect
group /dataset1/data7
dataset /dataset1/data7/data
group /dataset1/data7/what
attribute /dataset1/data7/what/gain
attribute /dataset1/data7/what/nodata
attribute /dataset1/data7/what/offset
attribute /dataset1/data7/what/quantity
attribute /dataset1/data7/what/undetect
group /dataset1/data8
dataset /dataset1/data8/data
group /dataset1/data8/what
attribute /dataset1/data8/what/gain
attribute /dataset1/data8/what/nodata
attribute /dataset1/data8/what/offset
attribute /dataset1/data8/what/quantity
attribute /dataset1/data8/what/undetect
group /dataset1/data9
dataset /dataset1/data9/data
group /dataset1/data9/what
attribute /dataset1/data9/what/gain
attribute /dataset1/data9/what/nodata
attribute /dataset1/data9/what/offset
attribute /dataset1/data9/what/quantity
attribute /dataset1/data9/what/undetect
group /how
attribute /how/beamwidth
attribute /how/clutterMap
attribute /how/comment
attribute /how/maxazim
attribute /how/maxrange
attribute /how/minazim
attribute /how/minrange
attribute /how/rcs_bird
attribute /how/sd_vvp_thresh
attribute /how/task
attribute /how/task_args
attribute /how/task_version
attribute /how/wavelength
group /what
attribute /what/date
attribute /what/object
attribute /what/source
attribute /what/time
attribute /what/version
group /where
attribute /where/height
attribute /where/interval
attribute /where/lat
attribute /where/levels
attribute /where/lon
attribute /where/maxheight
attribute /where/minheight
}
}
```
I tested the data format of the hdf5 file in the following notebook: https://github.com/enram/infrastructure/blob/master/hdf5_handling/hdf5_check.ipynb

Metadata can easily be extracted using existing Python packages such as h5py or pytables. Functions to extract the metadata/data are written as test cases in the notebook.
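As a minimal sketch of what that extraction looks like with h5py (only the file name is taken from the example above; the `gain`/`offset` decoding follows the usual ODIM convention and is my assumption of how the values are meant to be read):

```python
import h5py
import numpy as np

# Minimal sketch: read one quantity from the example vertical profile above.
with h5py.File("bejab_vp_20151009T0000Z.h5", "r") as f:
    what = dict(f["dataset1/data1/what"].attrs)  # invisible in HDFView by default
    print(what)  # gain, nodata, offset, quantity, undetect

    raw = f["dataset1/data1/data"][:]
    # ODIM convention: physical value = gain * raw + offset,
    # with nodata/undetect values masked out.
    values = what["gain"] * raw.astype(float) + what["offset"]
    values[np.isin(raw, [what["nodata"], what["undetect"]])] = np.nan
```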
However, I'm wondering why the individual files are all so small, which seems to waste a strength of hdf5: its ability to handle very large datasets. This ties in to the discussion of whether or not to collect the data in a dbase. Since the individual files are so small, a download service or query facility would have to iterate over a lot of files if we only store metadata in a dbase.

I quickly checked, and maybe it would be interesting to think about some aggregation? In the last section of the notebook, an aggregation to daily level is performed, and the interoperability with pandas makes a (daily) query easy. Then again, if we go through that effort, we could just as well put the data in a dbase.
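For illustration, a sketch of that kind of daily aggregation (the directory layout, file-name pattern and the mapping of `dataN` groups to quantities are assumptions, not necessarily what the notebook does):

```python
import glob
import h5py
import pandas as pd

# Sketch: collect many small vp files into one DataFrame and aggregate by day.
rows = []
for path in sorted(glob.glob("vp/bejab_vp_*.h5")):  # assumed file layout
    with h5py.File(path, "r") as f:
        what = f["what"].attrs
        timestamp = pd.to_datetime(
            what["date"].decode() + what["time"].decode(), format="%Y%m%d%H%M%S"
        )
        # Assumed: data1 holds height, data2 holds bird density (raw values;
        # gain/offset decoding as in the earlier sketch is omitted here).
        heights = f["dataset1/data1/data"][:].flatten()
        dens = f["dataset1/data2/data"][:].flatten()
        rows.extend(
            {"datetime": timestamp, "height": h, "dens": d}
            for h, d in zip(heights, dens)
        )

df = pd.DataFrame(rows).set_index("datetime")
daily = df.groupby([pd.Grouper(freq="D"), "height"]).mean()  # daily mean per height
```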
The main reason for using ODIM hdf5 is that it's the standard data exchange format at the meteorological datahubs; we simply need to conform to that specification if we want to integrate the bird product generation in the datahub.

The processing at the datahub is a simple file-in/file-out step (so the source data is large, but the bird product is very small).

I have nothing against aggregation, but it can't happen at the meteorological datahub; it would have to be implemented by us as an extra step.
Thanks for clarifying, and I certainly do not want to question the original hdf5 file format; it is an important condition that should be taken into account. For the data products (download requests, services) offered to the users, there are different options. I'm wondering whether it would be most useful to put all the data and metadata in a dbase for the download service, or only the metadata, as suggested in issue #4? In the latter case, a user query would result in collecting data from a high number of small hdf5 files. What would you suggest?
I think having a dbase with all the data that can be queried would be very handy, but the decision also depends on the feasibility, time and resources we have at the moment, which are the main arguments against it.

In an earlier discussion @peterdesmet suggested a directory tree plus a service that shows what's available, which is therefore the more feasible option. But a dbase would be handier, because it's more flexible and you no longer have to deal with a multitude of files (a problem that remains even if you aggregate to days). A minimal sketch of the metadata-only alternative is below.
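As a sketch of the metadata-only option from issue #4 (the table layout and fields are assumptions, purely to illustrate the trade-off): the dbase only resolves a query to the hdf5 files, which then have to be opened one by one.

```python
import sqlite3

# Sketch of a metadata-only index: the dbase maps a query to file paths;
# the data itself stays in the many small hdf5 files.
con = sqlite3.connect("vp_index.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS profiles (radar TEXT, datetime TEXT, path TEXT)"
)
con.execute(
    "INSERT INTO profiles VALUES (?, ?, ?)",
    ("bejab", "2015-10-09T00:00:00Z", "vp/bejab_vp_20151009T0000Z.h5"),
)
con.commit()

# A user query returns the (potentially very long) list of files to open.
paths = [
    row[0]
    for row in con.execute(
        "SELECT path FROM profiles WHERE radar = ? AND datetime LIKE ?",
        ("bejab", "2015-10-09%"),
    )
]
print(paths)
```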
@bartaelterman @stijnvanhoey