enram / data-repository

Data quality assessment
https://enram.github.io/data-repository/
MIT License

Transform data to CSV #3

Closed: peterdesmet closed this issue 8 years ago

peterdesmet commented 8 years ago

Do we really need to build this? And if so, how do we keep it simple?

I think CSVs are the most understandable and consumable format for everyone. We wouldn't need to transform anything if the bird algorithm directly output the data as CSV, but from what I understood: 1) HDF5 also allows embedding metadata and is thus better as an archive format; 2) HDF5 is the default format used on BALTRAD?
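To make the CSV option concrete, here is a minimal sketch of what such a transformation could look like once a profile has been read into memory: one row per height bin, one column per quantity. The profile structure, the radar identifier and the quantity names (`HGHT`, `dens`, `ff`) are hypothetical, loosely modelled on bird profile quantities, and the values are made up for illustration.

```python
import csv
import io

# Hypothetical in-memory vertical profile: one value per height bin for each
# quantity. Names and values are illustrative assumptions, not a real schema.
profile = {
    "radar": "radar_a",  # hypothetical radar identifier
    "datetime": "2016-10-01T00:00:00Z",
    "quantities": {
        "HGHT": [0, 200, 400],  # height bin in metres
        "dens": [12.5, 30.1, 8.0],
        "ff": [5.2, 9.8, 11.3],
    },
}


def profile_to_csv(profile):
    """Write one CSV row per height bin, with one column per quantity."""
    names = list(profile["quantities"])
    n_bins = len(profile["quantities"][names[0]])
    # Every quantity must have exactly one value per height bin.
    if any(len(profile["quantities"][n]) != n_bins for n in names):
        raise ValueError("quantities have different numbers of height bins")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["radar", "datetime", *names])
    for i in range(n_bins):
        writer.writerow([profile["radar"], profile["datetime"],
                         *(profile["quantities"][n][i] for n in names)])
    return buf.getvalue()


print(profile_to_csv(profile))
```

Note that the metadata (radar, time) has to be repeated on every row, which is part of why HDF5 is the better archive format.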

Questions

adokter commented 8 years ago

Hi Peter,

I uploaded an example HDF5 bird profile file with an obsolete structure. The new format will look like this:

adokter commented 8 years ago

I'm not convinced we need to convert hdf5 to csv. My preference would be to load hdf5 files directly into a relational database, and not provide access to a directory tree of CSVs.
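As an illustration of the relational-database alternative, the sketch below loads profile values into a single table and queries it, using SQLite only because it ships with Python. The table name `profile_bin`, its columns and the rows are hypothetical; in practice the rows would come from parsing the HDF5 profiles.

```python
import sqlite3

# Hypothetical long-format rows: (radar, obs_time, height_m, dens, ff).
# These values are made up purely to illustrate the table layout.
rows = [
    ("radar_a", "2016-10-01T00:00:00Z", 0, 12.5, 5.2),
    ("radar_a", "2016-10-01T00:00:00Z", 200, 30.1, 9.8),
    ("radar_b", "2016-10-01T00:00:00Z", 0, 4.0, 3.1),
]

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute("""
    CREATE TABLE profile_bin (
        radar    TEXT NOT NULL,
        obs_time TEXT NOT NULL,
        height_m INTEGER NOT NULL,
        dens     REAL,
        ff       REAL,
        PRIMARY KEY (radar, obs_time, height_m)
    )
""")
conn.executemany("INSERT INTO profile_bin VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Users query the database instead of walking a directory tree of files.
for radar, max_dens in conn.execute(
    "SELECT radar, MAX(dens) FROM profile_bin GROUP BY radar ORDER BY radar"
):
    print(radar, max_dens)
```

The primary key on (radar, time, height) prevents the same profile from being loaded twice.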

peterdesmet commented 8 years ago

@adokter, we've come to similar conclusions. I'll remove the work package and close this issue.

amirhouieh commented 6 years ago

Hi, I have been looking for bird migration data for a visualisation project. While this data repository seems to be the most reliable source, it is almost impossible to get hold of its data (particularly reading the data, rather than just accessing it). I bet you have your reasons for using HDF5 and I respect that, but if you want to make this data available to more people, you should consider either presenting it in a more common format (such as JSON, CSV or SQL) or at least giving a better, more explanatory description of the data structure. I have written a crawler that downloads the latest .h5 file of each radar from S3, but now I have a serious problem: just to parse this data I would have to change my entire code-base ecosystem (from Java/Kotlin to Python). Even assuming I am OK with doing this, I still have no idea, for example, what the parsed data represents. I would appreciate it if you could point me to some documentation where I can understand the data structure. One use case for me would be, for example, the migration flow of species X in country Y in month 3.

stijnvanhoey commented 6 years ago

With respect to the HDF5 data format: it follows the community-agreed ODIM bird profile format specification, on which further studies, visualisations, etc. can be built.
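To give a rough idea of how such a file is organised, the sketch below mimics the group hierarchy with nested dicts and collects each quantity's per-height values. The layout (top-level `what`/`where` groups, quantities under `dataset1/dataN` with a `quantity` attribute) is an assumption loosely based on ODIM conventions; the exact group and attribute names should be checked against the specification. With a real file, the same walk would be done with an HDF5 library, which this sketch does not use.

```python
# Hypothetical stand-in for an opened HDF5 profile: groups are dicts,
# attributes sit under "attrs", datasets are plain lists. Layout and values
# are assumptions for illustration only.
vp_file = {
    "what": {"attrs": {"source": "NOD:radar_a", "date": "20161001", "time": "000000"}},
    "where": {"attrs": {"lat": 50.0, "lon": 5.0, "interval": 200.0, "levels": 3}},
    "dataset1": {
        "data1": {"what": {"attrs": {"quantity": "HGHT"}}, "data": [0, 200, 400]},
        "data2": {"what": {"attrs": {"quantity": "dens"}}, "data": [12.5, 30.1, 8.0]},
        "data3": {"what": {"attrs": {"quantity": "ff"}}, "data": [5.2, 9.8, 11.3]},
    },
}


def read_quantities(vp, dataset="dataset1"):
    """Map each quantity name to its list of per-height-bin values."""
    group = vp[dataset]
    # Keep only dataN subgroups and visit them in numeric order (data1, data2, ...).
    keys = sorted(
        (k for k in group if k.startswith("data") and k[4:].isdigit()),
        key=lambda k: int(k[4:]),
    )
    quantities = {}
    for key in keys:
        name = group[key]["what"]["attrs"]["quantity"]
        quantities[name] = group[key]["data"]
    return quantities


print(read_quantities(vp_file))
```

Because each quantity is stored as its own dataset, a reader in any language only needs to list the subgroups and read one attribute per subgroup.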

Functionality for data access, plotting and data analysis based on the S3 repository is provided by the BioRad R package. The package manuals (called vignettes in the R world) explain how to use the package to download, process and visualize the data. Another source is the vp-processing manual written by @peterdesmet. He can also tell you more about the existing migration visualisation.
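As an example of the kind of analysis built on these profiles, here is a simplified sketch of two vertically integrated quantities: density summed over height bins (birds/km²) and a migration traffic rate (birds crossing a 1 km transect per hour). It assumes `dens` in birds/km³, `ff` (ground speed) in m/s and equally thick height bins. It is not BioRad's implementation: the package's own functions may handle options (such as direction weighting or missing values) that this sketch ignores. The values are made up.

```python
# Hypothetical per-bin values, with units as assumed above.
dens = [12.5, 30.1, 8.0]  # birds per km^3
ff = [5.2, 9.8, 11.3]     # ground speed in m/s
interval_m = 200.0        # height bin thickness in metres


def vertically_integrated_density(dens, interval_m):
    """Birds per km^2: sum of density times bin thickness (in km)."""
    dh_km = interval_m / 1000.0
    return sum(d * dh_km for d in dens)


def migration_traffic_rate(dens, ff, interval_m):
    """Birds per km per hour: sum of density x speed (km/h) x thickness (km)."""
    dh_km = interval_m / 1000.0
    return sum(d * (f * 3.6) * dh_km for d, f in zip(dens, ff))


print(vertically_integrated_density(dens, interval_m))
print(migration_traffic_rate(dens, ff, interval_m))
```

Summing the traffic rate over time (e.g. per month) is one way to approach a "migration flow" question with radar data, although radar profiles do not distinguish species.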

At the same time, @adokter is currently submitting a paper about the BioRad package, which will provide more scientific background, and we are refactoring the BioRad package (discussed mainly on this issue; I'm currently doing the conversions of the package in the rename-functions branch). Note that the current manuals and tutorials will become outdated (they use deprecated functions of the package), but we are trying to integrate everything into the package in the coming weeks/months.

peterdesmet commented 6 years ago

@amirhouieh in addition to @stijnvanhoey comment: are you looking to visualize bird migration data derived from weather radars specifically or bird migration data in general? If the latter, then GPS tracking data might be more straightforward to visualize, e.g. https://doi.org/10.15468/02omly

amirhouieh commented 6 years ago

@stijnvanhoey thanks for the reply, I understand. @peterdesmet I am looking for bird migration in general. Thanks for the link, will look into it.