aodn / IMOS-hackathon

Code emerging from the 2024 AODN Hackathon
GNU General Public License v3.0

"meta-data as data" // document the needed extra columns for future AODN ocean in-situ observations `parquet` archives #37

Open Thomas-Moore-Creative opened 5 months ago

Thomas-Moore-Creative commented 5 months ago

see: https://github.com/aodn/IMOS-hackathon/issues/26#issuecomment-2099614669

In moving from NetCDF to more cloud-optimised formats like parquet, we need to address the change in how "meta-data" is handled. The bottom line is that, without the global attributes available in NetCDF, we'll need to carry some of this "meta-data" over as "data" columns on every record, i.e. for each spatial and time point.
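A minimal sketch of what "meta-data as data" could look like, in plain Python and without any parquet library. The attribute names, the values, and the helper `attrs_to_columns` are all hypothetical and only illustrate the idea of copying file-level attributes onto every record:

```python
# Hypothetical file-level attributes, standing in for NetCDF global attributes.
# Names and values are made up for illustration only.
global_attrs = {
    "institute": "example-institute",
    "project_name": "example-project",
    "probe_type": "XBT",
}

# Hypothetical per-record observations (one dict per spatial/time point).
records = [
    {"lat": -42.5, "lon": 147.3, "time": "2024-05-01T00:00:00Z", "temperature": 14.2},
    {"lat": -42.6, "lon": 147.4, "time": "2024-05-01T01:00:00Z", "temperature": 14.0},
]


def attrs_to_columns(records, global_attrs):
    """Return new records with every global attribute repeated as a column."""
    rows = []
    for rec in records:
        row = dict(global_attrs)  # start from the file-level metadata
        row.update(rec)           # per-record values win on a name clash
        rows.append(row)
    return rows


rows = attrs_to_columns(records, global_attrs)
for row in rows:
    print(row)
```

Each output row is then self-describing, at the cost of repeating the same attribute values on every record.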

The assumption is that, although this duplicates bytes in the file and is "wasteful" of storage, the real-world impact of the extra size on resource costs or access time won't matter. (??)

Thomas-Moore-Creative commented 5 months ago

@BecCowley should we grab some of Chris's CODA headers here as a start?

BecCowley commented 5 months ago

Here is the list we want (if the information is available). We can add to it if anything obvious is missing.

- lat
- lon
- date/time
- probe_type
- recorder
- country
- database origin
- Project name
- platform/instrument type
- vehicle (eg vessel)
- Institute
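As a rough sketch, the list above could be kept as a checklist and compared against the columns a given dataset actually provides. The snake_case column names and the `missing_columns` helper are assumptions made for this example, not an agreed AODN schema:

```python
# Hypothetical snake_case names for the wanted columns listed above.
REQUIRED_COLUMNS = [
    "lat",
    "lon",
    "date_time",
    "probe_type",
    "recorder",
    "country",
    "database_origin",
    "project_name",
    "platform_instrument_type",
    "vehicle",
    "institute",
]


def missing_columns(available):
    """Return the wanted columns that are absent, in checklist order."""
    present = set(available)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical dataset that only carries some of the wanted columns.
example_columns = ["lat", "lon", "date_time", "probe_type", "temperature"]
missing = missing_columns(example_columns)
print(missing)
```

Since the information will not always be available, a report of missing columns is probably more useful here than a hard failure.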