Open sadielbartholomew opened 4 years ago
Good idea. Getting the netCDF information is straightforward, as it's reported by the netCDF4 library. I'm not sure if this information is so readily available for PP/UM files, but a code tweak would make it so if not.
However, we have to be careful, as aggregation can combine fields from files with different data models. This is why the `get_filenames` methods return a set of file names rather than a single string. Perhaps this method could be modified to return a dictionary whose keys are the file names with corresponding values of the file data model?
Thanks for the insight. It sounds fairly straightforward, in that case.
> However, we have to be careful, as aggregation can combine fields from files with different data models. This is why the `get_filenames` methods return a set of file names rather than a single string. Perhaps this method could be modified to return a dictionary whose keys are the file names with corresponding values of the file data model?
Yes, good thinking, that sounds like the most Pythonic way to manage the aggregation context.
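As a rough illustration of the dictionary return value suggested above, here is a minimal sketch. The function `file_data_models`, the `detect` callable and the `"PP"` label are hypothetical names made up for this example and are not part of cf-python. The stand-in lookup table only mimics what a real reader would report for each file.

```python
from typing import Callable, Dict, Iterable


def file_data_models(
    filenames: Iterable[str], detect: Callable[[str], str]
) -> Dict[str, str]:
    """Map each source file name to a label for its data model.

    `detect` is any callable that returns a format label for a file name.
    It is a placeholder for whatever the file-reading layer can report.
    """
    return {name: detect(name) for name in sorted(set(filenames))}


# Stand-in detector for illustration only: a fixed lookup table instead of
# real file inspection. The labels are assumptions, not cf-python output.
known = {"a.nc": "NETCDF4", "b.nc": "NETCDF3_CLASSIC", "c.pp": "PP"}

# An aggregated field may draw on several files, hence a set of names in
# and one dictionary entry per file out.
models = file_data_models({"a.nc", "b.nc", "c.pp"}, known.__getitem__)
print(models)
```

Keeping one entry per file means an aggregated field built from mixed sources can still report every underlying format, which a single string could not.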
We support specification of a netCDF file format to write out to, but as far as I can see (I may be missing something obvious) there is no way in cf-python to determine, for read-in fields, the data model underlying the source file, e.g. the type of netCDF (classic, 64-bit offset, netCDF-4, CFA variants, etc.) or else the `.pp` & `.ff` proprietary formats with any variants, as that does not appear to be encoded in the metadata.
I think users may be interested in this information, for example to know immediately based on the format whether there will be groups, etc. without having to inspect the group structure. So, similar to the `source` method on a field providing detail on the method of production of the original data, I propose a `source_fmt` or `source_storage` (or similar) method to return that information, assuming it is not overly difficult to determine that information when the file is read in.
Some utility similar to that provided by shell-command inspection of the first four bytes of the file, and/or the `ncdump -k` option (based on the netCDF docs FAQ section 'How can I tell which format a netCDF file uses?'), to report the file format corresponding to fields read in from a given file could be useful. For example, `f.source_storage` providing the format as a named string such as those listed as `fmt` for `cf.write` in the case of netCDF.
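The first-four-bytes inspection mentioned above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation. The function name `sniff_netcdf_format` is hypothetical, and the returned labels are borrowed from the usual netCDF format names for convenience. The magic numbers are the documented ones: `CDF` followed by a version byte for the classic family, and the HDF5 signature for netCDF-4 files. Note that the leading bytes alone cannot separate netCDF-4 from the netCDF-4 classic model, and PP/UM files carry no such signature, so they come back as `None` here.

```python
import os
import tempfile

# Leading four bytes of each on-disk netCDF variant, mapped to a label.
_MAGIC_TO_FORMAT = {
    b"CDF\x01": "NETCDF3_CLASSIC",
    b"CDF\x02": "NETCDF3_64BIT_OFFSET",
    b"CDF\x05": "NETCDF3_64BIT_DATA",
    # HDF5 container: either NETCDF4 or NETCDF4_CLASSIC, which these
    # four bytes cannot tell apart.
    b"\x89HDF": "NETCDF4",
}


def sniff_netcdf_format(path):
    """Return a format label from the file's first four bytes, else None."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    return _MAGIC_TO_FORMAT.get(magic)


def _demo():
    # Write a fake header carrying the 64-bit offset magic number and
    # sniff it. This is not a valid netCDF file beyond those first bytes.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.nc")
        with open(path, "wb") as fh:
            fh.write(b"CDF\x02" + b"\x00" * 4)
        return sniff_netcdf_format(path)


demo_result = _demo()
print(demo_result)
```

A real `source_storage` would more likely reuse whatever the reading library already reports when it opens the file, with byte sniffing as a fallback at most.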