NCPP / ocgis

OpenClimateGIS is a set of geoprocessing and calculation tools for CF-compliant climate datasets.

Unicode characters in netCDF metadata #446

Closed by huard 7 years ago

huard commented 7 years ago
/home/dhuard/.miniconda3/envs/OPG2/lib/python2.7/site-packages/ocgis-2.0.0.dev1-py2.7.egg/ocgis/variable/attributes.pyc in write_attributes_to_netcdf_object(self, target)
     40                 continue
     41             if isinstance(v, six.string_types):
---> 42                 v = str(v)
     43             if k == 'axis' and isinstance(v, six.string_types):
     44                 # HACK: Axis writing was causing a strange netCDF failure.
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 196: ordinal not in range(128)

Here is the metadata dump for the culprit file.

netcdf tas_Amon_NorESM1-ME_historical_r1i1p2_185001-200512 {
dimensions:
    time = UNLIMITED ; // (1872 currently)
    lat = 96 ;
    lon = 144 ;
    bnds = 2 ;
variables:
    double time(time) ;
        time:bounds = "time_bnds" ;
        time:units = "days since 1850-01-01 00:00:00" ;
        time:calendar = "noleap" ;
        time:axis = "T" ;
        time:long_name = "time" ;
        time:standard_name = "time" ;
    double time_bnds(time, bnds) ;
    double lat(lat) ;
        lat:bounds = "lat_bnds" ;
        lat:units = "degrees_north" ;
        lat:axis = "Y" ;
        lat:long_name = "latitude" ;
        lat:standard_name = "latitude" ;
    double lat_bnds(lat, bnds) ;
    double lon(lon) ;
        lon:bounds = "lon_bnds" ;
        lon:units = "degrees_east" ;
        lon:axis = "X" ;
        lon:long_name = "longitude" ;
        lon:standard_name = "longitude" ;
    double lon_bnds(lon, bnds) ;
    double height ;
        height:units = "m" ;
        height:axis = "Z" ;
        height:positive = "up" ;
        height:long_name = "height" ;
        height:standard_name = "height" ;
    float tas(time, lat, lon) ;
        tas:standard_name = "air_temperature" ;
        tas:long_name = "Near-Surface Air Temperature" ;
        tas:units = "K" ;
        tas:original_name = "TREFHT" ;
        tas:cell_methods = "time: mean" ;
        tas:cell_measures = "area: areacella" ;
        tas:history = "2014-03-19T21:15:28Z altered by CMOR: Treated scalar dimension: \'height\'. 2014-03-19T21:15:28Z altered by CMOR: replaced missing value flag (1e+20) with standard missing value (1e+20). 2014-03-19T21:15:28Z altered by CMOR: Converted type from \'d\' to \'f\'." ;
        tas:coordinates = "height" ;
        tas:missing_value = 1.e+20f ;
        tas:_FillValue = 1.e+20f ;
        tas:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_NorESM1-ME_historical_r0i0p0.nc areacella: areacella_fx_NorESM1-ME_historical_r0i0p0.nc" ;

// global attributes:
        :institution = "Norwegian Climate Centre" ;
        :institute_id = "NCC" ;
        :experiment_id = "historical" ;
        :source = "NorESM1-ME 2011  atmosphere: CAM-Oslo (CAM4-Oslo-noresm-ver1_cmip5-r139, f19L26);  ocean: MICOM (MICOM-noresm-ver1_cmip5-r139, gx1v6L53);  ocean biogeochemistry: HAMOCC (HAMOCC-noresm-ver1_cmip5-r139, gx1v6L53);  sea ice: CICE (CICE4-noresm-ver1_cmip5-r139);  land: CLM (CLM4-noresm-ver1_cmip5-r139)" ;
        :model_id = "NorESM1-ME" ;
        :forcing = "GHG, SA, Oz, Sl, Vl, BC, OC" ;
        :parent_experiment_id = "piControl" ;
        :parent_experiment_rip = "r1i1p2" ;
        :branch_time = 0. ;
        :contact = "Please send any requests or bug reports to noresm-ncc@met.no." ;
        :comment = "The p2 configuration of NorESM-1ME includes discharge of riverine nutrients but is otherwise identical with the p1 configuration. Reference for riverine nutrients implementation: Bernard, C. Y., Dürr, H. H., Heinze, C., Segschneider, J., and Maier-Reimer, E.: Contribution of riverine nutrients to the silicon biogeochemistry of the global ocean \u2013 a model study, Biogeosciences, 8, 551-564, doi:10.5194/bg-8-551-2011, 2011." ;
        :initialization_method = 1 ;
        :physics_version = 2 ;
        :tracking_id = "20d94737-1ce8-4933-aa89-0348e893e5b1" ;
        :product = "output" ;
        :experiment = "historical" ;
        :frequency = "mon" ;
        :creation_date = "2014-03-19T21:15:28Z" ;
        :history = "2014-03-19T21:15:28Z CMOR rewrote data to comply with CF standards and CMIP5 requirements." ;
        :Conventions = "CF-1.4" ;
        :project_id = "CMIP5" ;
        :table_id = "Table Amon (01 February 2012) 81f919710c21dca8a1753166d5bac090" ;
        :title = "NorESM1-ME model output prepared for CMIP5 historical" ;
        :parent_experiment = "pre-industrial control" ;
        :modeling_realm = "atmos" ;
        :realization = 1 ;
        :cmor_version = "2.8.3" ;
}
bekozi commented 7 years ago

Nice find. I pushed a fix that should address this issue. The attribute write now catches Unicode errors and passes the offending value directly to the netCDF library, raising a warning in the process. Let me know if you think it should be handled differently. These errors were already caught in some of the metadata code, but for some reason not here.
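
For readers following along, here is a minimal sketch of the approach described in this comment. It is not the actual ocgis patch: `prepare_attribute_value` is a hypothetical helper name, and the warning text is invented for illustration. It assumes the goal is to keep the plain-string conversion for ASCII-only values and to hand any value that cannot be encoded as ASCII to the netCDF library unchanged.

```python
import warnings

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def prepare_attribute_value(value):
    """Hypothetical helper: return a value that is safe to write as an attribute.

    ASCII-only text is converted with str(), as in the original code path.
    Text containing non-ASCII characters is returned unchanged, with a warning.
    Non-text values (numbers, arrays) are returned unchanged.
    """
    if not isinstance(value, text_type):
        return value
    try:
        # Probe for the same failure str() hits on Python 2:
        # an implicit ASCII encode of a unicode object.
        value.encode('ascii')
    except UnicodeEncodeError:
        warnings.warn('Attribute value contains non-ASCII characters; '
                      'passing it to the netCDF library unchanged.')
        return value
    return str(value)
```

With this sketch, a value such as `u"D\xfcrr"` (the `\xfc` character reported in the traceback) comes back unchanged and triggers one warning, while a plain value such as `"time"` is returned as an ordinary string.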

huard commented 7 years ago

Looks good. Thanks!