CDAT / cdms


UnicodeDecodeError attempting to open an NCO-generated netcdf file #442

Open durack1 opened 2 years ago

durack1 commented 2 years ago

Describe the bug: cdms 3.1.5 fails with a UnicodeDecodeError when attempting to open an NCO-generated netCDF file.

To Reproduce: steps to reproduce the behavior:

In [2]: import cdms2 as cdm
In [3]: f = '/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc'
In [4]: fH = cdm.open(f)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 11: invalid start byte
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/tmp/ipykernel_70886/1508437202.py", line 1, in <module>
    fH = cdm.open(f)
  File "/home/durack1/anaconda3/envs/cdms315spy515cart020/lib/python3.9/site-packages/cdms2/dataset.py", line 523, in openDataset
    file = CdmsFile(path, mode, hostObj)
  File "/home/durack1/anaconda3/envs/cdms315spy515cart020/lib/python3.9/site-packages/cdms2/dataset.py", line 1295, in __init__
    self._file_ = Cdunif.CdunifFile(path, mode)
SystemError: <built-in function CdunifFile> returned a result with an error set

Expected behavior: I expected cdms to open the file; ncdump -h reads it without problems.

Screenshots or traceback: the reproducible steps above should provide enough information.


durack1 commented 2 years ago

The same issue occurs for the /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc file

durack1 commented 2 years ago

And /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Emon/wtd/gn/v20181012/wtd_Emon_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_185005-185912.nc. If I find more, I'll just edit this comment.

Also:
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/fx/areacellr/gn/v20181012/areacellr_fx_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r4i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r4i1p1f2_gn_18500701-18591231.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r4i1p1f2/Emon/wtd/gn/v20181012/wtd_Emon_CNRM-CM6-1_abrupt-4xCO2_r4i1p1f2_gn_185007-185912.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r4i1p1f2/fx/areacellr/gn/v20181012/areacellr_fx_CNRM-CM6-1_abrupt-4xCO2_r4i1p1f2_gn.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r1i1p1f2/Eday/rivo/gn/v20180705/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r1i1p1f2_gn_19500101-19991231.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r1i1p1f2/Emon/wtd/gn/v20180705/wtd_Emon_CNRM-CM6-1_abrupt-4xCO2_r1i1p1f2_gn_185001-199912.nc
/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r1i1p1f2/fx/areacellr/gn/v20180705/areacellr_fx_CNRM-CM6-1_abrupt-4xCO2_r1i1p1f2_gn.nc

There are more, but I gave up adding to this list; these should be enough to debug the behaviour.

jypeter commented 2 years ago

This reminds me of #432 ...

durack1 commented 2 years ago

@jypeter I think you're right. I wonder whether adding string.encode('utf-8') might be a way of getting around the problem? Although I think this is buried in the C code that calls netcdf-c, somewhere in Src/Cdunifmodule.c, and I'm not sure there is an equivalent of .encode in C.
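For illustration only: the traceback reports a decode failure (bytes to str), so the relevant operation is probably a tolerant decode rather than an encode. The sketch below shows one possible fallback strategy in Python. The helper name decode_attribute and the sample bytes are hypothetical, and nothing here is taken from Src/Cdunifmodule.c. On the C side, the CPython API provides PyUnicode_DecodeUTF8 (which accepts an errors argument such as "replace") and PyUnicode_DecodeLatin1; whether either fits the cdms code path is an assumption that would need checking.

```python
def decode_attribute(raw: bytes) -> str:
    """Decode attribute bytes as UTF-8, falling back to Latin-1.

    Hypothetical helper, not part of cdms2.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode("latin-1")


# Hypothetical sample: 0xb0 is the degree sign in Latin-1 but invalid as a
# UTF-8 start byte, which matches the byte reported in the error.
legacy_bytes = b"1/2\xb0"
utf8_bytes = "1/2\u00b0".encode("utf-8")

print(decode_attribute(legacy_bytes))  # falls back to Latin-1
print(decode_attribute(utf8_bytes))    # decodes as UTF-8 directly
```

A fallback like this trades strictness for robustness: valid UTF-8 is untouched, and anything else is read byte for byte instead of raising.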

Just dropping an ncdump of one of the problem files below, I can't see the character issue after a quick scan:

ncdump -h /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/abrupt-4xCO2/r3i1p1f2/Eday/rivo/gn/v20181012/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc
netcdf rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231 {
dimensions:
    lat = 360 ;
    lon = 720 ;
    time = UNLIMITED ; // (3532 currently)
    axis_nbounds = 2 ;
variables:
    float lat(lat) ;
        lat:axis = "Y" ;
        lat:standard_name = "latitude" ;
        lat:long_name = "Latitude" ;
        lat:units = "degrees_north" ;
    float lon(lon) ;
        lon:axis = "X" ;
        lon:standard_name = "longitude" ;
        lon:long_name = "Longitude" ;
        lon:units = "degrees_east" ;
    double time(time) ;
        time:axis = "T" ;
        time:standard_name = "time" ;
        time:long_name = "Time axis" ;
        time:calendar = "gregorian" ;
        time:units = "days since 1850-01-01 00:00:00" ;
        time:time_origin = "1850-01-01 00:00:00" ;
        time:bounds = "time_bounds" ;
    double time_bounds(time, axis_nbounds) ;
    float rivo(time, lat, lon) ;
        rivo:long_name = "River Discharge" ;
        rivo:units = "m3 s-1" ;
        rivo:online_operation = "average" ;
        rivo:cell_methods = "area: mean where land time: mean" ;
        rivo:interval_operation = "1800 s" ;
        rivo:interval_write = "1 d" ;
        rivo:_FillValue = 1.e+20f ;
        rivo:missing_value = 1.e+20f ;
        rivo:coordinates = "" ;
        rivo:standard_name = "water_flux_to_downstream" ;
        rivo:description = "water_flux_from_upstream" ;
        rivo:history = "none" ;
        rivo:cell_measures = "area: areacellr" ;

// global attributes:
        :name = "/scratch/work/voldoire/outputs/CMIP6/DECK/CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2/18500501/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_%start_date%-%end_date%" ;
        :Conventions = "CF-1.7 CMIP-6.2" ;
        :creation_date = "2018-07-23T14:10:00Z" ;
        :description = "DECK: abrupt-4xCO2" ;
        :title = "CNRM-CM6-1 model output prepared for CMIP6 / CMIP abrupt-4xCO2" ;
        :activity_id = "CMIP" ;
        :contact = "contact.cmip@meteo.fr" ;
        :data_specs_version = "01.00.21" ;
        :dr2xml_version = "1.13" ;
        :experiment_id = "abrupt-4xCO2" ;
        :experiment = "abrupt quadrupling of CO2" ;
        :external_variables = "areacellr" ;
        :forcing_index = 2 ;
        :frequency = "day" ;
        :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.CNRM-CERFACS.CNRM-CM6-1.abrupt-4xCO2.none.r3i1p1f2" ;
        :grid = "regular 1/2? lat-lon grid" ;
        :grid_label = "gn" ;
        :nominal_resolution = "50 km" ;
        :initialization_index = 1 ;
        :institution_id = "CNRM-CERFACS" ;
        :institution = "CNRM (Centre National de Recherches Meteorologiques, Toulouse 31057, France), CERFACS (Centre Europeen de Recherche et de Formation Avancee en Calcul Scientifique, Toulouse 31057, France)" ;
        :license = "CMIP6 model data produced by CNRM-CERFACS is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at http://www.umr-cnrm.fr/cmip6/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
        :mip_era = "CMIP6" ;
        :parent_experiment_id = "piControl" ;
        :parent_mip_era = "CMIP6" ;
        :parent_activity_id = "CMIP" ;
        :parent_source_id = "CNRM-CM6-1" ;
        :parent_time_units = "days since 1850-01-01 00:00:00" ;
        :parent_variant_label = "r3i1p1f2" ;
        :branch_method = "standard" ;
        :branch_time_in_parent = 0. ;
        :branch_time_in_child = 0. ;
        :physics_index = 1 ;
        :product = "model-output" ;
        :realization_index = 3 ;
        :realm = "land" ;
        :references = "http://www.umr-cnrm.fr/cmip6/references" ;
        :source = "CNRM-CM6-1 (2017):  aerosol: prescribed monthly fields computed by TACTIC_v2 scheme atmos: Arpege 6.3 (T127; Gaussian Reduced with 24572 grid points in total distributed over 128 latitude circles (with 256 grid points per latitude circle between 30degN and 30degS reducing to 20 grid points per latitude circle at 88.9degN and 88.9degS); 91 levels; top level 78.4 km) atmosChem: OZL_v2 land: Surfex 8.0c ocean: Nemo 3.6 (eORCA1, tripolar primarily 1deg; 362 x 294 longitude/latitude; 75 levels; top grid cell 0-1 m) seaIce: Gelato 6.1" ;
        :source_id = "CNRM-CM6-1" ;
        :source_type = "AOGCM" ;
        :sub_experiment_id = "none" ;
        :sub_experiment = "none" ;
        :table_id = "Eday" ;
        :variable_id = "rivo" ;
        :variant_label = "r3i1p1f2" ;
        :EXPID = "CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2" ;
        :CMIP6_CV_version = "cv=6.2.3.0-7-g2019642" ;
        :dr2xml_md5sum = "92ddb3d0d8ce79f498d792fc8e559dcf" ;
        :xios_commit = "1442-shuffle" ;
        :nemo_gelato_commit = "49095b3accd5d4c_6524fe19b00467a" ;
        :arpege_minor_version = "6.3.2" ;
        :tracking_id = "hdl:21.14100/21b7ff0f-63f7-4702-a965-fa94b6fa6ad1" ;
        :history = "Tue Jul 24 19:23:12 2018: ncatted -O -a tracking_id,global,m,c,hdl:21.14100/21b7ff0f-63f7-4702-a965-fa94b6fa6ad1 /scratch/work/voldoire/outputs/CMIP6/DECK/CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2/assembled/rivo_Eday_CNRM-CM6-1_abrupt-4xCO2_r3i1p1f2_gn_18500501-18591231.nc\nnone" ;
        :NCO = "\"4.5.5\"" ;
}
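
A possible lead, offered as a sketch rather than a confirmed diagnosis: the error names byte 0xb0 at position 11, and in the dump above the global attribute grid is printed as "regular 1/2? lat-lon grid", where the "?" sits at index 11. Byte 0xb0 is the degree sign in Latin-1, so the attribute may have been written as Latin-1 rather than UTF-8. The raw bytes below are an assumption reconstructed from the dump, not read from the file.

```python
# Attribute text exactly as ncdump printed it in the header above.
attr_as_dumped = "regular 1/2? lat-lon grid"
position = attr_as_dumped.index("?")
print(position)  # 11, the position reported in the UnicodeDecodeError

# Assumption: the "?" stands for a single 0xb0 byte stored in the file.
raw = b"regular 1/2\xb0 lat-lon grid"

message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding the same bytes as Latin-1 yields the degree sign.
print(raw.decode("latin-1"))
```

If the assumption holds, the message printed here matches the one in the traceback, which would point at the grid attribute as the trigger.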