aodn / compliance-checker

Python tool to check your datasets against compliance standards. Forked to include AODN-specific modifications.
Apache License 2.0

imos netcdf filename checker #1

Closed lbesnard closed 9 years ago

lbesnard commented 9 years ago

@fxmzb123 I found some issues with the new imos-file-name test.

See the following two files:

The test fails for both files:

Running Compliance Checker on the dataset from: IMOS_SRS-OC_F_20140312T030831Z_VMQ9273_FV01_DALEC_20140312T032541Z_C-20140629T231726Z.nc
Traceback (most recent call last):
  File "/home/user/compliance-checker/cchecker.py", line 27, in <module>
    sys.exit(main())
  File "/home/user/compliance-checker/cchecker.py", line 21, in main
    args.criteria)
  File "/home/user/compliance-checker/compliance_checker/runner.py", line 45, in run_checker
    score_groups = cs.run(ds, ds_loc, *checker_names)
  File "/home/user/compliance-checker/compliance_checker/suite.py", line 84, in run
    dsp                = checker.load_datapair(ds, ds_loc)
  File "/home/user/compliance-checker/compliance_checker/base.py", line 78, in load_datapair
    data_object = NetCDFDogma('ds', self.beliefs(), ds, namespaces=namespaces)
  File "/usr/local/lib/python2.7/dist-packages/wicken/dogma.py", line 136, in __call__
    obj = super(MetaReligion, clsType).__call__(religion, beliefs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/wicken/netcdf_dogma.py", line 45, in __init__
    root = parse_nc_dataset_as_etree(dataObject)
  File "/usr/local/lib/python2.7/dist-packages/petulantbear/netcdf_etree.py", line 443, in parse_nc_dataset_as_etree
    dataset2ncml_buffer(dataset,output)
  File "/usr/local/lib/python2.7/dist-packages/petulantbear/netcdf2ncml.py", line 238, in dataset2ncml_buffer
    parse_att(output,(attname,dataset.getncattr(attname)), indent)
  File "/usr/local/lib/python2.7/dist-packages/petulantbear/netcdf2ncml.py", line 124, in parse_att
    attvalue=sanatize(att[1])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 143: ordinal not in range(128)

This file works, though:

cheers

fxmzb123 commented 9 years ago

Hi lbesnard,

This issue is caused by the files containing non-ASCII characters in their attributes. I have resolved it by setting the default encoding to UTF-8. This happens for all checkers, so it needs to be fixed in the IOOS checker as well. I will create a separate branch and send a pull request.

Thanks, Ming
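
For illustration, the failure in the traceback can be reproduced without a NetCDF file. The sketch below is a minimal stand-in, not the checker's code: the attribute text and the helper `encode_attribute` are made up for the example, and only the character U+2019 comes from the error message above. It is written for Python 3, where the same error appears when text is explicitly encoded as ASCII.

```python
# Made-up attribute value containing U+2019 (right single quotation mark),
# the character reported in the UnicodeEncodeError above.
attribute_value = u"The vessel\u2019s radiometer data"


def encode_attribute(value, encoding):
    """Hypothetical helper: encode a text attribute to bytes with a codec."""
    return value.encode(encoding)


# ASCII cannot represent U+2019, so this raises UnicodeEncodeError.
try:
    encode_attribute(attribute_value, "ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ASCII failed:", exc.reason)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = encode_attribute(attribute_value, "utf-8")
print(utf8_bytes.decode("utf-8") == attribute_value)
```

The round trip through UTF-8 returns the original text, which is why switching the codec removes the error.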

danfruehauf commented 9 years ago

How about using the system's locale for that? It'd be much more accurate than hardcoding it to UTF-8.

lbesnard commented 9 years ago

> How about using the system's locale for that? It'd be much more accurate than hardcoding it to UTF-8.

From what I understand, the issue is file-based, not environment-based.

danfruehauf commented 9 years ago

And do the files declare their encoding?

fxmzb123 commented 9 years ago

The default encoding in Python 2 is ASCII. Python can report the filesystem encoding via sys.getfilesystemencoding(), which can then be set as the default with: reload(sys); sys.setdefaultencoding(sys.getfilesystemencoding())
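
As a rough sketch of the values under discussion, the snippet below prints the three encodings Python exposes: the interpreter default, the filesystem encoding, and the locale's preferred encoding. The function name `report_encodings` is an assumption for the example. Note that the `reload(sys)` / `sys.setdefaultencoding(...)` approach applies to Python 2 only; Python 3 has no `sys.setdefaultencoding` and its default encoding is UTF-8.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for this process (illustrative)."""
    return {
        # Codec used for implicit str/bytes conversions.
        "default": sys.getdefaultencoding(),
        # Codec used to convert file names, not file contents.
        "filesystem": sys.getfilesystemencoding(),
        # Codec suggested by the user's locale settings.
        "locale": locale.getpreferredencoding(False),
    }


encodings = report_encodings()
for name, value in sorted(encodings.items()):
    print(name, value)
```

None of these values describes how the text inside a given NetCDF file was encoded; they only describe the environment the checker runs in.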

danfruehauf commented 9 years ago

We're not quite interested in the filesystem settings, but more in either what NetCDF is publishing (if it does at all), or alternatively, use the system's locale for that.

fxmzb123 commented 9 years ago

The default encoding for the netcdf-python library is 'utf-8', and I believe this is what should be used.

danfruehauf commented 9 years ago

That's fine too. It's unlikely we'll use anything other than UTF-8 in the foreseeable future. However, as best practice, I think it's good if it can be overridden by the environment, should that ever be required.
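
One way the override could look is sketched below: default to UTF-8, but let an environment variable replace it when it names a codec Python knows. This is an assumption-laden illustration, not the checker's implementation; the variable name `CHECKER_ENCODING` and the function `resolve_encoding` are hypothetical.

```python
import codecs
import os

DEFAULT_ENCODING = "utf-8"
ENV_VAR = "CHECKER_ENCODING"  # hypothetical variable name, not a real setting


def resolve_encoding(environ=None):
    """Return the codec name to use, preferring an environment override."""
    environ = os.environ if environ is None else environ
    requested = environ.get(ENV_VAR, "").strip()
    if not requested:
        return DEFAULT_ENCODING
    try:
        # codecs.lookup raises LookupError for unknown codec names and
        # returns a normalised name for known ones.
        return codecs.lookup(requested).name
    except LookupError:
        # Fall back to the default rather than failing on a bad value.
        return DEFAULT_ENCODING


print(resolve_encoding({}))
print(resolve_encoding({ENV_VAR: "latin-1"}))
```

Falling back silently on an unknown codec is a design choice made for brevity here; a real tool might prefer to warn or exit so that a misspelled value does not go unnoticed.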

mhidas commented 9 years ago

It seems this is not actually an issue with the filename checker, and the encoding problem has been reported to IOOS (https://github.com/ioos/compliance-checker/issues/108), so I think we can close this.