G-Node / nixpy

Python library for NIX
https://readthedocs.org/projects/nixpy

UnicodeDecodeError when setting tag units with unit of data array #249

Open ajkswamy opened 7 years ago

ajkswamy commented 7 years ago

Hi

Here is an example with nixio v1.3.0

import nixio as nix
import numpy as np

nixFile = nix.File.open('test.h5')
blk = nixFile.create_block('TestBlock', 'Test')
da = blk.create_data_array('TestDA', 'Test', data=np.random.rand(50))
da.unit = 'mV'
dim = da.append_sampled_dimension(1)
dim.unit = 's'

tag = blk.create_tag('TestTag', 'Test', position=[10])
tag.extent = [10]
tag.references.append(da)
tag.units = [da.dimensions[0].unit]
nixFile.close()

Here is the traceback:

  Traceback (most recent call last):
  File "tmp/nixioUnicodeBug.py", line 14, in <module>
    tag.units = [da.dimensions[0].unit]
  File "/home/aj/intel/intelpython27/envs/GJEMS/lib/python2.7/site-packages/nixio/pycore/tag.py", line 51, in units
    u = util.units.sanitizer(u)
  File "/home/aj/intel/intelpython27/envs/GJEMS/lib/python2.7/site-packages/nixio/pycore/util/units.py", line 66, in sanitizer
    replace(micro, "u").replace(mugr, "u")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

I could get around it with

tag.units = [str(da.dimensions[0].unit)]

Is this expected behavior? Thanks.

s0nskar commented 7 years ago

This error generally occurs when the given data is not valid UTF-8, so check that the data you are providing is properly UTF-8 encoded. I think your data probably contains backquotes.

ajkswamy commented 7 years ago

Sorry for not being clear, but in the example above, I create a completely new file 'test.h5' and create a new block, a new data array and a new tag. The error is reproducible by just running the above code.

achilleas-k commented 7 years ago

Hey Ajay.

I think I know what's going on here. This line

    replace(micro, "u").replace(mugr, "u")

is meant to replace the character μ in a unit string with u. It does two replacements because there are two different μ code points: the micro sign (U+00B5), meant to be used as an SI prefix (micro), and the Greek lowercase letter mu (U+03BC) (mugr).
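
The two code points look identical on screen but are distinct characters. A minimal sketch of the replacement chain quoted above, written with explicit escapes so the literals do not depend on the source file's encoding. The name `sanitize_unit` and the two constants are hypothetical, chosen for illustration, and this is not nixio's actual code:

```python
# Python 3 text literals; on Python 2 these would need a u"" prefix.
MICRO_SIGN = "\u00b5"  # MICRO SIGN, the SI-prefix character
GREEK_MU = "\u03bc"    # GREEK SMALL LETTER MU


def sanitize_unit(unit):
    """Strip spaces and normalise every spelling of 'micro' to 'u'."""
    return (
        unit.replace(" ", "")
        .replace("mu", "u")
        .replace(MICRO_SIGN, "u")
        .replace(GREEK_MU, "u")
    )
```

Because every operand is a text string, no implicit encoding or decoding takes place during the replacements.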

This should work on both Python 2 and 3. Which OS are you running? (Ubuntu 14.04, if I remember correctly.) I'll see if I can pinpoint the exact issue.

ajkswamy commented 7 years ago

Hi Achilleas

It seems to be a unicode conversion issue: dim.unit returns a unicode string, which tag.units does not accept. It does accept a normal str, though.
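
One plausible mechanism, assuming `micro` and `mugr` are plain byte-string literals in a UTF-8 source file under Python 2: calling `.replace` on a unicode string with a byte-string argument makes Python 2 decode the bytes implicitly with the ASCII codec, and the UTF-8 encoding of the micro sign starts with byte 0xc2. The sketch below performs that decode step explicitly in Python 3, so it illustrates the failure rather than reproducing nixio's code path:

```python
# UTF-8 bytes of the micro sign (U+00B5), as a Python 2 byte-string literal would hold them.
micro_bytes = "\u00b5".encode("utf-8")

try:
    # Python 2 performs this decode implicitly when unicode and str are mixed.
    micro_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The resulting error names byte 0xc2 at position 0, as in the traceback. Under the same assumption, wrapping the unit in `str()` helps because both operands are then byte strings and no implicit decode is needed.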

In [5]: dim.unit        
Out[5]: u's'

In [6]: tag.units = [da.dimensions[0].unit]
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-2943b086a69a> in <module>()
----> 1 tag.units = [da.dimensions[0].unit]

/home/aj/intel/intelpython27/lib/python2.7/site-packages/nixio/pycore/tag.pyc in units(self, units)
     49             for u in units:
     50                 util.check_attr_type(u, str)
---> 51                 u = util.units.sanitizer(u)
     52                 if not (util.units.is_si(u) or util.units.is_compound(u)):
     53                     raise InvalidUnit(

/home/aj/intel/intelpython27/lib/python2.7/site-packages/nixio/pycore/util/units.pyc in sanitizer(unit)
     64     mugr = "μ"
     65     return unit.replace(" ", "").replace("mu", "u").\
---> 66         replace(micro, "u").replace(mugr, "u")
     67 
     68 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

In [7]: tag.units = [str(da.dimensions[0].unit)]

achilleas-k commented 7 years ago

Cool. Thanks for the extra info. I'll try to poke this bug to death later today... or tomorrow. Or soon. Most likely soonish.

s0nskar commented 7 years ago

Hey @achilleas-k, I was trying to run tag = blk.create_tag('TestTag', 'Test', position=[10]) from the code above, but it gives me this error:

ArgumentError                             Traceback (most recent call last)
<ipython-input-10-9cbdb1fe1850> in <module>()
----> 1 tag = blk.create_tag('TestTag', 'Test', position=[10])

ArgumentError: Python argument types in
    Block.create_tag(Block, str, str)
did not match C++ signature:
    create_tag(nix::Block {lvalue}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<double, std::allocator<double> >)

achilleas-k commented 7 years ago

Hello @s0nskar.

That's an issue that arises when using the C++ bindings (backend="hdf5") instead of the pure Python backend (backend="h5py"). The problem here is that some functions simply call the equivalent C++ function in the backend directly, while others have a Python layer that does some preprocessing of the function arguments. It should work if you don't use the position keyword argument.

On the one hand, this could count as an API incompatibility between the two backends, which might be an issue. On the other hand, it's just the way Python works: it has keyword arguments. I guess we could add a preprocessing step for keyword arguments to ALL methods before calling the backend, but that would be a fair amount of work for little benefit.