espeed / bulbs

A Python persistence framework for graph databases like Neo4j, OrientDB and Titan.
http://bulbflow.org

`unicode_escape_decode` function issue. #114

Closed pbu88 closed 10 years ago

pbu88 commented 10 years ago

I came here via this question on Stack Overflow: http://stackoverflow.com/questions/19824952/unicodeencodeerror-bulbs-and-neo4j-create-model

I noticed the `u` function in your code. The `create` function somehow ends up calling the `u` function in `utils`, which, for Python versions below 3, seems to be just one line of code:

return codecs.unicode_escape_decode(x)[0]

I'm somewhat suspicious of that function and its behavior, because I can't find its implementation in Python 2.7's source code or in the `codecs` docs. I can't tell what it does and have never seen it used, but maybe it's quite helpful. The problem shows up when I call it from a normal Python interpreter with a proper unicode string:

>>> codecs.unicode_escape_decode(u'\u00f6')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
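The traceback is consistent with how Python 2 handles this call: `codecs.unicode_escape_decode` expects a byte string containing backslash escapes, so a `unicode` argument is first implicitly encoded with the default ASCII codec, and that implicit step is what raises `UnicodeEncodeError`. A minimal sketch of a defensive variant of the `u()` helper, assuming its intended purpose is to turn byte-string literals that contain escapes into text. The name `safe_u` is hypothetical, and this is not necessarily what the Bulbs fix does:

```python
import codecs


def safe_u(x):
    """Hypothetical defensive variant of a u() compatibility helper.

    Byte strings are treated as literals that may contain backslash escapes
    and are decoded with unicode_escape. Text is returned unchanged, because
    it has already been decoded. On Python 2.6+, `bytes` is an alias of
    `str`, so the same check lets Python 2 `unicode` values pass through.
    """
    if isinstance(x, bytes):
        return codecs.unicode_escape_decode(x)[0]
    return x


# ascii() keeps the output printable on any terminal.
print(ascii(safe_u(b"caf\\xe9")))  # 'caf\xe9'
print(ascii(safe_u("\u00f6")))     # '\xf6'
```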

An exception occurs. I don't know what you're trying to accomplish here, and maybe I'm speaking without full knowledge, but as the name suggests, `_encode` is what's needed to store the string safely escaped. If I do this instead:

>>> codecs.unicode_escape_encode(u'\u00f6')
('\\xf6', 1)

It works fine. But as I said, I don't know what these functions do, because I haven't found them documented anywhere.

Also, take a look at line 131 in utils.py:

segments = [quote(u(str(segment)), safe='') for segment in args if segment is not None]

It seems to convert every input to `str`, regardless of whether it's running on Python 3 or earlier. If a Python 2.7 unicode string with non-ASCII characters comes in, it will fail with the same exception.
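On Python 3 the same pattern is safe, because `str()` of a text value is a no-op and `urllib.parse.quote` percent-encodes text as UTF-8. A sketch of that pattern as a standalone helper, with the `u()` call dropped (`join_path_segments` is a hypothetical name, not the Bulbs function). On Python 2 the equivalent would need to encode a `unicode` segment to UTF-8 bytes rather than call `str()` on it, since `str(u'\xf6')` raises the same `UnicodeEncodeError`:

```python
from urllib.parse import quote


def join_path_segments(*args):
    """Hypothetical helper: percent-encode each non-None segment and join with '/'.

    safe="" means '/' inside a segment is encoded too, so a segment can never
    split into two path components.
    """
    segments = [quote(str(segment), safe="") for segment in args if segment is not None]
    return "/".join(segments)


print(join_path_segments("vertices", "J\u00f6rg", None, 42))  # vertices/J%C3%B6rg/42
```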

In any case, I just want to help. If I'm talking nonsense, please just ignore me, and sorry for wasting your time :)

Hope this is useful.

topiaruss commented 10 years ago

I suspect you are running your experiment in an ASCII terminal. When you try the escape decode interactively, it's successful, but it then generates an exception as it outputs to the ASCII terminal. Google how to get your terminal in Unicode mode, then repeat.
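One way to test this suggestion is to keep the call and the terminal output apart: store the result instead of echoing it, and report only ASCII. A minimal Python 3 sketch using only the standard library (the helper name `probe_unicode_escape_decode` is hypothetical); if the report says the call failed, the terminal was not involved:

```python
import codecs
import locale
import sys


def probe_unicode_escape_decode(data):
    """Call unicode_escape_decode without printing its result; report in ASCII only."""
    try:
        result = codecs.unicode_escape_decode(data)
    except UnicodeError as exc:
        # The failure happened inside the call itself, not while writing to the terminal.
        return "call failed: %s" % type(exc).__name__
    # ascii() escapes non-ASCII characters, so printing the report cannot fail.
    return "call succeeded: %s" % ascii(result)


print("preferred encoding:", locale.getpreferredencoding(False))
print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print(probe_unicode_escape_decode(b"\\xf6"))  # call succeeded: ('\xf6', 4)
```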

I'll have to leave it there. Post the outcome and I may be able to add more later.

Good luck.

Russ Ferriday M: +44 7429 518822 Skype: ferriday


pbu88 commented 10 years ago

From a Debian terminal:

$ echo $LANG
en_US.UTF-8
$ python
...
>>> import codecs
>>> codecs.unicode_escape_decode(u'\xf6')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)

Same result: something inside that function is trying to convert a unicode string into ASCII. I'd like to emphasize the name of that function: decode. If you're trying to escape a unicode string for JSON, HTML or whatever, wouldn't it be `unicode_escape_encode` instead? That works for me.
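A side note on the escaping question: `unicode_escape` produces Python string-literal escapes, which are not the same as JSON escapes (JSON has no `\x` escape). A short Python 3 comparison using only the standard library; it only illustrates the difference and is not taken from Bulbs:

```python
import codecs
import json

text = "\u00f6"  # the character 'ö'

# unicode_escape: Python string-literal escapes (\xNN, \uNNNN, \UNNNNNNNN), as ASCII bytes.
py_escaped = codecs.unicode_escape_encode(text)[0].decode("ascii")

# JSON: json.dumps escapes non-ASCII as \uNNNN by default (ensure_ascii=True).
json_escaped = json.dumps(text)

print(py_escaped)    # \xf6
print(json_escaped)  # "\u00f6"

# A \x escape is not valid JSON, so unicode_escape output cannot be fed to a JSON parser.
try:
    json.loads('"' + py_escaped + '"')
except ValueError as exc:
    print("not valid JSON:", type(exc).__name__)
```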

Take a look at this session:

In [1]: import codecs
In [2]: ascii_safe = codecs.unicode_escape_encode(u'\xf6')
In [3]: ascii_safe
Out[3]: ('\\xf6', 1)

In [4]: u = codecs.unicode_escape_decode(ascii_safe[0])
In [5]: u
Out[5]: (u'\xf6', 4)

The first function takes unicode and returns it ASCII-escaped. The second takes the ASCII-escaped string and returns unicode.
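The same round trip on Python 3, where the distinction is explicit in the types: encoding takes `str` and returns ASCII-only `bytes`, and decoding takes those `bytes` back to `str`. A minimal sketch with the standard library only:

```python
import codecs

original = "\u00f6"  # the character 'ö'

# str -> ASCII-safe bytes containing backslash escapes; second item = characters consumed
escaped, chars_consumed = codecs.unicode_escape_encode(original)

# ASCII-safe bytes -> str; second item = bytes consumed (the 4 bytes of "\xf6")
restored, bytes_consumed = codecs.unicode_escape_decode(escaped)

print(escaped, chars_consumed)          # b'\\xf6' 1
print(ascii(restored), bytes_consumed)  # '\xf6' 4
```

The method forms `original.encode("unicode_escape")` and `escaped.decode("unicode_escape")` do the same thing without the consumed-length counts.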

Thanks for your quick response.

espeed commented 10 years ago

Fixed in Bulbs 0.3.23 https://github.com/espeed/bulbs/commit/7f104cdbc30f27ea76b036cfa0d0a694f074153e

pbu88 commented 10 years ago

I'm glad I could help. Congratulations on your project, it's very interesting. If you need some help, just let me know; I'll be glad to collaborate. I didn't fork first because I wasn't sure about the cause of the issue, since I'm not familiar with the project's code, but I'll be glad to collaborate if something is needed.

espeed commented 10 years ago

Hi Paulo -

Help is always welcome :)

Are you working with a particular graph database right now?
