pydot does not support unicode

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

import pydot
n1 = u"Thérèse Doe"
n2 = u"Jean-Pierre Toué"

# Does not work
g = pydot.Dot()
g.add_edge(pydot.Edge(n1, n2))
g.write_jpeg('test.jpg')

# Works :)
g = pydot.Dot()
g.add_edge(pydot.Edge(n1.encode('UTF-8'), n2.encode('UTF-8')))
g.write_jpeg('test.jpg')

What version of the product are you using? On what operating system?

1.0.2, on Fedora 10.

Please provide any additional information below.

Originally filed here: https://bugzilla.redhat.com/show_bug.cgi?id=481786

Original issue reported on code.google.com by spo...@gmail.com on 3 Feb 2009 at 8:27

GoogleCodeExporter commented 9 years ago

Indeed useful (I am doing my thesis diagrams with PyDot, and it is useful to be
able to represent the Greek alphabet).

Let me know if I can help; not a unicode expert at all...

Original comment by msbra...@gmail.com on 4 May 2009 at 10:13

GoogleCodeExporter commented 9 years ago

I didn't check the code, but I needed to add quotes explicitly to my unicode
strings in order to use them with pydot, like this:

x = u'my unicöde var'
myNode.set_label('"'+x.encode('utf-8')+'"')
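
The workaround above can be wrapped in a small helper. This is only a sketch: `quote_dot_id` is a hypothetical name, not part of pydot, and it is written for Python 3, where text is already `str`, so the `.encode('utf-8')` step from the Python 2 snippet is not needed. It assumes the DOT rule that a double-quoted ID may contain arbitrary characters as long as embedded double quotes are escaped with a backslash.

```python
def quote_dot_id(text):
    """Return text wrapped in double quotes for use as a DOT ID or label.

    Hypothetical helper, not part of pydot. Embedded double quotes are
    escaped with a backslash so they do not end the quoted string early.
    """
    return '"' + text.replace('"', '\\"') + '"'


# Same value as in the snippet above, quoted explicitly.
label = quote_dot_id('my unicöde var')
```

The result could then be passed to `myNode.set_label(label)` in place of the hand-built string.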

Original comment by rux...@gmail.com on 21 Jun 2009 at 4:58

GoogleCodeExporter commented 9 years ago

As I am not an expert in Python unicode issues, I asked someone who was to take
a look at this issue, and this was his (Toshio Kuratomi's) reply:

--- Comment #3 from Toshio Ernie Kuratomi <a.badger@gmail.com>  2009-05-29 03:26:44 EDT ---
This one looks like it isn't a bug to me.  Rather, it's a request for an API
change.

Right now, pydot accepts str type.  It does not accept unicode type.  So the
user is forced to change the unicode strings that they have into byte strings
before sending it into a pydot function.  That's why n1.encode('UTF-8') is
necessary.
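
To illustrate the conversion described here, a minimal sketch in Python 3 terms. The naming is shifted relative to this thread: the `unicode` type discussed above corresponds to Python 3's `str`, and the byte-string `str` discussed above corresponds to Python 3's `bytes`.

```python
# Python 3 naming: `str` holds Unicode text (Python 2's `unicode`),
# `bytes` holds encoded byte strings (Python 2's `str`).
name = "Thérèse Doe"

# The conversion the user currently has to perform by hand.
encoded = name.encode("utf-8")

# Each accented character needs more than one byte in UTF-8, so the
# byte string is longer than the text it encodes.
text_length = len(name)
byte_length = len(encoded)

# Decoding with the same codec recovers the original text.
round_tripped = encoded.decode("utf-8")
```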

This makes some sense as pydot must interact with the world outside of python
in the form of the /usr/bin/dot command.  pydot communicates with that command
by writing the information for /usr/bin/dot to a temporary file and then having
/usr/bin/dot operate on that file.  In order to create the temporary file,
pydot must deal in byte strings (str).  In the current code, the user gives
pydot byte strings and pydot writes those out directly to the file.  The user
performs the conversion from unicode type to utf-8 encoded byte string.
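
The hand-off through a temporary file can be sketched as follows. This is not pydot's actual code: `write_dot_source` is a hypothetical helper, written for Python 3, and it stops short of invoking `dot`, which would then be run on the returned path.

```python
import os
import tempfile


def write_dot_source(dot_source):
    """Write DOT text to a temporary file as UTF-8 and return its path.

    Hypothetical helper for illustration only. The caller is responsible
    for deleting the file once the external program has read it.
    """
    fd, path = tempfile.mkstemp(suffix=".dot")
    # Files store bytes, so the text has to be encoded before writing.
    with os.fdopen(fd, "wb") as handle:
        handle.write(dot_source.encode("utf-8"))
    return path


source = 'digraph G { "Thérèse Doe" -> "Jean-Pierre Toué"; }'
path = write_dot_source(source)

# Read the file back as raw bytes to see what an external program would get.
with open(path, "rb") as handle:
    raw = handle.read()
os.remove(path)
```

Whichever side performs the `.encode("utf-8")` call, the file that the external command reads contains only bytes.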

In order for pydot to handle unicode strings instead of byte strings, it would
need to make the conversion that the user is currently doing.  That shouldn't
be too hard as /usr/bin/dot will accept utf-8 and all unicode strings can be
encoded to utf-8.  However, for sanity of the pydot upstream, pydot probably
should stop accepting byte strings when it makes this switch.  So end-user code
similar to this will start to fail:

g.add_edge(pydot.Edge('Th\xe9r\xe8se Doe'))

If pydot upstream chooses to accept both byte strings and unicode type, it will
have to take into account what happens when the user provides byte strings that
are not valid utf-8 and also unicode strings.  If they aren't careful, pydot
will get confused about what it needs to do in this situation and either crash
or output garbage.
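
One way to accept both kinds of input without getting confused is to normalise everything to text at the API boundary and to reject byte strings that are not valid UTF-8. The sketch below assumes Python 3 (`bytes` for byte strings, `str` for unicode); `to_text` is a hypothetical helper, not something pydot provides.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings strictly.

    Hypothetical helper, not part of pydot.
    """
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError as exc:
            raise ValueError(
                "byte string is not valid %s: %r" % (encoding, value)
            ) from exc
    if isinstance(value, str):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Valid UTF-8 bytes are decoded to text.
ok = to_text("Thérèse Doe".encode("utf-8"))

# The non-UTF-8 bytes from the example above are rejected instead of
# being passed through as garbage.
try:
    to_text(b"Th\xe9r\xe8se Doe")
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly on undecodable input is a design choice. Decoding with `errors="replace"` would avoid the exception, but it would silently alter the user's labels.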

Making this sort of API change should only be done by upstream.

Original comment by spo...@gmail.com on 6 Jul 2009 at 1:13

GoogleCodeExporter commented 9 years ago

Original comment by ero.carr...@gmail.com on 31 Oct 2010 at 12:15

Changed state: Fixed

cudadog / pydot

pydot does not support unicode #24