code-google-com / pydot

Automatically exported from code.google.com/p/pydot
MIT License

Handling of encodings #52

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Parse a UTF-8 encoded dot file that has no BOM
2. The parse fails

Parse should work, as UTF-8 is the default encoding for dot files: "By default, 
DOT assumes the UTF-8 character encoding. It also accepts the Latin1 
(ISO-8859-1) character set, assuming the input graph uses the charset attribute 
to specify this. For graphs using other character sets, there are usually 
programs, such as iconv, which will translate from one character set to 
another." (http://www.graphviz.org/content/dot-language)

I also looked into the code of "dot_parser.py". Why is the content encoded to 
"ASCII" after decoding (line 512)?
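To illustrate why that step is a problem, here is a minimal sketch. It is not the actual code from dot_parser.py, and the graph text and variable names are made up for the example. It only shows the general effect of decoding UTF-8 bytes and then re-encoding the result as ASCII when the file contains a non-ASCII character:

```python
# Hypothetical DOT source with a non-ASCII label, stored as BOM-less UTF-8 bytes.
raw = 'graph G { a [label="caf\u00e9"]; }'.encode("utf-8")

# Decoding as UTF-8 succeeds, because the bytes are valid UTF-8.
text = raw.decode("utf-8")

# Re-encoding the decoded text as ASCII cannot represent the accented character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)
```

Under these assumptions the ASCII step is the one that rejects the input, even though the file itself is valid UTF-8.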

What version of the product are you using? On what operating system?

Please provide any additional information below.

Original issue reported on code.google.com by pebo...@gmail.com on 4 Aug 2011 at 9:49

GoogleCodeExporter commented 9 years ago
There was no particular reason for the ASCII encoding being done on line 512. 
I probably added it while trying to chase down some problems with encodings. I 
will remove the line, which will hopefully enable it to work with BOM-less UTF-8. 
It does work on my UTF-8 test files.
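For reference, one possible way to handle the input is sketched below. This is not the pydot implementation: `decode_dot_source` is a hypothetical helper name, and the choice of BOMs to check is an assumption. It honours a BOM if one is present and otherwise falls back to UTF-8, which is the default the DOT documentation describes, with no ASCII re-encoding afterwards:

```python
import codecs


def decode_dot_source(raw):
    """Hypothetical helper: decode DOT bytes, using a BOM if present, else UTF-8."""
    # Assumed set of BOMs to recognise; UTF-32 is deliberately left out of this sketch.
    boms = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, encoding in boms:
        if raw.startswith(bom):
            # Strip the BOM and decode the rest with the encoding it announces.
            return raw[len(bom):].decode(encoding)
    # No BOM: fall back to UTF-8, the default encoding for DOT files.
    return raw.decode("utf-8")


source = 'graph G { a [label="caf\u00e9"]; }'
without_bom = decode_dot_source(source.encode("utf-8"))
with_bom = decode_dot_source(codecs.BOM_UTF8 + source.encode("utf-8"))
```

With this approach both the BOM-less and the BOM-prefixed bytes decode to the same text, and non-ASCII labels are kept as they are.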

Original comment by ero.carr...@gmail.com on 13 Dec 2011 at 11:27