baqian / canviz

Automatically exported from code.google.com/p/canviz

Handle non-ASCII data properly #17

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
There's a mistake in the handling of non-ASCII strings in the tokenizer. The xdot format tells us how many bytes long a string will be, but I hand that count to the substr function, which counts in characters, not bytes. I also wrongly named our variable "chars" instead of "bytes". (Actually the mistake was in the Graphviz documentation which said the xdot format counted characters; I submitted a patch to fix the documentation.)

None of the sample graphs exhibit the problem. You only see the problem if you have a single label which results in more than one text draw command, such as a multiline label, or a record or HTML-like table. Here's an example:

digraph utf8 {
    a [label="ää\nb"]
}

Result in Canviz:

unknown token 14.000000

This was originally reported to me by email by Jan Wielemaker in November 2007 and he provided a patch in his repository:

http://gollem.science.uva.nl/git/ClioPatria.git?a=commitdiff;h=1669b252b25b6e75ced28be39b0449e9d13a62d3

I can't find any JavaScript string functions that work on bytes instead of characters so the method proposed in this patch seems to be the way to go.
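
For illustration only, here is a minimal sketch of how a UTF-8 byte count can be converted into a character count before calling substr. It is not the code from Jan's patch or from Canviz itself: the function name charCountForUtf8Bytes and the sample input string are made up for this example, and the surrogate-pair branch is an assumption about how one might treat characters outside the Basic Multilingual Plane.

```javascript
// Hypothetical helper, not Canviz's actual code.
// Given a string and a UTF-8 byte count, return how many JavaScript
// characters (UTF-16 code units) at the start of the string cover
// that many bytes.
function charCountForUtf8Bytes(str, byteCount) {
  var charCount = 0;
  while (byteCount > 0 && charCount < str.length) {
    var code = str.charCodeAt(charCount);
    if (code < 0x80) {
      // ASCII: 1 byte in UTF-8
      byteCount -= 1;
      charCount += 1;
    } else if (code < 0x800) {
      // 2 bytes in UTF-8 (e.g. U+00E4)
      byteCount -= 2;
      charCount += 1;
    } else if (code >= 0xD800 && code <= 0xDBFF) {
      // High surrogate: together with the following low surrogate it
      // encodes one code point that takes 4 bytes in UTF-8
      byteCount -= 4;
      charCount += 2;
    } else {
      // Rest of the Basic Multilingual Plane: 3 bytes in UTF-8
      byteCount -= 3;
      charCount += 1;
    }
  }
  return Math.min(charCount, str.length);
}

// Made-up tokenizer input: the text that follows a byte count of 4
// and the "-" marker, with another draw command after it.
var rest = '\u00e4\u00e4 T 27 8 0 7 1 -b';
var byteCount = 4;

// Passing the byte count straight to substr takes too many characters.
var wrong = rest.substr(0, byteCount);

// Converting bytes to characters first takes exactly the label text.
var right = rest.substr(0, charCountForUtf8Bytes(rest, byteCount));
```

With this input, wrong swallows the start of the next draw command, while right contains only the two-character label, so the tokenizer would stay in sync with the rest of the input.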

Original issue reported on code.google.com by ryandesi...@gmail.com on 13 Oct 2008 at 4:31

GoogleCodeExporter commented 8 years ago
Fixed in r115. I rewrote the patch to match my style and use easier-to-understand variable names.

Original comment by ryandesi...@gmail.com on 13 Oct 2008 at 6:43