asciidoctor / asciidoctor-diagram

:left_right_arrow: Asciidoctor diagram extension, with support for AsciiToSVG, BlockDiag (BlockDiag, SeqDiag, ActDiag, NwDiag), Ditaa, Erd, GraphViz, Mermaid, Msc, PlantUML, Shaape, SvgBob, Syntrax, UMLet, Vega, Vega-Lite and WaveDrom.
http://asciidoctor.org
MIT License

ditaa diagram does not render Chinese characters correctly #73

Open pepijnve opened 9 years ago

pepijnve commented 9 years ago

Initially reported as https://github.com/pepijnve/asciidoctor-diagram-java/issues/1 by @0000-bigtree.

I use asciidoctor-diagram (1.3.0.preview.4) on CRuby (2.2.2p95 (2015-04-13 revision 50295) [i386-mingw32]) with the ditaa code below in a UTF-8 encoded adoc file:

ruby -Ku -S asciidoctor -r asciidoctor-diagram xxx.adoc

[ditaa, "header", "png"]
+---------+
| 4 bytes | CRC
+---------+
| 1 byte  | 协议版本(Chinese Character!)
+---------+
| 2 bytes | 消息类型 ID(命令码)
+---------+

The produced PNG did not render the Chinese characters correctly.

[image: header]

But when I run ditaa directly (ditaa0_9.jar from SourceForge):

java -Dfile.encoding=UTF-8 -jar ditaa0_9.jar xxx.ditaa

the produced PNG renders fine.

[image: message_head_structure_4]

pepijnve commented 9 years ago

@0000-bigtree I've been trying to reproduce this, but I'm getting mixed results. There's definitely an issue when loading diagrams from external files. Asciidoctor-diagram is using the default external encoding in this case and interpreting as UTF-8. For the embedded diagrams, I can't see anything wrong and I'm getting correct output even if I specify -E GBK:GBK on the command line.

pepijnve commented 9 years ago

@mojavelinux Could you confirm that Asciidoctor always interprets input files as UTF-8? AFAICT the encoding document attribute isn't supported, right?

mojavelinux commented 9 years ago

Could you confirm that Asciidoctor always interprets input files as UTF-8?

That's correct. AsciiDoc files must be in UTF-8 (or UTF-16). This is a firm requirement.

You can see where this happens in the code here:

https://github.com/asciidoctor/asciidoctor/blob/master/lib/asciidoctor/helpers.rb#L49-L110

pepijnve commented 9 years ago

Perfect! 755083a9b90d51afc23d9e041dd5f7116dd95099 uses exactly that code to do the same normalization on externally loaded diagram source code.
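The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code from the referenced commit or from Asciidoctor's helpers.rb: the helper name `read_diagram_source` is hypothetical, and the sketch handles only UTF-8 (with an optional byte order mark), not UTF-16. The point is that the file is read as raw bytes and then explicitly labeled as UTF-8, so the default external encoding of the Ruby process no longer matters.

```ruby
# Hypothetical helper for illustration; not the actual asciidoctor-diagram code.
BOM_UTF_8 = "\xEF\xBB\xBF".b

def read_diagram_source(path)
  # Read raw bytes so Encoding.default_external plays no role.
  data = File.binread(path)
  # Drop a UTF-8 byte order mark if one is present.
  data = data.byteslice(3, data.bytesize - 3) if data.start_with?(BOM_UTF_8)
  # Label the bytes as UTF-8 and reject input that is not well-formed.
  text = data.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  text
end
```

Calling `read_diagram_source('xxx.ditaa')` would then return a UTF-8 string whether the process was started with `-E GBK:GBK` or not.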

mojavelinux commented 9 years ago

:+1:

You may want to also consider launching the java command with the file encoding property set. Do you think that's necessary?

pepijnve commented 9 years ago

It shouldn't make a difference. The diagram code is passed to Java over HTTP as text/plain; charset=UTF8. Conversion of the post body from byte[] to String is done using an explicit charset, UTF8. In other words, it should already just work.

I'll have to set up a more elaborate reproduction scenario I think. Maybe make a vm configured to use a Chinese locale.
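To illustrate why an explicit charset makes the platform default irrelevant, here is a small Ruby sketch. The real receiving side is Java and is not shown in this thread, so this only demonstrates the principle: the same bytes produce the original text when labeled as UTF-8, and different text when labeled with a locale encoding such as GBK (chosen here purely as an example of a Chinese-locale default).

```ruby
# Illustration only: stands in for the byte[] -> String step on the receiving side.
body = "| 1 byte | 协议版本".encode(Encoding::UTF_8).b # raw bytes as sent over HTTP

# Explicit charset: the result does not depend on any platform default.
explicit = body.dup.force_encoding(Encoding::UTF_8)

# Hypothetical failure mode: the same bytes labeled with a locale default.
mislabeled = body.dup.force_encoding(Encoding::GBK)

puts explicit.valid_encoding?           # is the UTF-8 reading well-formed?
puts explicit == "| 1 byte | 协议版本"   # does it round-trip to the original text?
puts mislabeled == explicit             # do the two readings compare equal?
```

If the diagram text were ever decoded the second way, non-ASCII characters would come out garbled while plain ASCII would still look correct, which matches the kind of symptom reported here.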

On 11 May 2015 at 23:12, Dan Allen notifications@github.com wrote:

You may want to also consider launching the java command with the file encoding property set. Do you think that's necessary?


mojavelinux commented 9 years ago

Perfect. Thanks for the clarification!

kubamarchwicki commented 8 years ago

I'm not sure if this is the same problem, but I'm having similar problems with a Polish locale.

With the following snippet:

[ditaa]
....
+-----------------------+
| pliki źródłowe        |
+-----------------------+
....

I get the following error (I'm using it through asciidoctor-gradle):

Caused by: org.jruby.exceptions.RaiseException: (RangeError) asciidoctor: FAILED: {mypath}/maven.adoc: Failed to parse source, too big for byte: 197
    at RUBY.load({mypath}/build/vendor/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1362)
    at RUBY.convert({mypath}/build/vendor/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1458)
    at RUBY.convert_file({mypath}/build/vendor/gems/asciidoctor-1.5.2/lib/asciidoctor.rb:1562)
    at RUBY.convertFile(<script>:68)

Running ditaa standalone doesn't show any issues.

Sticking to ASCII chars only works around the asciidoctor-diagram problem.
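The value 197 in the error message is consistent with the UTF-8 encoding of the snippet: both "ź" and "ł" start with the byte 0xC5, which is 197 in decimal. The sketch below lists the bytes above 127 in the diagram text. The interpretation that a byte value was handed to code expecting a signed 8-bit value (range -128 to 127) is an assumption based on the wording of the error, not something confirmed in this thread.

```ruby
text = "pliki źródłowe"
bytes = text.encode(Encoding::UTF_8).bytes

# Values above 127 belong to multi-byte UTF-8 sequences.
high = bytes.select { |b| b > 127 }
p high

# Reinterpret one unsigned byte value as a signed 8-bit value.
signed = [197].pack('C').unpack1('c')
p signed
```

Plain ASCII text never produces byte values above 127, which would explain why the ASCII-only workaround avoids the error.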

mojavelinux commented 8 years ago

Sticking to ASCII chars only works around the asciidoctor-diagram problem

We definitely don't want to accept that limitation :)

@kubamarchwicki Have you tried with Asciidoctor Diagram 1.3.0.preview.4 or are you using a release from the 1.2 series?

kubamarchwicki commented 8 years ago

I was using release 1.2.0. With 1.3.0.preview.4 it works like a charm! :100: Thanks.

mojavelinux commented 8 years ago

:+1:

@pepijnve Is there anything I can do to help with the 1.3.0 release?

pepijnve commented 8 years ago

Great that it works for you @kubamarchwicki. That makes it extra strange that @0000-bigtree is having these problems with 1.3.0.preview.4 though. I would think once we're transferring everything in UTF-8 that we should be good regardless of the characters that are being used.

@mojavelinux I was about to release 1.3.0 but decided to tackle #75 before doing that (should have made that public earlier, sorry). It's not a terribly difficult thing to fix, just needs to be done and finding the time to do it has proven to be difficult lately.

mojavelinux commented 8 years ago

:+1: Thanks for the update, @pepijnve!

pepijnve commented 7 years ago

Pretty sure this was fixed by fixing #150.