asciidoctor / asciidoctorj

:coffee: Java bindings for Asciidoctor. Asciidoctor on the JVM!
http://asciidoctor.org
Apache License 2.0

Unicode character references not handled correctly #924

Closed PartTimeDataScientist closed 4 years ago

PartTimeDataScientist commented 4 years ago

While working on a font-test document, I do not get character references (decimal and hexadecimal) rendered correctly:

[cols="80%,20%"]
|===
|Description |Example

|\u00a0 - no-break space |Foo bar

|\u2611 - ballot box checked (used for checked list item)
|Foo☑bar

|\u2610 - ballot box unchecked (used for unchecked list item)
|Foo☐bar

Converted using console (expected result): [image]

Converted using AsciidoctorJ: [image]

(The different column layout seems to be caused by another issue; this really is the same document converted!)
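For readers unfamiliar with the terminology: a decimal character reference such as `&#160;` and a hexadecimal one such as `&#x2611;` both name a Unicode code point. The following is only a minimal Java sketch of that mapping, not how Asciidoctor or AsciidoctorJ resolves references internally; the class `CharRefDemo` and its method are hypothetical names introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper only; not part of Asciidoctor or AsciidoctorJ. */
final class CharRefDemo {
    // Matches decimal (&#160;) and hexadecimal (&#x2611;) numeric character references.
    // Digit counts are capped so that parsing can never overflow an int.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    /** Replaces every valid numeric character reference with the character it names. */
    static String resolveNumericRefs(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // leave out-of-range references untouched
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // What is printed depends on the console encoding.
        System.out.println(resolveNumericRefs("Foo&#x2611;bar"));
        System.out.println(resolveNumericRefs("Foo&#9744;bar"));
    }
}
```

If the references reach the converter intact, the output should contain the corresponding characters (no-break space, ballot boxes) rather than the literal reference text or a truncated `&`.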

mojavelinux commented 4 years ago

Which version of AsciidoctorJ are you using?

AsciidoctorJ uses Asciidoctor, so the behavior should be the same.

PartTimeDataScientist commented 4 years ago

Console:

    C:\Windows\System32>asciidoctor-pdf -V
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
    Asciidoctor PDF 1.5.3 using Asciidoctor 2.0.10 [https://asciidoctor.org]
    Runtime Environment (jruby 9.2.9.0 (2.5.7) 2019-10-30 458ad3e Java HotSpot(TM) 64-Bit Server VM 25.101-b13 on 1.8.0_101-b13 +jit [mswin32-x86_64]) (lc:IBM437 fs:UTF-8 in:IBM437 ex:IBM437)

Java:

    asciidoctorj-2.3.0.jar
    asciidoctorj-api-2.3.0.jar
    asciidoctorj-pdf-1.5.3.jar
    jcommander-1.78.jar
    jruby-complete-9.2.11.1.jar

The JAR files were downloaded from https://mvnrepository.com/artifact/org.asciidoctor

mojavelinux commented 4 years ago

I tried with AsciidoctorJ 2.3.0 (on Linux) and I cannot reproduce.

It's possible this is an encoding problem. This part worries me:

lc:IBM437 fs:UTF-8 in:IBM437 ex:IBM437

This is the typical problem of Windows being configured to use a non-UTF-8 encoding. We take steps to make it work, but there are still scenarios where it causes Ruby to act in bizarre ways. UTF-8 is the universal standard and that's what you should be using for maximum compatibility.
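To see which encoding the JVM itself has picked up, a small check along the following lines may help. It is only a sketch: the class name `EncodingCheck` is hypothetical, and it reports the JVM default charset only. The `lc:`, `in:` and `ex:` values in the Ruby runtime line come from the console locale and can differ from what the JVM reports, for example when `JAVA_TOOL_OPTIONS` sets `-Dfile.encoding=UTF-8` as in the output above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative check only; not part of AsciidoctorJ. */
final class EncodingCheck {
    /** Returns true if the given charset is UTF-8. */
    static boolean isUtf8(Charset charset) {
        return StandardCharsets.UTF_8.equals(charset);
    }

    public static void main(String[] args) {
        Charset defaultCharset = Charset.defaultCharset();
        // file.encoding is the system property that -Dfile.encoding sets.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharset);
        if (!isUtf8(defaultCharset)) {
            System.out.println("Warning: the JVM default charset is not UTF-8.");
        }
    }
}
```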

I still don't understand, though, how you have AsciidoctorJ installed. AsciidoctorJ does not provide the asciidoctor-pdf script. I would expect your command to look something like this:

asciidoctorj -b pdf doc.adoc

It looks like you're actually using Asciidoctor PDF with JRuby, which is not AsciidoctorJ.

If you're going to use AsciidoctorJ, can you install it using Chocolatey? (https://chocolatey.org/packages/asciidoctorj)

PartTimeDataScientist commented 4 years ago

I'll see whether I can install it using Chocolatey later and reproduce the Java behavior on the console...

> It's possible this is an encoding problem. This part worries me:
>
> lc:IBM437 fs:UTF-8 in:IBM437 ex:IBM437

Just for clarification: the conversion using the console installation is not the affected one. In addition, I am afraid it is unlikely to be a "simple" encoding problem, because the same document also contains some language tests that are rendered correctly: [image]

The Java application is the same Eclipse plugin that I have been working on for a while. The relevant JARs are on the classpath and loaded as follows:

Asciidoctor asciidoctor = create(Arrays.asList(
    //gems from asciidoctorj 2.2.0
    "uri:classloader:/gems/asciidoctor-2.0.10/lib",
    "uri:classloader:/gems/tilt-2.0.9/lib",
    "uri:classloader:/gems/thread_safe-0.3.6-java/lib",
    "uri:classloader:/gems/temple-0.8.2/lib",
    "uri:classloader:/gems/slim-4.0.1/lib",
    "uri:classloader:/gems/rouge-3.12.0/lib",
    "uri:classloader:/gems/open-uri-cached-0.0.5/lib",
    "uri:classloader:/gems/haml-5.0.4/lib",
    "uri:classloader:/gems/erubis-2.7.0/lib",
    "uri:classloader:/gems/concurrent-ruby-1.0.5-java/lib",
    "uri:classloader:/gems/coderay-1.1.2/lib",

    //gems from asciidoctorj-pdf
    "uri:classloader:/gems/asciidoctor-pdf-1.5.3/lib",
    "uri:classloader:/gems/ttfunk-1.5.1/lib",
    "uri:classloader:/gems/treetop-1.6.10/lib",
    "uri:classloader:/gems/thread_safe-0.3.6-java/lib",
    "uri:classloader:/gems/text-hyphen-1.4.1/lib",
    "uri:classloader:/gems/safe_yaml-1.0.5/lib",
    "uri:classloader:/gems/ruby-rc4-0.1.5/lib",
    "uri:classloader:/gems/rouge-3.16.0/lib",
    "uri:classloader:/gems/rghost-0.9.7/lib",
    "uri:classloader:/gems/public_suffix-1.4.6/lib",
    "uri:classloader:/gems/prawn-templates-0.1.2/lib",
    "uri:classloader:/gems/prawn-table-0.2.2/lib",
    "uri:classloader:/gems/prawn-svg-0.30.0/lib",
    "uri:classloader:/gems/prawn-icon-2.5.0",
    "uri:classloader:/gems/prawn-icon-2.5.0/lib",
    "uri:classloader:/gems/prawn-icon-2.5.0/data",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts/fa4",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts/fab",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts/far",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts/fas",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts/fi",
    "uri:classloader:/gems/prawn-icon-2.5.0/data/fonts/pf",
    "uri:classloader:/gems/prawn-2.2.2/lib",
    "uri:classloader:/gems/polyglot-0.3.5/lib",
    "uri:classloader:/gems/pdf-reader-2.4.0/lib",
    "uri:classloader:/gems/pdf-core-0.7.0/lib",
    "uri:classloader:/gems/hashery-2.1.2/lib",
    "uri:classloader:/gems/css_parser-1.7.1/lib",
    "uri:classloader:/gems/concurrent-ruby-1.1.6/lib",
    "uri:classloader:/gems/Ascii85-1.0.3/lib",
    "uri:classloader:/gems/afm-0.2.2/lib",
    "uri:classloader:/gems/addressable-2.4.0/lib",

    //gems from asciidoctorj-epub
    "uri:classloader:/gems/rubyzip-2.0.0/lib",
    "uri:classloader:/gems/nokogiri-1.10.9-java/lib",
    "uri:classloader:/gems/mime-types-3.3.1/lib",
    "uri:classloader:/gems/mime-types-data-3.2020.0425/lib",
    "uri:classloader:/gems/mini_portile2-2.4.0/lib",
    "uri:classloader:/gems/gepub-1.0.11/lib",
    "uri:classloader:/gems/asciidoctor-epub3-1.5.0.alpha.16/lib"));

mojavelinux commented 4 years ago

This looks more to be an issue with whatever Eclipse plugin you are using. Unless you can reproduce this issue in a standalone Java application that uses the AsciidoctorJ API, I'm afraid we cannot say this has anything to do with AsciidoctorJ.

mojavelinux commented 4 years ago

In other words, you need to provide a reproducible environment where we can test this scenario or else we cannot help you solve this problem.

mojavelinux commented 4 years ago

> in the same document there are also some language tests which are rendered correctly:

That's good to know. So it's not likely an encoding issue.

What we're likely dealing with here is some sort of classpath malfunction. What I'm confident of is that Asciidoctor is not working normally in your environment, because these are not problems that Asciidoctor has under normal circumstances.

robertpanzer commented 4 years ago

Can you please provide the code you use to convert the document? It's strange that the two columns don't even make it into the rendered PDF. Do you pass a file or a stream? Without a reproducer it's close to impossible to follow what could be going on.

PartTimeDataScientist commented 4 years ago

I am working on an Asciidoctor plugin for the Eclipse-based application KNIME to enable simple report generation. The code used for creating the Asciidoctor interface is given above. The rest of the conversion is performed using the following code blocks:

    attrs = attributes()
        .attribute("pdf-stylesdir", "d:\\asciidoctor-pdf themes\\themes\\")
        .attribute("pdf-style", "FlatUI-print-theme.yml")
        .icons(org.asciidoctor.Attributes.IMAGE_ICONS)
        .iconFontRemote(true);

    opts = options()
        .safe(SafeMode.UNSAFE)
        .backend("pdf")
        .headerFooter(true)
        .toDir(outputDir)
        .toFile(outputFile)
        .attributes(attrs);

    Asciidoctor asciidoctor = create(Arrays.asList(...see above...));
    asciidoctor.convert(inputCell.toString(), opts);
    asciidoctor.shutdown();

This prompted me to log the result of inputCell.toString() to the console, and it seems that the Unicode character references as well as the quotation marks in [cols="8,2"] are being eaten by the .toString() conversion:

[cols=8,2]
|===
|Description |Example

|\u00a0 - no-break space 
|Foo&

|\ufeff - zero width no-break space
|Foo&

So I would close this one as well until I am sure that the correct string is passed to Asciidoctor. Thanks for "pointing" me there and sorry for the inconvenience. I had already done a series of tests but overlooked that conversion as a possible source of the problems. :confounded:
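One way to verify what actually reaches the converter is to log the string in an unambiguous form and compare it with the original source. The following Java sketch is an assumption about how such a check could look, not code from the plugin or from AsciidoctorJ; `StringInspector`, `escapeNonAscii` and `firstMismatch` are hypothetical names.

```java
/** Illustrative diagnostics only; not part of AsciidoctorJ or KNIME. */
final class StringInspector {
    /** Writes every char outside printable ASCII (except newline) as a backslash-u escape. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if ((c >= 0x20 && c <= 0x7e) || c == '\n') {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    /** Returns the index of the first differing char, or -1 if both strings are equal. */
    static int firstMismatch(String expected, String actual) {
        int n = Math.min(expected.length(), actual.length());
        for (int i = 0; i < n; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        return expected.length() == actual.length() ? -1 : n;
    }

    public static void main(String[] args) {
        String expected = "[cols=\"8,2\"]"; // what the source document contains
        String actual = "[cols=8,2]";       // what was logged after the conversion
        System.out.println("first mismatch at index " + firstMismatch(expected, actual));
        System.out.println(escapeNonAscii(actual));
    }
}
```

Running such a comparison on the text before and after the `.toString()` call should show whether the quotation marks and character references are already missing before Asciidoctor is involved.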