asciidoctor / asciidoclet

A Javadoc Doclet based on Asciidoctor that lets you write Javadoc in the AsciiDoc syntax.
https://github.com/asciidoctor/asciidoclet
Apache License 2.0

Can Asciidoclet work with accented characters + asciidoctor-diagram documentation #41

JGrenier closed this issue 9 years ago

JGrenier commented 9 years ago

Hi, is there a way to have accented characters rendered correctly?

See this sample project I made to reproduce the problem:

é is rendered as Ã©

https://github.com/JGrenier/asciidoclet-sample.git

I succeeded in getting asciidoctor-diagram working in a pure Java project, but it requires some prerequisites. Maybe these explanations should be added to the Asciidoclet documentation.

See my simple sample for a working example. Maybe I haven't configured my project the right way, so do not hesitate to tell me if I am wrong.

The missing information is:

    buildscript {
        // the classpath configuration is only available inside buildscript
        dependencies {
            // add the jruby-gradle plugin, used to install the gem
            classpath 'com.github.jruby-gradle:jruby-gradle-plugin:0.1.11'
        }
    }
    apply plugin: 'com.github.jruby-gradle.base'

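    configurations {
        // custom configuration holding the Asciidoclet jar
        asciidoclet
    }

    dependencies {
        asciidoclet 'org.asciidoctor:asciidoclet:1.+'
        // the gems configuration comes from the jruby-gradle base plugin
        gems 'rubygems:asciidoctor-diagram:1.+'
    }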

    javadoc {
        dependsOn jrubyPrepareGems
        options.docletpath = configurations.asciidoclet.files.asType(List)
        options.doclet = 'org.asciidoctor.Asciidoclet'
        options.overview = "src/main/java/overview.adoc"
        options.addStringOption "-base-dir", "${projectDir}"
        options.addStringOption "-attributes-file", "${projectDir}/config/javadoc-attributes.adoc"
        options.addStringOption "-require", "asciidoctor-diagram"
        options.addStringOption "-gem-path", "${jrubyPrepareGems.outputDir}"
        options.addStringOption "-attribute", "data-uri," +
                "name=${project.name}," +
                "version=${project.version}," +
                "title-link=http://example.com[${project.name} ${project.version}]"
    }

Thanks in advance.

benevans commented 9 years ago

You might need to add

    options.addStringOption "encoding", "UTF-8"

to your javadoc options. By default, Javadoc reads source files using the platform's default encoding.
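The `-encoding` option controls how Javadoc decodes the bytes of the source files, so it has to match the encoding the files were actually saved in. A minimal Java sketch of a mismatch (plain `java.nio` decoding, not Javadoc's own code): "é" saved as windows-1252 is the single byte 0xE9, which is not valid UTF-8 on its own. `SourceEncodingDemo` and `readSource` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingDemo {
    // Hypothetical helper: decode source bytes the way a tool would for a given -encoding.
    static String readSource(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // "é" as an editor using windows-1252 (cp1252) saves it: one byte, 0xE9.
        byte[] sourceBytes = "\u00E9".getBytes(cp1252);

        // Decoding with the charset the file was written in round-trips correctly.
        System.out.println(readSource(sourceBytes, cp1252).equals("\u00E9")); // true

        // Decoding the same byte as UTF-8 fails: 0xE9 starts a 3-byte sequence that never
        // completes, so the decoder substitutes U+FFFD (the replacement character).
        System.out.println(readSource(sourceBytes, StandardCharsets.UTF_8).equals("\uFFFD")); // true
    }
}
```

Either way the character is only preserved when the declared encoding and the file's real encoding agree.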

JGrenier commented 9 years ago

Many thanks for your answer.

I tried options.addStringOption "encoding", "UTF-8" as suggested but it still doesn't work.

I had already tried this before posting the issue:

       options.addStringOption "-attribute", "data-uri," +
                        "encoding=UTF-8," + ...

But I always get the same result :(

benevans commented 9 years ago

How about this:

    options.addStringOption "charset", "UTF-8"

I reproduced your problem by setting "charset" to "ISO-8859-1", which I assume is your system's default. Setting it to "UTF-8" makes the "é" render correctly.

Javadoc puts this charset in the output HTML's META tag, so the browser renders the page using the specified charset. This is an option for javadoc's standard doclet, and you would need this with or without Asciidoclet. See http://docs.oracle.com/javase/6/docs/technotes/tools/solaris/javadoc.html#charset.

benevans commented 9 years ago

Actually, now that I think about it, the problem might have been your browser's default charset. If you don't pass "charset" to javadoc, the meta tag is not inserted and the browser falls back to its default charset; if that is ISO-8859-1, "é" is rendered as "Ã©". Setting "charset" to "UTF-8" should make the browser do the right thing.
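The mechanism can be sketched in Java as a much-simplified model of a browser (real browsers use a more elaborate prescan and also honor HTTP headers): use the charset the page declares if there is one, otherwise the browser's default. The page bytes are UTF-8, so "é" is stored as 0xC3 0xA9, which ISO-8859-1 reads as "Ã" and "©". `BrowserCharsetDemo` and `render` are hypothetical names; the meta tag text is assumed to be of the form javadoc emits for `-charset`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrowserCharsetDemo {
    // Simplified stand-in for a browser's <meta> prescan: look for "charset=NAME".
    private static final Pattern META_CHARSET =
            Pattern.compile("charset=([A-Za-z0-9_-]+)", Pattern.CASE_INSENSITIVE);

    // Uses the declared charset if the page has one, else the browser default,
    // and decodes the page bytes with it.
    static String render(byte[] page, Charset browserDefault) {
        // Markup is ASCII, so an ASCII-compatible view is enough to find the declaration.
        String asciiView = new String(page, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(asciiView);
        Charset chosen = m.find() ? Charset.forName(m.group(1)) : browserDefault;
        return new String(page, chosen);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // Page written as UTF-8 without a meta tag (no -charset given).
        byte[] withoutMeta = "<p>\u00E9</p>".getBytes(StandardCharsets.UTF_8);

        // Same body with a declaration like the one -charset UTF-8 adds.
        byte[] withMeta = ("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
                + "<p>\u00E9</p>").getBytes(StandardCharsets.UTF_8);

        // Falls back to ISO-8859-1 and shows "Ã©" instead of "é".
        System.out.println(render(withoutMeta, latin1).equals("<p>\u00C3\u00A9</p>")); // true
        // Honors the declared UTF-8 and shows "é".
        System.out.println(render(withMeta, latin1).endsWith("<p>\u00E9</p>")); // true
    }
}
```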

JGrenier commented 9 years ago

Hi, It works :)

options.addStringOption "charset", "UTF-8" does the job.

Many thanks for your explanations.

JGrenier commented 9 years ago

Just to complete the picture:
OS --> Windows :(
IDE --> Eclipse, which uses cp1252 by default, but I configured the workspace to use UTF-8.

If I open the files with Notepad++, they are shown not as UTF-8 but as "ANSI as UTF-8". If I convert them to UTF-8 with Notepad++, the build fails...

Now, with the charset option, no problem...
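The thread does not say why the build failed after the Notepad++ conversion. One plausible cause, offered here only as an assumption: Notepad++ labels UTF-8 files without a byte-order mark as "ANSI as UTF-8", and its plain "UTF-8" conversion writes a BOM (the bytes EF BB BF), which javac has been known to reject as an illegal character. A minimal Java sketch for detecting and stripping such a mark; `Utf8Bom`, `hasBom`, and `stripBom` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Bom {
    // UTF-8 encoding of U+FEFF, the byte-order mark some editors prepend.
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean hasBom(byte[] content) {
        return content.length >= BOM.length
                && Arrays.equals(Arrays.copyOf(content, BOM.length), BOM);
    }

    // Returns the content without a leading BOM, or the same array if there is none.
    static byte[] stripBom(byte[] content) {
        return hasBom(content) ? Arrays.copyOfRange(content, BOM.length, content.length) : content;
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFclass A {}".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasBom(withBom)); // true
        System.out.println(new String(stripBom(withBom), StandardCharsets.UTF_8).equals("class A {}")); // true
    }
}
```

In practice the bytes would come from something like `Files.readAllBytes(path)` on each source file.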

johncarl81 commented 9 years ago

@benevans, should we add a default charset (UTF-8) if none is specified?

benevans commented 9 years ago

Yeah, I was wondering that too. I think it makes sense. We'd also want to set javadoc's -docencoding option, which is the encoding the generated HTML files are actually written with. The -charset option tells the browser what to expect, but it isn't automatically the same as -docencoding. That seems crazy; it is just asking for errors like this one.
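The gap between the two options can be modeled in a few lines of Java: `-docencoding` decides which bytes land on disk, `-charset` decides how the reader is told to decode them. This is a hypothetical model (`DocEncodingMismatch` and `whatTheReaderSees` are invented names), not Javadoc code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DocEncodingMismatch {
    // Bytes are written with docEncoding; the page declares declaredCharset to the browser.
    static String whatTheReaderSees(String text, Charset docEncoding, Charset declaredCharset) {
        byte[] written = text.getBytes(docEncoding);
        return new String(written, declaredCharset);
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        String text = "\u00E9"; // "é"

        // Both options agree: the character survives.
        System.out.println(whatTheReaderSees(text, utf8, utf8).equals(text)); // true
        // Written as UTF-8, declared ISO-8859-1: shows "Ã©".
        System.out.println(whatTheReaderSees(text, utf8, latin1).equals("\u00C3\u00A9")); // true
        // Written as ISO-8859-1, declared UTF-8: lone 0xE9 is malformed, shows U+FFFD.
        System.out.println(whatTheReaderSees(text, latin1, utf8).equals("\uFFFD")); // true
    }
}
```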

So I think a sensible thing to do is have -charset and -docencoding both set to "UTF-8" by default. Users can still set either option explicitly if they need to override for some reason.
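The proposed rule ("UTF-8 for both options unless the user set them") can be sketched as a transformation of a javadoc argument list. `CharsetDefaults` and `withUtf8Defaults` are hypothetical names, not part of Asciidoclet or Javadoc; the sketch assumes each option and its value arrive as separate tokens and only illustrates the override logic, not where such a hook would live.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharsetDefaults {
    // Appends "<option> UTF-8" for -charset and -docencoding unless the user already passed them.
    static List<String> withUtf8Defaults(List<String> userArgs) {
        List<String> args = new ArrayList<>(userArgs);
        for (String option : Arrays.asList("-charset", "-docencoding")) {
            if (!userArgs.contains(option)) {
                args.add(option);
                args.add("UTF-8");
            }
        }
        return args;
    }

    public static void main(String[] args) {
        // No encoding options given: both defaults are added.
        System.out.println(withUtf8Defaults(Arrays.asList("-d", "build/docs")));
        // prints [-d, build/docs, -charset, UTF-8, -docencoding, UTF-8]

        // Explicit -charset is kept; only -docencoding gets the default.
        System.out.println(withUtf8Defaults(Arrays.asList("-charset", "ISO-8859-1")));
        // prints [-charset, ISO-8859-1, -docencoding, UTF-8]
    }
}
```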

johncarl81 commented 9 years ago

Looking at this issue a bit, it seems we don't have control over Javadoc's charset from the Doclet, so the charset needs to be set manually as described above.