UTF-8 support? - Githubissues

minibiti commented 8 years ago

Hi, I have German and Norwegian characters in my document which do not render properly when I use the extension. Is there some custom attribute I could use to turn UTF-8 support on, or is it just not supported at the moment?

Thanks! JM.

mojavelinux commented 8 years ago

If you look in your browser console, you will likely see the following:

The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.

I have yet to figure out why Firefox would use such a dumb default, and how to work around it. It is Firefox that is giving Asciidoctor garbled text, and admitting it. Why on earth do they not serve UTF-8 by default?

The plugin retrieves the raw text using:

document.firstChild.textContent

If you run this in the console, you can see the garbage that Firefox is giving us.
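The garbling can be reproduced outside the add-on. Below is a minimal sketch, assuming the browser fell back to a single-byte encoding such as ISO-8859-1 (the actual fallback depends on the browser configuration and locale). `misdecodeAsLatin1` is a hypothetical helper, not part of the plugin.

```javascript
// Hypothetical helper: read UTF-8 bytes as if each byte were one
// ISO-8859-1 character, which approximates the browser's wrong guess.
function misdecodeAsLatin1(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return Array.from(utf8Bytes, (byte) => String.fromCharCode(byte)).join('');
}

console.log(misdecodeAsLatin1('für'));    // fÃ¼r
console.log(misdecodeAsLatin1('blåbær')); // blÃ¥bÃ¦r
```

Each non-ASCII character occupies two bytes in UTF-8, so it shows up as two unrelated characters once the bytes are read one at a time.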

The plugin used to fall back to XMLHttpRequest to retrieve the text in UTF-8, automatically detecting the encoding error. However, that seems to have been removed in recent versions (perhaps because Mozilla rejected it).

https://github.com/asciidoctor/asciidoctor-firefox-addon/blob/master/data/asciidocify.js#L53

Here's what it used to do:

// if charset is not UTF-8, try techniques to coerce it to UTF-8
// likely used only for local files
if (document.characterSet.toUpperCase() != 'UTF-8') {
  try {
    // this technique works if all characters are in standard ASCII set
    // see: http://www.ascii-code.com
    sanitizeAndShowHTML(convertToHTML(decodeURIComponent(escape(document.firstChild.textContent))));
  } catch (decodeError) {
    // XMLHttpRequest responseText is UTF-8 encoded by default
    var xhr = new XMLHttpRequest();
    xhr.open('GET', window.location.href, true);
    xhr.onload = function (evt) {
      if (xhr.readyState === 4) {
        // NOTE status is 0 for local files (i.e., file:// URIs)
        if (xhr.status === 200 || xhr.status === 0) {
          sanitizeAndShowHTML(convertToHTML(xhr.responseText));
        } else {
          console.error('Could not read AsciiDoc source. Reason: [' + xhr.status + '] ' + xhr.statusText);
        }
      }
    };
    xhr.onerror = function (evt) {
      console.error(xhr.statusText);
    };
    xhr.send();
  }
} else {
  sanitizeAndShowHTML(convertToHTML(document.firstChild.textContent));
}

I don't know any other way to force Firefox to give us UTF-8 encoded text.
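The `decodeURIComponent(escape(...))` call in the `try` branch above can be looked at in isolation. The following is a small sketch of why it sometimes works and sometimes throws; `recoverUtf8` is a hypothetical wrapper name, and the inputs are assumed examples of mis-decoded text.

```javascript
// Hypothetical wrapper around the trick used in the try branch above.
// escape() turns each code point <= 0xFF into %XX, and decodeURIComponent()
// then reads those %XX sequences as UTF-8 bytes.
function recoverUtf8(garbled) {
  return decodeURIComponent(escape(garbled));
}

// "fÃ¼r" is what "für" looks like when its UTF-8 bytes are read one by one.
console.log(recoverUtf8('f\u00C3\u00BCr')); // für

// A character above 0xFF (here the euro sign, which a Windows-1252 decode
// produces for byte 0x80) is escaped as %u20AC. decodeURIComponent()
// rejects that form, so the XMLHttpRequest fallback is still needed.
try {
  recoverUtf8('\u20AC');
} catch (decodeError) {
  console.log(decodeError.name); // URIError
}
```

Note that `escape()` is a legacy function, so this is only suitable as a best-effort first attempt.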

mojavelinux commented 8 years ago

...and what we used to do worked.

minibiti commented 8 years ago

Thanks for your update, Dan! I just realized that I indeed have the problem on Firefox only. It works fine with Chrome. Interesting... :)

mojavelinux commented 8 years ago

@Mogztter Is it true that Mozilla won't let us use XMLHttpRequest to fetch the source text? If not, can they offer a way to get UTF-8 encoded text from the document already loaded?

ggrossetie commented 8 years ago

> However, that seems to have been removed in recent versions (perhaps because Mozilla rejected it).

Correct.

> Is it true that Mozilla won't let us use XMLHttpRequest to fetch the source text?

I think they don't want XMLHttpRequest because this is a synchronous method, but I can try to explain why we need it (and maybe they will give me a workaround to get UTF-8 encoded text).

mojavelinux commented 8 years ago

The strategy I recommend when talking to them is to focus on the primary objective, UTF-8. That way, the conversation doesn't get derailed by a debate about XMLHttpRequest. In other words, the focus should be UTF-8, not XMLHttpRequest.

Tell them that we need to obtain the plain text in UTF-8 (regardless of the browser's default encoding) and that we are open to using any API that will get us that result.

You can also emphasize that without the UTF-8 text, we cannot properly support non-English languages. They should be sensitive to that need.
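One API-agnostic way to phrase the requirement is "give us the raw bytes and let us decode them as UTF-8 ourselves". The sketch below assumes the source file really is UTF-8 and that XMLHttpRequest is available to the extension; `decodeUtf8` and `fetchUtf8Text` are hypothetical names, and nothing here is confirmed as an approach Mozilla recommended.

```javascript
// Decode raw bytes as UTF-8, independent of any browser default.
function decodeUtf8(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

// Hypothetical helper (browser only): fetch the document as raw bytes so
// the browser never applies its own charset guess to the text.
function fetchUtf8Text(url, onText, onError) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url, true); // true = asynchronous request
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    // status is 0 for file:// URIs, as noted in the earlier snippet
    if (xhr.status === 200 || xhr.status === 0) {
      onText(decodeUtf8(new Uint8Array(xhr.response)));
    } else {
      onError(new Error('[' + xhr.status + '] ' + xhr.statusText));
    }
  };
  xhr.onerror = function () {
    onError(new Error('Network error while reading ' + url));
  };
  xhr.send();
}

// The bytes C3 B8 are the UTF-8 encoding of "ø".
console.log(decodeUtf8(new Uint8Array([0xC3, 0xB8]))); // ø
```

In the add-on, a call might look like `fetchUtf8Text(window.location.href, render, console.error)`, where `render` stands in for whatever converts the AsciiDoc source.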

ggrossetie commented 8 years ago

https://discourse.mozilla-community.org/t/get-content-plain-text-in-utf-8/6824

mojavelinux commented 8 years ago

Well said.

ggrossetie commented 8 years ago

Really tired of wasting my time... The Firefox SDK is pure nonsense! The documentation is getting better, but there are so many ways to write extensions: XUL, WebExtensions, SDK (with a high-level API that is not compatible with the low-level API)...

Anyway, I put back the fallback to XMLHttpRequest to retrieve the text in UTF-8 :tada:

And please vote for this issue https://bugzilla.mozilla.org/show_bug.cgi?id=1071816 :smile:

mojavelinux commented 8 years ago

\o/

mojavelinux commented 8 years ago

Voted.

getreu commented 8 years ago

A user of my little asciidoctor-notetaking script had a similar issue on Windows 7: when the script opens a local note text file in Firefox, Firefox assumes the text encoding to be that of the default Windows locale, in his case something like "Windows-1252". Since the note text files are generated from a template in UTF-8, special characters are not shown correctly. Does your patch also solve this issue?

ggrossetie commented 8 years ago

Not sure. Is asciidoctor-notetaking built on top of the Asciidoctor Firefox Addon?

If yes, I assume this will fix the issue.

If not, you will have to wait for Mozilla to resolve this issue in core, or create an add-on to fix it yourself. If you need help, feel free to ask ;)

ghost commented 8 years ago

A simple temporary workaround is to write the Unicode BOM in the source (the Geany editor has a command for this, for example).
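The browser warning quoted earlier in the thread names a byte order mark as an accepted encoding signature, and for UTF-8 that mark is the three bytes EF BB BF at the start of the file. Here is a minimal sketch of checking for and adding it on raw bytes; `hasUtf8Bom` and `withUtf8Bom` are hypothetical helpers, and the sample document is made up.

```javascript
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = [0xEF, 0xBB, 0xBF];

// True when the byte sequence already starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return UTF8_BOM.every((value, index) => bytes[index] === value);
}

// Hypothetical helper: return a copy of the bytes that starts with a BOM.
function withUtf8Bom(bytes) {
  if (hasUtf8Bom(bytes)) {
    return Uint8Array.from(bytes);
  }
  const result = new Uint8Array(UTF8_BOM.length + bytes.length);
  result.set(UTF8_BOM, 0);
  result.set(bytes, UTF8_BOM.length);
  return result;
}

const source = new TextEncoder().encode('= Blåbær\n');
const marked = withUtf8Bom(source);
console.log(hasUtf8Bom(source));            // false
console.log(hasUtf8Bom(marked));            // true
console.log(marked.length - source.length); // 3
```

Writing `marked` back to disk is left out, since how the file is saved depends on the editor or script in use.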

ggrossetie commented 8 years ago

Yes, but this is now fixed in the 0.5.0 release: https://github.com/asciidoctor/asciidoctor-firefox-addon/releases/tag/v0.5.0

ghost commented 8 years ago

@Mogztter thank you, I didn’t know about the new release. Will you go back to signing? From version 46, signature overriding won’t be possible anymore.

ggrossetie commented 8 years ago

> From version 46 signature overriding won’t be possible anymore

Yes, I know :disappointed:

Currently the signing API is broken: https://bugzilla.mozilla.org/show_bug.cgi?id=1244644 but I will try to get 0.5.0 signed!

asciidoctor / asciidoctor-firefox-addon

UTF-8 support? #43