apjanke / ronn-ng

Ronn-NG: An updated fork of ronn. Build man pages from Markdown.
MIT License

Invalid byte sequence in US-ASCII #39

Closed lorenzo93 closed 4 years ago

lorenzo93 commented 4 years ago

Hi,

I just installed the latest version of ronn-ng (0.9.0), and I'm using Ruby 2.7.0p0 (2019-12-25 revision 647ee6f091). I'm trying to build a ronn document to HTML and I get this error:

Traceback (most recent call last):
    16: from /usr/local/bundle/bin/ronn:23:in `<main>'
    15: from /usr/local/bundle/bin/ronn:23:in `load'
    14: from /usr/local/bundle/gems/ronn-ng-0.9.0/bin/ronn:186:in `<top (required)>'
    13: from /usr/local/bundle/gems/ronn-ng-0.9.0/bin/ronn:186:in `each'
    12: from /usr/local/bundle/gems/ronn-ng-0.9.0/bin/ronn:201:in `block in <top (required)>'
    11: from /usr/local/bundle/gems/ronn-ng-0.9.0/bin/ronn:201:in `each'
    10: from /usr/local/bundle/gems/ronn-ng-0.9.0/bin/ronn:222:in `block (2 levels) in <top (required)>'
     9: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:238:in `convert'
     8: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:264:in `to_html'
     7: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:272:in `to_html_fragment'
     6: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:232:in `html'
     5: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:325:in `process_html!'
     4: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:310:in `input_html'
     3: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:227:in `markdown'
     2: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:319:in `process_markdown!'
     1: from /usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:353:in `markdown_filter_heading_anchors'
/usr/local/bundle/gems/ronn-ng-0.9.0/lib/ronn/document.rb:353:in `split': invalid byte sequence in US-ASCII (ArgumentError)

Thanks

apjanke commented 4 years ago

This sounds like an encoding problem: "invalid byte sequence in US-ASCII" is what you get when a document is in UTF-8 or some other non-ASCII encoding, actually contains non-ASCII characters, and is read as US-ASCII.
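
The failure can be reproduced outside Ronn. This is a minimal sketch, assuming a string whose bytes are UTF-8 but whose encoding tag is US-ASCII; the helper name `split_error_for` is made up for illustration and is not part of Ronn-NG.

```ruby
# Hypothetical helper: tag the given bytes as US-ASCII, try to split them
# into lines, and return the error message (or nil if the split succeeds).
def split_error_for(bytes)
  text = bytes.dup.force_encoding(Encoding::US_ASCII)
  text.split("\n")
  nil
rescue ArgumentError => e
  e.message
end

plain    = "cafe\nmenu"         # 7-bit ASCII only
accented = "caf\xC3\xA9\nmenu"  # "é" stored as the UTF-8 bytes C3 A9

puts split_error_for(plain).inspect
puts split_error_for(accented).inspect
```

The first call should succeed, while the second should report the same kind of error as the traceback above, because the bytes C3 A9 are not valid in US-ASCII.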

I'm a little surprised this is happening; I thought we were using UTF-8 by default.

Could you post a download link to the .ronn file you're trying to build the man page from? And what does the locale command print for you? For example, I get this:

locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

lorenzo93 commented 4 years ago

Hi, I know I'm using a non-ASCII character.

This is my locale. I'm building inside a Docker container based on the ruby:latest image.

root@0e45c1c2d7b9:/Kathara/docs# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

As for the file you asked for, I'm building the docs for Kathara. You can just clone the repo and run make all inside the docs folder. I'm giving you the repo rather than a single file because I don't know whether the problem comes from piping the file through m4 (a macro processor) to add the footer to every page. This way you can see the whole pipeline from my source to the output.

Thanks

apjanke commented 4 years ago

Thanks!

What's happening is that Ronn-NG is using your locale to determine your character set, and your locale doesn't support non-ASCII characters: the POSIX locale technically only supports the 7-bit ASCII character set, and (unlike many languages) Ruby follows that strictly. Looks like Kathara has some UTF-8 in their docs, in the footer.txt.

$ file *
Makefile:                a /usr/bin/make -f script text executable, ASCII text
footer.txt:              UTF-8 Unicode text
index.txt:               ASCII text
kathara-check.1.ronn:    ASCII text
kathara-connect.1.ronn:  ASCII text
kathara-lab-dirs.7.ronn: ASCII text, with very long lines
kathara-lab.conf.5.ronn: ASCII text, wit
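
The locale dependence described above can be seen in plain Ruby. This is a rough sketch, not Ronn-NG code: the helper `valid_when_read_as?` is hypothetical, and the explicit `encoding:` argument stands in for whatever `Encoding.default_external` would be under a given locale.

```ruby
require 'tempfile'

# Hypothetical helper: read a file with the given external encoding and
# report whether its bytes are valid in that encoding.
def valid_when_read_as?(path, encoding)
  File.read(path, encoding: encoding).valid_encoding?
end

# Ruby picks this from the locale at startup.
puts "default external encoding: #{Encoding.default_external.name}"

Tempfile.create('footer') do |f|
  f.binmode
  f.write("caf\xC3\xA9\n".b) # UTF-8 bytes for a non-ASCII character
  f.flush
  puts valid_when_read_as?(f.path, Encoding::US_ASCII)
  puts valid_when_read_as?(f.path, Encoding::UTF_8)
end
```

The same bytes are acceptable when read as UTF-8 and broken when read as US-ASCII, so the outcome depends on the encoding the reader assumes rather than on the file itself.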

But I think that's an issue with Ronn, and not a configuration problem on your machine. Ronn is for generating doco in distributions, just like you're doing with Kathara. And those distributions have no way of knowing or conforming to the locales that user machines are set up on. They want to generate their doco using the encoding that their files are in, regardless of the target machine/user's locale. And the user shouldn't have to change their locale settings to get that to work.

So I think what should happen is that Ronn should default to using a UTF-8 encoding, regardless of the locale setting where it's being run. (And it should probably supply a command line switch or Ronn-specific environment variable to override that choice and use a different encoding.)

I'll get this set up. Give me a little time...

apjanke commented 4 years ago

I have changed ronn to ignore the user's locale and default to UTF-8 encoding, and added a -E/--encoding option to override that.
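
For readers curious what such an option can look like, here is a minimal sketch using Ruby's standard OptionParser. It is an illustration under assumptions, not the actual Ronn-NG implementation: the method name `parse_encoding` is invented, and only the -E/--encoding flag names and the UTF-8 default come from the comment above.

```ruby
require 'optparse'

# Hypothetical sketch: choose the input encoding from the command line,
# falling back to UTF-8 instead of consulting the locale.
def parse_encoding(argv)
  encoding = Encoding::UTF_8
  OptionParser.new do |opts|
    opts.on('-E', '--encoding=NAME', 'Input encoding (default: UTF-8)') do |name|
      # Encoding.find raises ArgumentError for an unknown encoding name.
      encoding = Encoding.find(name)
    end
  end.parse(argv)
  encoding
end

puts parse_encoding([]).name
puts parse_encoding(%w[-E ISO-8859-1]).name
```

The returned value could then be passed as the `encoding:` argument when reading input files, so the result no longer depends on the environment the tool runs in.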

Building the Kathara docs now works for me in the POSIX locale.

$ pwd
/Users/janke/local/repos/Kathara/docs
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="POSIX"
LC_CTYPE="POSIX"
LC_MESSAGES="POSIX"
LC_MONETARY="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_ALL="POSIX"
$ make all
cat kathara-check.1.ronn | m4 -P | ronn --manual= -o Roff/ --roff > Roff/kathara-check.1
cat kathara-connect.1.ronn | m4 -P | ronn --manual= -o Roff/ --roff > Roff/kathara-connect.1
cat kathara-lab-dirs.7.ronn | m4 -P | ronn --manual= -o Roff/ --roff > Roff/kathara-lab-dirs.7
cat kathara-lab.conf.5.ronn | m4 -P | ronn --manual= -o Roff/ --roff > Roff/kathara-lab.conf.5
cat kathara-lab.dep.5.ronn | m4 -P | ronn --manual= -o Roff/ --roff > Roff/kathara-lab.dep.5
cat kathara-lab.ext.5.ronn | m4 -P | ronn --manual= -o Roff/ --roff > Roff/kathara-lab.ext.5

Could you download the latest master snapshot and try it out yourself? If it works, I'll cut a new release for this.

lorenzo93 commented 4 years ago

Hi @apjanke,

sorry for replying so late but I was busy at work.

I've tried the latest master commit of ronn with the ruby:latest Docker image, but now I get a different error. It looks like a script is invoking ruby incorrectly.

root@ba4882e45e77:/Kathara/docs# make html-build
mkdir Html/
cat kathara-check.1.ronn | m4 -P | ronn --manual="Kathara manual" -o Html/ --html > Html/kathara-check.1.html
/usr/bin/env: 'ruby -E UTF-8': No such file or directory
/usr/bin/env: use -[v]S to pass options in shebang lines
make: *** [Makefile:26: Html/kathara-check.1.html] Error 127

apjanke commented 4 years ago

No worries; we've all got other things to do.

Well, crap. Looks like the multi-argument shebang line I put in is not compatible with Linux. I think I remember reading something about that.

Okay, I'll pull that out and do this some other way.
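
The env error above follows from how Linux handles a `#!` line: everything after the interpreter path is handed over as one single argument, whereas macOS splits it on whitespace. The sketch below models the Linux behaviour; `linux_shebang_argv` is a hypothetical helper written for illustration, not a real API.

```ruby
# Hypothetical model of Linux shebang handling: the interpreter path is
# followed by at most ONE argument (the rest of the line, unsplit), and
# then the path of the script being run.
def linux_shebang_argv(line, script_path)
  body = line.sub(/\A#!/, '').strip
  interpreter, rest = body.split(/[ \t]+/, 2)
  [interpreter, rest, script_path].compact
end

p linux_shebang_argv('#!/usr/bin/env ruby -E UTF-8', '/usr/local/bundle/bin/ronn')
p linux_shebang_argv('#!/usr/bin/ruby', '/usr/local/bundle/bin/ronn')
```

In the first case env receives the whole string 'ruby -E UTF-8' as the name of a program to look up, which matches the "No such file or directory" message. The -S option mentioned in that error output asks env to split the string back into separate words.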

apjanke commented 4 years ago

Okay, I tried it a different way: https://github.com/apjanke/ronn-ng/commit/26813b26bce7c2618158cdc8a29e14b09b254406. Works for me on Mac.

Can you give it another try from the latest on master?

lorenzo93 commented 4 years ago

Thank you very much. I really appreciate your work.

Now it works like a charm.

apjanke commented 4 years ago

Excellent.

I need to get the unit tests cleaned up and working again, and then I'll roll this out as a new release.

apjanke commented 4 years ago

Sorry it took so long to get a new release out!

I've added this to the pipeline for 0.10.0, which I plan to release in the next few days. Closing as fixed.