anoma / juvix

A language for intent-centric and declarative decentralised applications
https://docs.juvix.org
GNU General Public License v3.0

`juvix init` Panics if System Language Uses Cyrillic Alphabet #2430

Closed agureev closed 5 months ago

agureev commented 1 year ago

The `juvix init` command panics, producing the following output

juvix: <stdout>: commitAndReleaseBuffer: invalid argument (cannot encode character '\10024')

This happens if the system language uses the Cyrillic alphabet instead of English; juvix seems unable to encode the glitter/stars Unicode character (✨, U+2728, which is 10024 in decimal).

The problem was solved by changing the system language to English, which produced the expected output. It is probably worth mentioning this issue somewhere, e.g. after the doctor command.
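To make the error message easier to read: `'\10024'` is Haskell's decimal escape for the code point U+2728 (the sparkles character). The sketch below only illustrates that arithmetic and why a single-byte encoding such as ISO-8859-1 cannot hold it; `sparkles` and `fitsLatin1` are hypothetical names, not part of Juvix.

```haskell
import Data.Char (ord)

-- The character named in the error message, written with the same
-- decimal escape that GHC uses when it reports the failure.
sparkles :: Char
sparkles = '\10024'

-- ISO-8859-1 (Latin-1) is a single-byte encoding: it can only
-- represent code points 0..255.
fitsLatin1 :: Char -> Bool
fitsLatin1 c = ord c <= 0xFF

main :: IO ()
main = do
  print (ord sparkles)          -- 10024
  print (sparkles == '\x2728')  -- True: decimal 10024 is hex 2728
  print (fitsLatin1 sparkles)   -- False
```

KOI8-R is likewise a single-byte encoding and has no slot for this character, so any locale that selects such an encoding would be expected to hit the same limit.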

The system info is:

paulcadman commented 1 year ago

We should use setLocaleEncoding utf8 before any console output.

https://hackage.haskell.org/package/base-4.18.1.0/docs/GHC-IO-Encoding.html#v:setLocaleEncoding
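A minimal sketch of what this suggestion could look like, assuming a plain `main` entry point. This is not Juvix's actual startup code, and `banner` is a hypothetical name; only `setLocaleEncoding`, `utf8`, and `hSetEncoding` come from `base`.

```haskell
import GHC.IO.Encoding (setLocaleEncoding, utf8)
import System.IO (hSetEncoding, stderr, stdout)

-- Hypothetical stand-in for the greeting printed by `juvix init`.
banner :: String
banner = "\10024 Your next Juvix adventure is about to begin! \10024"

main :: IO ()
main = do
  -- Make UTF-8 the default encoding regardless of LANG / LC_ALL.
  setLocaleEncoding utf8
  -- Also set the standard handles explicitly, in case they were
  -- initialised before the call above (a belt-and-braces assumption).
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8
  putStrLn banner
```

The calls are placed first in `main` because the suggestion is to run them before any console output. Note that this forces UTF-8 bytes onto the terminal, which may render incorrectly on a terminal that really expects KOI8-R or ISO-8859-1.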

paulcadman commented 1 year ago

Serokell made a Haskell library to try to set a utf8 compatible locale for Haskell applications: https://serokell.io/blog/haskell-with-utf8

I cannot reproduce the error described in this bug, so I can't check whether it's fixed.

@agureev if possible, could you compile juvix using the set-output-locale-with-utf8 branch and see if it fixes the issue (without changing the system language)?

agureev commented 1 year ago

EDIT: The problem as reproduced in this particular post was due to an internal mistake in my OS

Currently the error is still reproducible on commit 63e795f5, and it still seems to have to do with the language settings. Here is the locale setup along with the error message (which, as you may note, is the same as before and again seems related to encoding the glitter Unicode character):

LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC=en_CA.UTF-8
LC_TIME=en_CA.UTF-8
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY=en_CA.UTF-8
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER=en_CA.UTF-8
LC_NAME=en_CA.UTF-8
LC_ADDRESS=en_CA.UTF-8
LC_TELEPHONE=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
LC_IDENTIFICATION=en_CA.UTF-8
LC_ALL=

Running juvix init produces

juvix: <stdout>: commitAndReleaseBuffer: invalid argument (cannot encode character '\10024')

The problem disappears once the language is changed to English.

agureev commented 1 year ago

EDIT: The problem as reproduced in this particular post was due to an internal mistake in my OS

Here's a direct test using export:

$ juvix init
✨ Your next Juvix adventure is about to begin! ✨
I will help you set it up
Write the name of your project [leave empty for 'juvix'] (lower case letters, numbers and dashes are allowed): 
^C
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ export LANG=ru_RU.UTF-8
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ juvix init
juvix: <stdout>: commitAndReleaseBuffer: invalid argument (cannot encode character '\10024')

I included the locale errors in case they come in handy.

agureev commented 1 year ago

The same happens when you switch the en_US setting to the KOI8-R encoding:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ export LANG=en_US.KOI8-R
$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.KOI8-R
LC_CTYPE="en_US.KOI8-R"
LC_NUMERIC="en_US.KOI8-R"
LC_TIME="en_US.KOI8-R"
LC_COLLATE="en_US.KOI8-R"
LC_MONETARY="en_US.KOI8-R"
LC_MESSAGES="en_US.KOI8-R"
LC_PAPER="en_US.KOI8-R"
LC_NAME="en_US.KOI8-R"
LC_ADDRESS="en_US.KOI8-R"
LC_TELEPHONE="en_US.KOI8-R"
LC_MEASUREMENT="en_US.KOI8-R"
LC_IDENTIFICATION="en_US.KOI8-R"
LC_ALL=
$ juvix init
juvix: <stdout>: commitAndReleaseBuffer: invalid argument (cannot encode character '\10024')
$ export LANG=en_US.UTF-8
$ juvix init
✨ Your next Juvix adventure is about to begin! ✨
I will help you set it up
Write the name of your project [leave empty for 'juvix'] (lower case letters, numbers and dashes are allowed): 
^C

paulcadman commented 1 year ago

Thank you for the investigation @agureev ✨

It's unfortunate that https://github.com/anoma/juvix/pull/2463 doesn't solve this issue. I have no ideas at the moment. Perhaps adding something to the doctor is the best we can do right now.
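As a rough idea of what such a doctor check might look like, here is a sketch that inspects the encoding GHC derived from the locale and warns when it is not UTF-8. It is not the actual `juvix doctor` implementation; `isUtf8Name` and the message wording are assumptions made for illustration.

```haskell
import Data.Char (toUpper)
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: accept the common spellings of UTF-8 and
-- ignore any iconv-style suffix such as "//ROUNDTRIP".
isUtf8Name :: String -> Bool
isUtf8Name name =
  map toUpper (takeWhile (/= '/') name) `elem` ["UTF-8", "UTF8"]

main :: IO ()
main = do
  enc <- getLocaleEncoding
  -- `show` on a TextEncoding yields its name, e.g. "UTF-8" or "KOI8-R".
  let name = show enc
  if isUtf8Name name
    then putStrLn ("Locale encoding OK: " ++ name)
    else hPutStrLn stderr
           ("Warning: locale encoding is " ++ name
             ++ "; some characters may not be printable. "
             ++ "Consider a UTF-8 locale such as en_US.UTF-8.")
```

The warning text avoids non-ASCII characters on purpose, so that the check itself can be printed under the encodings it is warning about.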

agureev commented 1 year ago

@paulcadman I have to apologize for the previous error. It seems that one of the cases stemmed from not having a proper locale generation file installed. Below is the full list of commands I ran with Juvix, preceded by a locale report listing all installed locales.

$ locale -a
C
C.utf8
en_CA.utf8
en_US
en_US.iso88591
en_US.utf8
POSIX
ru_RU.koi8r
ru_RU.utf8
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ juvix init
✨ Your next Juvix adventure is about to begin! ✨
I will help you set it up
Write the name of your project [leave empty for 'juvix'] (lower case letters, numbers and dashes are allowed): 
^C
$ export LANG=ru_RU.UTF-8
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ juvix init
✨ Your next Juvix adventure is about to begin! ✨
I will help you set it up
Write the name of your project [leave empty for 'juvix'] (lower case letters, numbers and dashes are allowed): 
^C
$ export LANG=en_US.ISO-8859-1
$ locale
LANG=en_US.ISO-8859-1
LC_CTYPE="en_US.ISO-8859-1"
LC_NUMERIC="en_US.ISO-8859-1"
LC_TIME="en_US.ISO-8859-1"
LC_COLLATE="en_US.ISO-8859-1"
LC_MONETARY="en_US.ISO-8859-1"
LC_MESSAGES="en_US.ISO-8859-1"
LC_PAPER="en_US.ISO-8859-1"
LC_NAME="en_US.ISO-8859-1"
LC_ADDRESS="en_US.ISO-8859-1"
LC_TELEPHONE="en_US.ISO-8859-1"
LC_MEASUREMENT="en_US.ISO-8859-1"
LC_IDENTIFICATION="en_US.ISO-8859-1"
LC_ALL=
$ juvix init
juvix: <stdout>: commitAndReleaseBuffer: invalid argument (cannot encode character '\10024')
$ export LANG=ru_RU.KOI8-R
$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=
$ juvix init
juvix: <stdout>: commitAndReleaseBuffer: invalid argument (cannot encode character '\10024')
$ export LANG=en_US.UTF-8
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ juvix init
✨ Your next Juvix adventure is about to begin! ✨
I will help you set it up
Write the name of your project [leave empty for 'juvix'] (lower case letters, numbers and dashes are allowed): 
^C

paulcadman commented 5 months ago

We couldn't figure this out. We'll reopen if this becomes a problem again.
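For reference, the last transcript in this thread fails only under the non-UTF-8 encodings (ISO-8859-1 and KOI8-R). One mitigation that was not tried in this thread would be to keep the user's encoding but make stdout lenient, using the `//TRANSLIT` suffix that `mkTextEncoding` in `base` understands. The sketch below assumes this is acceptable for the banner; `withTranslit` is a hypothetical helper.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hSetEncoding, mkTextEncoding, stdout)

-- Hypothetical helper: append the transliteration suffix unless the
-- encoding name already carries a "//..." suffix.
withTranslit :: String -> String
withTranslit name
  | '/' `elem` name = name
  | otherwise       = name ++ "//TRANSLIT"

main :: IO ()
main = do
  enc <- getLocaleEncoding
  -- Rebuild the locale's encoding so that unencodable characters are
  -- replaced instead of raising "cannot encode character".
  lenient <- mkTextEncoding (withTranslit (show enc))
  hSetEncoding stdout lenient
  putStrLn "\10024 Your next Juvix adventure is about to begin! \10024"
```

Under a UTF-8 locale the output should be unchanged; under a single-byte locale the sparkles would be expected to degrade to a replacement character rather than abort the command.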