Closed johanstrand closed 3 years ago
I've managed to reproduce the bug by building the Dockerfile used by Kroki
and then passing a file containing non-ASCII characters:
docker run -v ~/file.er:/file.er cb9853b485bc /root/.local/bin/erd -i /file.er
After building an image from the above-mentioned Dockerfile by @kukimik, I was able to reproduce the issue. However, using the following erd file:
[Person]
*nameÄÖ
height
weightÖ_Ξξ
`birth date ÄÖÄÖÄÖÄÖÄÖ`
+birth_place_id
[`Birth Place`]
*id
`birth city`
'birth state'
"birth country"
"lambda: λ λ λ λ λ λ"
Person *--1 `Birth Place`
which contains additional Unicode characters (lambda and some other Greek ones ;) ), results in the expected output when erd is freshly compiled from source and executed on the same system where it was compiled.
The container is very helpful for checking whether the recommended fix works.
https://serokell.io/blog/haskell-with-utf8 is an interesting read.
I may try to fix this, maybe this week.
Also, I've found that the problem does not come from the build environment. I can reproduce it using erd compiled on my machine using stack. I just need to change the current locale (I'm on Linux and using @mmzx's example file):
$ LANG=C erd -i file.er
erd: file.er: hGetContents: invalid argument (invalid byte sequence)
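Since the failure depends on the locale, it can help to check which encodings the GHC runtime actually picked up. The following is a minimal diagnostic sketch using only base; `describeEncodings` is a hypothetical helper name, not part of erd. Running it once normally and once with `LANG=C` should show whether the locale changes the encoding attached to the standard handles.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdin, stdout)

-- Hypothetical helper: report the encoding derived from the locale
-- and the encodings currently attached to stdin and stdout.
describeEncodings :: IO [String]
describeEncodings = do
  locale <- getLocaleEncoding
  inEnc  <- hGetEncoding stdin   -- Nothing means the handle is in binary mode
  outEnc <- hGetEncoding stdout
  return
    [ "locale: " ++ show locale
    , "stdin:  " ++ maybe "binary" show inEnc
    , "stdout: " ++ maybe "binary" show outEnc
    ]

main :: IO ()
main = describeEncodings >>= mapM_ putStrLn
```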
I've tried the simplest solution using with-utf8 (i.e. main = withUtf8 $ do ...) and it seems to work OK with -i file.er. The output (both written to files and to stdout) looks OK. However, the following:
$ LANG=C erd < examples/simple.er
fails with:
"<stdin>" (line 2, column 6):
unexpected '\65533'
expecting attribute
I was never strong with encodings; I need to understand what is going on here and what is the expected behaviour.
I've also started looking into it. Right now this place comes to my mind where setting the encoding is used for the very same purpose.
Later tonight I will give it a try.
So far, I've only tried these experimentally.
When the LANG environment variable is unset, it fails. Indeed, the LANG variable is not set in the above-mentioned Docker image when using the bash shell.
Perhaps I should first read the article about the with-utf8 package. :)
There is an alternative way, as I recall, but it involves using Data.Text.IO from the text package, which itself has a UTF-8 encoding function...
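A rough sketch of what such a locale-independent approach might look like: read the input as raw bytes and decode it explicitly. It assumes the function being recalled is `decodeUtf8'` from `Data.Text.Encoding` (the decoding functions live there rather than in `Data.Text.IO`); `decodeErd` is a hypothetical helper name and this is not how erd currently reads its input.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical helper: decode raw bytes as UTF-8 without consulting the
-- locale. Malformed input is reported as Left instead of throwing.
decodeErd :: B.ByteString -> Either String T.Text
decodeErd bytes = case TE.decodeUtf8' bytes of
  Left err  -> Left (show err)
  Right txt -> Right txt

main :: IO ()
main = do
  hSetEncoding stdout utf8
  -- "*nameÄ" as UTF-8 bytes; Ä (U+00C4) is encoded as 0xC3 0x84.
  let good = B.pack [0x2A, 0x6E, 0x61, 0x6D, 0x65, 0xC3, 0x84]
      bad  = B.pack [0x2A, 0xC3]  -- truncated multi-byte sequence
  either putStrLn TIO.putStrLn (decodeErd good)
  either putStrLn TIO.putStrLn (decodeErd bad)
```

In a real program the bytes would come from `B.readFile` or `B.getContents`, which never decode and so cannot fail with "invalid byte sequence".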
I get the error message
hGetContents: invalid argument (invalid byte sequence)
when I try an ERD specification with the Swedish characters Å, Ä or Ö. I do this using Kroki (https://kroki.io/), but I suspect the issue is with erd, since other diagram types, for instance Graphviz, work. A quick Google search points to using hSetEncoding to avoid this problem.
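For illustration, here is a minimal sketch of the `hSetEncoding` approach using only base. The helper `hGetContentsUtf8` and the file name `sample.er` are hypothetical, and this is not necessarily the fix that erd adopted; it only shows the idea of fixing the handle's encoding before reading so that the locale no longer matters.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read everything from a handle as UTF-8,
-- regardless of what LANG says.
hGetContentsUtf8 :: Handle -> IO String
hGetContentsUtf8 h = do
  hSetEncoding h utf8
  contents <- hGetContents h
  -- hGetContents is lazy; force the read while the handle is still open.
  _ <- evaluate (length contents)
  return contents

main :: IO ()
main = do
  hSetEncoding stdout utf8
  -- Write a small sample specification as UTF-8 (hypothetical file name).
  withFile "sample.er" WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "[Person]\n*nameÄÖ\n"
  spec <- withFile "sample.er" ReadMode hGetContentsUtf8
  putStr spec
```

The same helper could be applied to `stdin` for the piped-input case, under the assumption that the input is always UTF-8.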