ascii-boxes / boxes

Command line ASCII boxes unlimited!
https://boxes.thomasjensen.com/
GNU General Public License v3.0

Odd encoding error on clone and instant diff #83

Closed: mathomp4 closed this issue 3 years ago

mathomp4 commented 3 years ago

This is something I didn't notice until now, but when I clone this repo I see:

❯ git clone https://github.com/ascii-boxes/boxes.git
Cloning into 'boxes'...
remote: Enumerating objects: 3348, done.
remote: Counting objects: 100% (347/347), done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 3348 (delta 305), reused 293 (delta 277), pack-reused 3001
Receiving objects: 100% (3348/3348), 1.78 MiB | 21.67 MiB/s, done.
Resolving deltas: 100% (2134/2134), done.
error: failed to encode 'src/lexer.l' from UTF-8 to ISO_8859-15
error: failed to encode 'test/111_manual_encoding_iso.txt' from UTF-8 to ISO_8859-15

I've never seen those last two lines during a clone before. If I then go into the repo:

❯ cd boxes
❯ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   src/lexer.l
    modified:   test/111_manual_encoding_iso.txt

no changes added to commit (use "git add" and/or "git commit -a")

and I see this diff:


❯ git --no-pager diff
diff --git a/src/lexer.l b/src/lexer.l
index b383d08..58039a3 100644
--- a/src/lexer.l
+++ b/src/lexer.l
@@ -99,7 +99,7 @@ static void report_state(char *symbol, char *text, char *expected_state_str);
 %s SHAPES

-PWORD     [a-zA-Z������][a-zA-Z0-9\-_�������]*
+PWORD     [a-zA-ZäöüÄÖÜ][a-zA-Z0-9\-_üäöÜÄÖß]*
 PWHITE    [\n \r\t]
 PBOX      Box
 SDELIM    [\"~\'`!@\%\&\*=:;<>\?/|\.\\]
diff --git a/test/111_manual_encoding_iso.txt b/test/111_manual_encoding_iso.txt
index 3a47b0b..b4b8d77 100644
--- a/test/111_manual_encoding_iso.txt
+++ b/test/111_manual_encoding_iso.txt
@@ -1,24 +1,24 @@
 :ARGS
 -ac -n ISO_8859-15
 :INPUT
-             �
-      �b
-      �b�
-    �b�d
-    �b�d�
-    �b�d�f
-     �b�d�fg
-    �b�d�fgh
+             ä
+      äb
+      äbç
+    äbçd
+    äbçdé
+    äbçdéf
+     äbçdéfg
+    äbçdéfgh
 :OUTPUT-FILTER
 :EXPECTED
     /**************/
-    /*     �      */
-    /*     �b     */
-    /*    �b�     */
-    /*    �b�d    */
-    /*   �b�d�    */
-    /*   �b�d�f   */
-    /*  �b�d�fg   */
-    /*  �b�d�fgh  */
+    /*     ä      */
+    /*     äb     */
+    /*    äbç     */
+    /*    äbçd    */
+    /*   äbçdé    */
+    /*   äbçdéf   */
+    /*  äbçdéfg   */
+    /*  äbçdéfgh  */
     /**************/
 :EOF
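
In the `-` lines of the diff above, every non-ASCII letter appears as `�`, while the `+` lines show it correctly. That is what ISO-8859-15 bytes look like when they are displayed as UTF-8. The minimal Python sketch below reproduces the rendering. It assumes the repository copy holds ISO-8859-15 bytes, and the helper name `render_as_utf8` is hypothetical, used only for illustration.

```python
def render_as_utf8(text: str) -> str:
    """Encode text as ISO-8859-15, then display the raw bytes as UTF-8.

    Bytes that are not valid UTF-8 (for example 0xE4 for 'ä') are shown
    as U+FFFD, the Unicode replacement character.
    """
    raw = text.encode("iso8859_15")  # Python's codec name for ISO-8859-15
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Lines taken from the test file and the lexer pattern in the diff.
    for line in ["ä", "äb", "äbç", "äbçdé", "äöüÄÖÜ"]:
        print(repr(render_as_utf8(line)))
```

Each lone ISO-8859-15 byte for a letter like `ä`, `ç`, or `é` is an invalid UTF-8 sequence, so it becomes exactly one replacement character. That is why the six umlauts in the lexer pattern turn into six `�` characters.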

tsjensen commented 3 years ago

Thanks for reporting this! I saw the same problem on one of my machines, too.

The good news is that the files are properly encoded, so everything works. But sometimes, they show up as changed, which is bad. I'm investigating.

The problem was introduced with v2.0.0, when we started having these different file encodings.

Currently, there are two possible causes, both of which may be true at the same time:

tsjensen commented 3 years ago

It seems to me that the files had been committed in their target encoding (ISO-8859-15) instead of UTF-8. For files with the `working-tree-encoding` attribute, Git expects the repository copy to be UTF-8 and converts it to the target encoding on checkout. That conversion failed because the files were already in the target encoding.
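The `failed to encode ... from UTF-8 to ISO_8859-15` errors in the original report fit this explanation. Below is a simplified Python model of the checkout step, not Git's actual code: the blob is decoded as UTF-8 and re-encoded to the target encoding. The function name `to_worktree` is hypothetical. A UTF-8 blob converts cleanly, while a blob that already holds ISO-8859-15 bytes fails at the UTF-8 decoding step.

```python
TARGET = "iso8859_15"  # Python's codec name for ISO-8859-15


def to_worktree(blob: bytes, target: str = TARGET) -> bytes:
    """Simplified model of checkout with a working-tree encoding.

    The blob is assumed to be UTF-8 and is re-encoded to `target`.
    Raises ValueError if the blob is not valid UTF-8 or a character
    has no representation in `target`.
    """
    try:
        return blob.decode("utf-8").encode(target)
    except UnicodeError as exc:
        raise ValueError(f"failed to encode from UTF-8 to {target}") from exc


if __name__ == "__main__":
    good_blob = "äbçdé".encode("utf-8")  # what Git expects in the repo
    bad_blob = "äbçdé".encode(TARGET)    # what the repo held before the fix
    print(to_worktree(good_blob))        # ISO-8859-15 bytes for the working tree
    try:
        to_worktree(bad_blob)
    except ValueError as err:
        print("error:", err)
```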

This should be fixed now. The risk from older Git versions remains, but it will resolve itself over time.
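The following Python sketch models why the files showed up as modified even though their working-tree contents were correct, and why storing them as UTF-8 fixes it. Git converts a working-tree file back to UTF-8 before comparing it with the stored blob. The sketch assumes, consistent with the observation that the files were properly encoded, that a failed checkout conversion leaves the blob's bytes in the working tree unchanged. The names `checkout` and `is_modified` are hypothetical, and Git's stat-based shortcuts are ignored.

```python
TARGET = "iso8859_15"  # Python's codec name for ISO-8859-15


def checkout(blob: bytes, target: str = TARGET) -> bytes:
    """Convert a UTF-8 blob to the target encoding.

    Assumption: if the conversion fails, the blob's bytes are written
    to the working tree unchanged.
    """
    try:
        return blob.decode("utf-8").encode(target)
    except UnicodeError:
        return blob


def is_modified(blob: bytes, worktree: bytes, target: str = TARGET) -> bool:
    """Convert the working-tree file back to UTF-8 and compare with the blob."""
    return worktree.decode(target).encode("utf-8") != blob


if __name__ == "__main__":
    text = "äbçdéfgh"
    cases = [
        ("before fix (ISO-8859-15 in repo)", text.encode(TARGET)),
        ("after fix (UTF-8 in repo)", text.encode("utf-8")),
    ]
    for label, blob in cases:
        print(label, "-> modified:", is_modified(blob, checkout(blob)))
```

Before the fix, the working tree ends up with the right ISO-8859-15 bytes, but converting them to UTF-8 no longer matches the ISO-8859-15 blob, so the file is reported as changed. After the fix, checkout and the reverse conversion round-trip to the identical blob.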

tsjensen commented 3 years ago

Judging from your comment on #82, I would assume that this issue is fixed, too?

mathomp4 commented 3 years ago

> Judging from your comment on #82, I would assume that this issue is fixed, too?

Huh. Weird. This wasn't in my GitHub notifications anymore, so I thought it was closed. Yes, I don't see any encoding issues!