effekt-lang / effekt

A language with lexical effect handlers and lightweight effect polymorphism
https://effekt-lang.org
MIT License

Inconsistency between JS and Scheme backends regarding Unicode (double-)escapes #596

Open sideburns3000 opened 1 week ago

sideburns3000 commented 1 week ago

Hello dear team,

I seem to have encountered an inconsistency between the behaviour of programs compiled for the JS backend and for the Chez Scheme backend when strings contain doubly escaped Unicode code points.

The following line, compiled for the Node.js backend and executed in a Windows console:

println("\\u000A is the Unicode representation of \\n")

prints, as expected:

\u000A is the Unicode representation of \n

However, when compiled for the Chez Scheme backend, it prints:

\012 is the Unicode representation of \n

The generated Scheme code is (println_1 "\\012 is the Unicode representation of \\n").

Apparently, the Scheme version partly honours the double escape (insofar as it preserves the backslash and doesn't print an actual newline), but nevertheless converts the hexadecimal Unicode value into its octal representation. Using a triple-quoted string doesn't prevent that either.
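For illustration, the symptom can be reproduced with a minimal Python sketch. It assumes the Chez backend rewrites every `\uXXXX` found in the literal's source text into an octal escape, using a regex that ignores whether the backslash is itself escaped. `NAIVE_UNICODE` and `naive_to_octal` are hypothetical names and are not Effekt's actual code.

```python
import re

# Hypothetical stand-in for the backend's rewrite: every "\uXXXX" becomes a
# zero-padded octal escape "\NNN". The pattern never looks at the character
# before the backslash, so it also fires on the "\u" inside "\\u000A".
NAIVE_UNICODE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def naive_to_octal(src: str) -> str:
    return NAIVE_UNICODE.sub(
        lambda m: "\\" + format(int(m.group(1), 16), "03o"), src
    )


# Raw string = the literal's text exactly as written in the Effekt source.
literal = r"\\u000A is the Unicode representation of \\n"
print(naive_to_octal(literal))
# prints: \\012 is the Unicode representation of \\n
```

This reproduces the generated Scheme string from the report. The first backslash of `\\` survives, and the second is consumed as part of `\u000A`.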

Unless I'm missing or misunderstanding something, this currently makes it necessary to use a workaround like

println("\\" ++ "u000A")

to print

\u000A

literally on the Scheme backend.

(Tested on Effekt from 2024-09-13, as of commit 19089d3)

Kind regards, Michael

jiribenes commented 1 week ago

Hi @sideburns3000,

thanks for the very nice and detailed issue! This indeed looks like something we might want to fix.

As far as I can tell, in both backends (JS and Chez, resp.), we just search-and-replace these characters in order to escape them:
https://github.com/effekt-lang/effekt/blob/22ebac276e7f75ac45bd2e95646f5e6d3ad78540/effekt/shared/src/main/scala/effekt/generator/js/Transformer.scala#L23
https://github.com/effekt-lang/effekt/blob/22ebac276e7f75ac45bd2e95646f5e6d3ad78540/effekt/shared/src/main/scala/effekt/generator/chez/Transformer.scala#L46
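The following is a generic illustration of why the order of such search-and-replace steps matters. It assumes the escape function is a chain of plain string replacements; the linked Scala code is not reproduced here, and both function names are hypothetical.

```python
def escape_backslash_first(s: str) -> str:
    # Escape "\" before anything else, so the backslashes introduced by the
    # later replacements are not escaped a second time.
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def escape_backslash_last(s: str) -> str:
    # Wrong order: the "\" added in front of '"' is doubled by the second step,
    # leaving an escaped backslash followed by a bare, string-terminating quote.
    return s.replace('"', '\\"').replace("\\", "\\\\").replace("\n", "\\n")


print(escape_backslash_first('say "hi"'))  # say \"hi\"
print(escape_backslash_last('say "hi"'))   # say \\"hi\\"
```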

But then in Chez, we do the following:
https://github.com/effekt-lang/effekt/blob/22ebac276e7f75ac45bd2e95646f5e6d3ad78540/effekt/shared/src/main/scala/effekt/generator/chez/package.scala#L78-L88
That is fine on its own, but I think we should merge it with the escape function and make sure we process \\ first (cc @b-studios), or at least modify the regex.
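Here is a minimal sketch of "process `\\` first", assuming the rewrite stays regex-based. A single left-to-right pass over one alternation consumes `\\` as a unit, so a `u000A` that follows an escaped backslash is never read as a Unicode escape. The names are hypothetical, and the octal output only mirrors the behaviour described in the report.

```python
import re

# Alternation tried at each position, left to right: "\\" (an escaped
# backslash) is matched as one token before "\uXXXX" gets a chance.
ESCAPE_TOKEN = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})")


def to_octal(src: str) -> str:
    def rewrite(m):
        if m.group(1) is None:
            return m.group(0)  # "\\": keep as is
        return "\\" + format(int(m.group(1), 16), "03o")

    return ESCAPE_TOKEN.sub(rewrite, src)


print(to_octal(r"\\u000A is the Unicode representation of \\n"))
# prints: \\u000A is the Unicode representation of \\n
```

A negative lookbehind such as `(?<!\\)` on its own would not be enough. In `\\\u000A` (an escaped backslash followed by a genuine escape) the `\u` is preceded by a backslash but should still be rewritten.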

More generally, I think we should store the contents as the actual unescaped byte 0A instead of the characters \, u, 0, 0, 0, A; then this issue wouldn't have happened. This is related to #521 and the need to revamp escapes a little bit.
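A sketch of that idea in Python, under stated assumptions: the literal is decoded once into real characters, and each backend then escapes only actual characters, without re-parsing escape sequences. The decoded escape set (`\\`, `\"`, `\n`, `\t`, `\uXXXX`) is a hypothetical subset, not Effekt's actual lexer. `scheme_string` emits R6RS-style `\x<hex>;` escapes. `js_string` reuses `json.dumps`, whose output is also a valid JavaScript string literal.

```python
import json
import re

# Hypothetical subset of source-level escapes, for illustration only.
SIMPLE_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}
SOURCE_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|(.))", re.DOTALL)


def decode_literal(src: str) -> str:
    """Turn the literal's source text into the characters it denotes."""
    def decode(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return SIMPLE_ESCAPES[m.group(2)]  # KeyError on unsupported escapes

    return SOURCE_ESCAPE.sub(decode, src)


def scheme_string(value: str) -> str:
    """Render decoded characters as an R6RS string literal."""
    out = []
    for ch in value:
        if ch in '\\"':
            out.append("\\" + ch)
        elif " " <= ch <= "~":  # printable ASCII stays as is
            out.append(ch)
        else:
            out.append(f"\\x{ord(ch):X};")  # e.g. newline -> \xA;
    return '"' + "".join(out) + '"'


def js_string(value: str) -> str:
    """A JSON string is also a valid JavaScript string literal."""
    return json.dumps(value)


value = decode_literal(r"\\u000A is the Unicode representation of \\n")
print(scheme_string(value))  # "\\u000A is the Unicode representation of \\n"
print(js_string(value))      # "\\u000A is the Unicode representation of \\n"
```

Both backends now produce the same literal, because neither ever sees `\u000A` as an escape: after decoding, it is just six ordinary characters preceded by a real backslash.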