Open sideburns3000 opened 1 week ago
Hi @sideburns3000,
thanks for the very nice and detailed issue! This indeed looks like something we might want to fix.
As far as I can tell, in both backends (JS and Chez, respectively), we just search-and-replace these characters in order to escape: https://github.com/effekt-lang/effekt/blob/22ebac276e7f75ac45bd2e95646f5e6d3ad78540/effekt/shared/src/main/scala/effekt/generator/js/Transformer.scala#L23 https://github.com/effekt-lang/effekt/blob/22ebac276e7f75ac45bd2e95646f5e6d3ad78540/effekt/shared/src/main/scala/effekt/generator/chez/Transformer.scala#L46
But then in Chez, we do the following:
https://github.com/effekt-lang/effekt/blob/22ebac276e7f75ac45bd2e95646f5e6d3ad78540/effekt/shared/src/main/scala/effekt/generator/chez/package.scala#L78-L88
which is fine, but I think we should merge this with the escape function and actually make sure we process \\ first (cc @b-studios), or at least modify the regex.
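To illustrate why processing \\ first matters, here is a minimal Python sketch. It is not the actual Scala code from the compiler: I am assuming the Chez step behaves roughly like a regex that rewrites every \uXXXX into a three-digit octal escape, which is what the generated Scheme code in the report suggests. The names naive_to_octal and backslash_first_to_octal are hypothetical.

```python
import re

# Assumed stand-in for the current step: rewrite every \uXXXX to a 3-digit octal escape.
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def naive_to_octal(escaped: str) -> str:
    # The regex has no notion of an escaped backslash, so it also matches
    # the "\u000A" that starts at the second backslash of "\\u000A".
    return UNICODE_ESCAPE.sub(lambda m: "\\%03o" % int(m.group(1), 16), escaped)

# Alternative: let the regex consume an escaped backslash first, so the
# "u" that follows it is never interpreted as the start of an escape.
TOKEN = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})")

def backslash_first_to_octal(escaped: str) -> str:
    def rewrite(m: re.Match) -> str:
        if m.group(1) is None:
            # Matched an escaped backslash: keep it unchanged.
            return m.group(0)
        return "\\%03o" % int(m.group(1), 16)
    return TOKEN.sub(rewrite, escaped)

# The text between the quotes of the string literal from the report.
literal = r"\\u000A is the Unicode representation of \\n"

print(naive_to_octal(literal))            # \\012 is the Unicode representation of \\n
print(backslash_first_to_octal(literal))  # \\u000A is the Unicode representation of \\n
print(backslash_first_to_octal(r"\u000A"))  # \012
```

The first call reproduces the generated Scheme string from the report, while the second leaves the doubly escaped sequence alone and still converts a genuine \u000A.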
More generally, I think we should store the contents as actual unescaped bytes 0A instead of storing the characters \, u, 0, 0, 0, A; then this issue wouldn't have happened. This is related to #521 and the need to revamp escapes a little bit.
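A rough sketch of that idea, again in Python rather than the compiler's Scala: decode the literal once into its actual characters, and let each backend escape the stored value only when it emits code. The functions unescape and emit_scheme are hypothetical, the set of simple escapes is an assumption, and there is no validation of malformed input.

```python
def unescape(literal_body: str) -> str:
    """Decode the escapes of a source literal once, left to right."""
    # Assumed set of simple escapes; the real language may support more.
    simple = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
        elif literal_body[i + 1] == "u":
            # \uXXXX: four hex digits become one character.
            out.append(chr(int(literal_body[i + 2:i + 6], 16)))
            i += 6
        else:
            # An escaped backslash is consumed here, so a following "u"
            # is treated as an ordinary character.
            out.append(simple[literal_body[i + 1]])
            i += 2
    return "".join(out)

def emit_scheme(value: str) -> str:
    """Hypothetical emitter: escape stored characters for a Scheme string literal."""
    out = []
    for ch in value:
        if ch in ("\\", '"'):
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\n")
        elif ord(ch) < 0x20 or ord(ch) > 0x7E:
            # R6RS-style hex escape, e.g. \x9; for a tab.
            out.append("\\x%x;" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'

stored = unescape(r"\\u000A is the Unicode representation of \\n")
print(stored)               # \u000A is the Unicode representation of \n
print(emit_scheme(stored))  # "\\u000A is the Unicode representation of \\n"
```

Because the stored value for the reported literal contains a real backslash followed by the plain characters u000A, the emitter has nothing to reinterpret; a genuine \u000A is stored as a single newline character and emitted as \n.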
Hello dear team,
I seem to have encountered an inconsistency between the behaviour of programs compiled for the JS backend and for the Chez Scheme backend in the context of doubly escaped Unicode codepoints in strings.
The following line, compiled for the Node.js backend and executed in a Windows console:
println("\\u000A is the Unicode representation of \\n")
prints, as expected:
\u000A is the Unicode representation of \n
However, when compiled for the Chez Scheme backend, it prints:
\012 is the Unicode representation of \n
The generated Scheme code is:
(println_1 "\\012 is the Unicode representation of \\n")
Apparently, the Scheme version partly honours the double escape (insofar as it preserves it, and doesn't print an actual newline), but nevertheless converts the hexadecimal Unicode value into its octal representation. Triple-quoting doesn't prevent that either.
Unless I'm missing or misunderstanding something, this currently makes it necessary to use a workaround like
println("\\" ++ "u000A")
to print \u000A literally on the Scheme backend.
(Tested on Effekt from 2024-09-13, as of commit 19089d3)
Kind regards, Michael