elm / compiler

Compiler for Elm, a functional language for reliable webapps.
https://elm-lang.org/
BSD 3-Clause "New" or "Revised" License

elm compiler fails on large list literals #1585

Closed knewter closed 6 years ago

knewter commented 7 years ago

https://gist.github.com/knewter/2df161dc337581e15fc01b66e7606e82

This is an example file. If you download it and compile it with elm-make, compilation fails with:

jadams ~/e/elm_compiler_large_list_literals_bug λ elm-make Main.elm                                                                   3m 4s 865ms
[                                                  ] - 0 / 1
elm-make: elm-stuff/build-artifacts/0.18.0/user/project/1.0.0/Main.elmo: commitAndReleaseBuffer: invalid argument (invalid character)
elm-make: thread blocked indefinitely in an MVar operation

If you remove the last element from the list, it compiles successfully. The failure has nothing to do with the content of that last item: duplicating any existing item at the end of the list causes the same error.

In case it's relevant, the list contains Unicode characters (for context, it is intended to be a list of all emoji for an emoji picker).

It's also worth noting that the same general problem happened when I built this as a Dict using Dict.insert rather than as a List literal. I switched to the List to try to avoid the issue. I also had to reduce the size of the List in the SSCCE to get it to compile, even though it compiles with more entries in another project. Hopefully this is useful context.

jvoigtlaender commented 7 years ago

Possibly related to https://github.com/elm-lang/elm-compiler/issues/1551? @evancz

evancz commented 6 years ago

With the development build, I am seeing the following error:

-- PARSE ERROR -------------------------------------------------------- temp.elm

Backslashes always start escaped characters, but I do not recognize this one:

164|     , ( "1f004", ( "\xD83C\xDC04", "mahjong", [ "mahjong" ] ) )
                         ^
Maybe there is some typo?

Hint: Valid escape characters include:

    \n
    \r
    \t
    \"
    \'
    \\
    \u{03BB}

The last one lets encode ANY character by its Unicode code point, so use that
for anything outside the ordinary six.

And when I update to the new syntax for escapes:

    , ( "1f004", ( "\u{1f004}", "mahjong", [ "mahjong" ] ) )
    , ( "1f0cf", ( "\u{1f0cf}", "black_joker", [ "black_joker" ] ) )
    , ( "1f170", ( "\u{1f170}", "a", [ "a" ] ) )
    , ( "1f171", ( "\u{1f171}", "b", [ "b" ] ) )

I am not having any issues with the code.

I wonder if the root cause here was related to #1623, where it seems like invalid surrogate pairs could mess things up. I am not sure, but the SSCCE now works fine, so we should revisit this if there are still issues in 0.19.
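
For anyone migrating a file like this, the old escapes appear to be UTF-16 surrogate pairs: `\xD83C\xDC04` combines to code point U+1F004, which matches the "1f004" key on the same line. The following is a minimal Python sketch of that conversion, not part of the compiler or any Elm tooling. The names `surrogate_pair_to_code_point` and `rewrite_escapes` are hypothetical, and the sketch assumes every astral character in the file is written as two adjacent `\xHHHH` escapes.

```python
import re

# Assumption: old-style escapes look like \xD83C\xDC04, i.e. a high surrogate
# (D800-DBFF) immediately followed by a low surrogate (DC00-DFFF).
SURROGATE_PAIR = re.compile(
    r"\\x(D[89AB][0-9A-F]{2})\\x(D[C-F][0-9A-F]{2})", re.IGNORECASE
)


def surrogate_pair_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair: %04X %04X" % (high, low))
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def rewrite_escapes(source: str) -> str:
    """Rewrite every \\xHHHH\\xHHHH surrogate pair as a \\u{...} escape."""

    def replace(match: "re.Match[str]") -> str:
        high = int(match.group(1), 16)
        low = int(match.group(2), 16)
        # Lowercase hex, matching the style of the updated snippet in this thread.
        return "\\u{%x}" % surrogate_pair_to_code_point(high, low)

    return SURROGATE_PAIR.sub(replace, source)


if __name__ == "__main__":
    old_line = r'    , ( "1f004", ( "\xD83C\xDC04", "mahjong", [ "mahjong" ] ) )'
    print(rewrite_escapes(old_line))
```

Applied to line 164 from the parse error, this produces the `\u{1f004}` form shown in the updated snippet. Lone `\xHHHH` escapes that are not part of a pair are left untouched, so they would still need to be converted by hand.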

evancz commented 6 years ago

It may also be platform/hardware/architecture specific, so if it appears again, add that info as well!