Elm format changes large integer literals

harrysarson commented 5 years ago

The following elm code:

module Main exposing (toWrite)

largeInt: Int
largeInt=
    1000000000000000000000

is is changed by elm-format as such:

--- a/Main.elm
+++ b/Main.elm
@@ -3,4 +3,4 @@ module Main exposing (toWrite)

 toWrite : String
 toWrite =
-    String.fromInt 1000000000000000000000
+    String.fromInt 3875820019684212736

The code is malformed anyway due to https://github.com/elm/compiler/issues/1246 but this will become a problem as and when the elm compiler generates correct code for this. The underlying issue is the AST uses too small an integer datatype.

avh4 commented 5 years ago

Are you using the Windows binaries (which are compiled with 32-bit haskell), or the Mac/Linux ones?

harrysarson commented 5 years ago

Yes windows!

avh4 commented 5 years ago

Ah, okay, in that case I suspect it might be fixed if you were to compile elm-format from source with a 64-bit haskell installation.

I've only been releasing 32-bit windows binaries because it seemed to work and caused the least confusion, but I might try taking a look at building 64-bit also.

avh4 commented 5 years ago

Hmm... though it looks like this is reproducible on linux too 🤔

harrysarson commented 5 years ago

Presumably the root cause is the same here and over at the elm compiler?

harrysarson commented 4 years ago

The issue is that this number (1000000000000000000000 or 0x36_35C9ADC5_DEA00000) requires more than 64 bits to represent.

Elm format changes it to 3875820019684212736 or 0x35C9ADC5_DEA00000, truncating it to the least significant 64 bits.

Whilst elm format may well not want to support such big numbers, it should at the least exit with an error when such a literal is encountered.

harrysarson commented 4 years ago

What if elm-format did not try to encode literals as Int's in the AST but kept all literals as strings? This way there would be no encoding issues.

basile-henry commented 4 years ago

Integer (big num) is pretty standard in Haskell. I think using it instead of Int is the simplest change. If you encode it as String you run the risk of allowing non integers to be represented.

avh4 commented 4 years ago

I think this is actually correct for now, as elm-format shows the user how elm-compiler will interpret the code. When elm-compiler is changed, this should be revisited in elm-format.

harrysarson commented 4 years ago

I would prefer to see elm-format raising a parsing issue rather than silently corrupting* source code files.

Alternatively, if elm-format left big integer literals as they are there would be scope for tools like elm-review to start warning about them.

* I worry that corrupting is a subjective and also possibly inflammatory term. I use it only to hint at my underlying concern with marking this at wont-fix for now. I do definitely understand your rational for leaving this as it but I would like to register gentle disagreement. Please do not think I am notnhugely appreciative of this very good and useful tool!

avh4 commented 4 years ago

Hmm... in particular the point about elm-review is a very good one. I guess that would mean elm-format is implementing an imaginary spec for the Elm language that differs from what elm-compiler implements? I guess I can see the value of that, though I'm not entirely sure I want to take that on...

lmk think about this a bit.

Also, can anyone provide examples of real-world code where they use int literals this large or nearly this large?

jfmengels commented 4 years ago

I think it makes total sense for elm-review to report unresolved compiler bugs. Actually back in Elm 0.18, doing so was one of the applications I envisioned, there were way more compiler bugs too then.

That said, elm-review has a "rule discoverability" problem, in the sense that even if we add a rule, people won't necessarily enable it or even know it exists. If the proposed solution is general enough, I might think about enabling it in the default configuration, but that's a whole other topic.

FYI, it would be pretty easy for the rule to report an error only for 0.19.1 and not for 0.19.2 (should it be fixed in there), or even to report an error to disable the rule, if that's the better solution.

I do think that elm-format should not alter the behavior of the program. I think it might be okay as @harrysarson suggested to report a ~syntax error, but leaving it alone would also be very fine in my book.

avh4 commented 4 years ago

oh, I just realized... can elm-review actually report large ints? Because elm-review rules are written in Elm and executed as javascript, so it probably won't be able to detect overly large ints because those will get rounded down by nodejs before elm-review sees them?

jfmengels commented 4 years ago

Ugh, that's a good point. I think you're right that it won't, since an integer literally information (written in decimal or hexadecimal) is encoded as an Elm Int, meaning we'll lose the information beforehand.

That's an issue on the AST though. Maybe elm-syntax could just have an integer literal (one for both hexadecimal and decimal) containing the Int value and the literal string in the source code. Comparing the two would then make it relatively easy to see if there is a discrepancy or overflow between the two. (Cc @stil4m @martinsstewart).

I don't want to derail the conversation too much, so in short: I think elm-review would do well to handle this issue, but elm-syntax would need to adapt for it (but a major version is already in the works, so it's doable without too much hassle if we handle that before the next major version).

avh4 commented 4 years ago

okay, thanks! I think I'm gonna punt on this for 0.8.4, and look at this (and the related float issue #680 in 0.8.5)

avh4 / elm-format

Elm format changes large integer literals #635