corretto / corretto-21

regression in JDK21 when transforming a double into a String #76

Closed. DrMirakle closed this 4 weeks ago.

DrMirakle commented 4 weeks ago

Conversion of a double to a String isn't consistent in JDK 21 for some specific double values:

[image]

The logic is different between the two JDKs:

[image]

shipilev commented 4 weeks ago

I suspect this is the consequence of the fix in JDK 19 (JDK-4511638); see the accompanying Release Note: JDK-8291475.

shipilev commented 4 weeks ago

Yeah, see:

public class DoubleTest {
    public static void main(String... args) {
        double d1 = 4.6566128730773926E-10;
        double d2 = 4.65661287307739261E-10;
        System.out.println(Double.toString(d1));
        System.out.println(Double.toString(d2));
    }
}

% /Library/Java/JavaVirtualMachines/amazon-corretto-17.jdk/Contents/Home/bin/java DoubleTest.java
4.6566128730773926E-10
4.6566128730773926E-10

% /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home/bin/java DoubleTest.java
4.656612873077393E-10
4.656612873077393E-10

JDK 21 rounds this very small number one unit further, as the Release Note predicts might happen. The second literal also shows that you cannot expect a double literal in Java code to match its string representation exactly, even on JDK 17.
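
Also worth noting (a quick sketch to illustrate, not something from the Release Note): both the JDK 17 string and the JDK 21 string parse back to the exact same double, so only the textual form changed, not the value.

public class RoundTripTest {
    public static void main(String... args) {
        double d = 4.6566128730773926E-10;
        // The string JDK 17 prints and the string JDK 21 prints both parse
        // back to the same double, so the round-trip guarantee of
        // Double.toString holds on both versions; only the chosen digits differ.
        System.out.println(Double.parseDouble("4.6566128730773926E-10") == d); // true
        System.out.println(Double.parseDouble("4.656612873077393E-10") == d);  // true
        System.out.println(Double.parseDouble(Double.toString(d)) == d);       // true on both JDKs
    }
}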

DrMirakle commented 4 weeks ago

Thanks for the information.

I could be wrong, but I think that for the specific example:

double d1 = 4.6566128730773926E-10;
double d2 = 4.65661287307739261E-10;

d1 and d2 print the same in JDK 17 (as d1) because the constant you're trying to store into d2 goes beyond what a double can hold: the extra 1 after the final 6 isn't represented in the double bit pattern (only 53 bits are available for the significand, giving 15 to 17 decimal digits, while d2 tries to store 18 digits), so the double bit pattern for d2 is the same as d1's. In other words, it's a matter of double storage as covered by IEEE 754, not of the double-to-String conversion.

(you can see this in something like Eclipse by mousing over d1 and d2 when debugging).
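
A quick way to check that outside a debugger (just a sketch; the class name is made up): compare the bit patterns directly, and use new BigDecimal(double) to print the exact value the double actually stores.

public class ExactValueTest {
    public static void main(String... args) {
        double d1 = 4.6566128730773926E-10;
        double d2 = 4.65661287307739261E-10;
        // Same 64-bit pattern: the trailing 1 in the d2 literal is lost when the
        // constant is rounded to the 53-bit significand of an IEEE 754 double.
        System.out.println(Double.doubleToLongBits(d1) == Double.doubleToLongBits(d2)); // true
        // BigDecimal(double) shows the exact decimal expansion of the stored value.
        System.out.println(new java.math.BigDecimal(d1)); // 4.656612873077392578125E-10, i.e. exactly 2^-31
    }
}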

benty-amzn commented 4 weeks ago

> there's a well defined way to exactly represent a standard double as a string of characters in scientific notation

I think this may be the source of the confusion. There exist at least two well-defined ways to represent an IEEE 754 double as a string of characters:

- the Double.toString specification in JDK 18 and earlier, and
- the revised Double.toString specification introduced in JDK 19 by JDK-4511638.

Note that the two specifications are not identical, and so

> That's a 'bug' in the double to String conversion - there's no reason this happens since there's a well defined way to exactly represent a standard double as a string of characters in scientific notation

does not follow.

If I've misunderstood something and you can demonstrate where the output produced by Corretto 21 differs from the specification for that JDK version, we can follow up on that.

DrMirakle commented 4 weeks ago

Thanks, I see, I guess that's fine then. I wasn't sure there was awareness about this regression (someone on our side looked at notes on the various JDKs and didn't see it, but it's easy to miss). And I agree that not all regressions are 'bugs' (like a switch in algorithm to get more perf at the cost of precision, etc).

shipilev commented 4 weeks ago

Right, there might be representational issues that are not covered by my previous test. I need to brush up on my IEEE-754 representation knowledge :) I think we can peek into IEEE representation with doubleToRawLongBits like this:

public class DoubleTest {
    public static void main(String... args) {
        double d1 = 4.656612873077393E-10;
        double d2 = 4.6566128730773926E-10;
        double d3 = 4.65661287307739261E-10;
        System.out.println(Double.toString(d1));
        System.out.println(Double.toString(d2));
        System.out.println(Double.toString(d3));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(d1)));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(d2)));
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(d3)));
    }
}

$ jdk-17/bin/java DoubleTest.java
4.6566128730773926E-10
4.6566128730773926E-10
4.6566128730773926E-10
3e00000000000000
3e00000000000000
3e00000000000000

$ jdk-21/bin/java DoubleTest.java
4.656612873077393E-10
4.656612873077393E-10
4.656612873077393E-10
3e00000000000000
3e00000000000000
3e00000000000000

What I can infer from this test is that the representations for the ...93E-10, ...926E-10, and ...9261E-10 literals are the same everywhere. JDK 21 chooses to stick with the ...93E-10 string representation for all three, while JDK 17 sticks with ...926E-10. This does not look like a 17 -> 21 bug to me, but rather the expected behavior from a tightened spec and implementation.

Note that JDK 17 makes a surprising "conversion" of ...93E-10 into ...926E-10; I suspect that is the essence of the bug that was fixed in JDK 19. To my non-FP-expert view, the JDK 21 behavior looks more understandable: it consistently rounds up instead of hallucinating a digit in the last place. I wonder if you would see the same in your Eclipse test with JDK 17? :)

Also, the hex representation 3e00000000000000 means 00111110 00000000 00000000 00000000 00000000 00000000 00000000 00000000 in binary, i.e. sign 0, exponent 01111100000, significand 0...0. Various IEEE-754 online calculators give me 4.656612873077393E-10 as the string representation for this number, effectively siding with JDK 21.
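
For what it's worth, the same field decomposition can be done in Java instead of an online calculator (a small sketch; the class and variable names are just for illustration):

public class BitFieldTest {
    public static void main(String... args) {
        long bits = Double.doubleToRawLongBits(4.656612873077393E-10); // 0x3e00000000000000
        long sign        = bits >>> 63;             // 1 bit
        long exponent    = (bits >>> 52) & 0x7FF;   // 11 bits, biased by 1023
        long significand = bits & 0xFFFFFFFFFFFFFL; // 52 bits, with an implicit leading 1
        System.out.println("sign        = " + sign);                             // 0
        System.out.println("exponent    = " + (exponent - 1023));                // -31
        System.out.println("significand = " + Long.toBinaryString(significand)); // 0
        // value = (-1)^0 * 1.0 * 2^-31 = 4.656612873077392578125E-10
    }
}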

DrMirakle commented 4 weeks ago

Right, the bit patterns for double d1 = 4.656612873077393E-10; and double d2 = 4.6566128730773926E-10; are the same.

For us the issue wasn't really about which String is 'better' in some corner cases (though obviously someone thought there was actually a bug and changed it in JDK 19), but that the String differs depending on the JDK, creating rare regressions. Anyway, it doesn't look like this is going to change again any time soon, so that's OK I guess.
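
For what it's worth, if the concern is getting a textual form that stays stable across JDK versions, one option (not discussed above, just a sketch) is to sidestep Double.toString for stored or compared output, e.g. via BigDecimal with an explicit precision, or via the raw bits:

import java.math.BigDecimal;
import java.math.MathContext;

public class StableFormatTest {
    public static void main(String... args) {
        double d = 4.6566128730773926E-10;
        // BigDecimal arithmetic and its toString are exactly specified, so this output
        // does not depend on which Double.toString algorithm the JDK ships.
        // 17 significant digits are always enough to round-trip a double.
        System.out.println(new BigDecimal(d).round(new MathContext(17))); // 4.6566128730773926E-10
        // Alternatively, store/compare the raw bit pattern, which is JDK-independent by definition.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(d))); // 3e00000000000000
    }
}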