MichaelChirico / r-bugs

A ⚠️read-only⚠️ mirror of https://bugs.r-project.org/

[BUGZILLA #17281] print.summaryDefault(): incorrect rounding on some Linux systems #6456

Open MichaelChirico opened 4 years ago

MichaelChirico commented 4 years ago

On some (not all) Linux systems, print.summaryDefault() incorrectly rounds the mean value and/or the median value (and perhaps also other values).

Example:
R> a <- 1234568.01 + c(0:1)

Incorrect output on my Ubuntu 16.04 LTS 64-bit computer (details below):
R> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1234568 1234568 1234568 1234568 1234569 1234569

Correct output on other computers (e.g. Windows, and Dirk Eddelbuettel's Ubuntu 17.04 64-bit computer):
R> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1234568 1234568 1234569 1234569 1234569 1234569

The following commands give the correct output on all (?) computers:
R> print(summary(a), digits=9)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
1234568.0 1234568.3 1234568.5 1234568.5 1234568.8 1234569.0
R> summary(a)["Mean"]
   Mean
1234569
R> mean(a)
[1] 1234569
R> print(mean(a), digits=9)
[1] 1234568.51
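The following minimal sketch checks that only the display is affected. It assumes R 3.4.0 or later, where summary.default() stores unrounded statistics and leaves rounding to the print method. The variable names mean_stored and median_stored are illustrative.

```r
# Reproduce the reporter's data: two values exactly 1 apart
a <- 1234568.01 + c(0:1)
s <- summary(a)

# Extract the stored statistics (assumed unrounded, see above)
mean_stored   <- s[["Mean"]]
median_stored <- s[["Median"]]

# Print them with enough significant digits to see the full values
print(c(Mean = mean_stored, Median = median_stored), digits = 10)

# Both should agree with the direct computations
all.equal(mean_stored, mean(a))
all.equal(median_stored, median(a))
```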

See also: https://stat.ethz.ch/pipermail/r-devel/2017-May/074351.html

My computer:
R> Sys.info()
sysname         "Linux"
release         "4.5.0-040500rc6-generic"
version         "#201602281230 SMP Sun Feb 28 17:33:02 UTC 2016"
nodename        "arne-HP-EB-8560w"
machine         "x86_64"
login           "arne"
user            "arne"
effective_user  "arne"

R> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=da_DK.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=da_DK.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=da_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0


MichaelChirico commented 4 years ago

This is due to the use of zapsmall() in format.summaryDefault(); i.e., double rounding.

print(zapsmall(1234568.51), digits=10)
[1] 1234568.5

print(round(zapsmall(1234568.51)), digits=10)
[1] 1234568

print(round(1234568.51), digits=10)
[1] 1234569

The second result is due to round-half-to-even, which apparently is not used on some systems.

I suspect this is not easy to fix without problems popping up elsewhere, since the zapsmall() call is likely there for a reason.
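The double rounding described above can be sketched in plain R. The helper zap_like() is hypothetical. It imitates the rounding rule this thread attributes to zapsmall() in R 3.4.0, which keeps about getOption("digits") significant digits. It is not R's actual code, and newer R versions may compute zapsmall()'s digits differently.

```r
# Hypothetical helper: round x to roughly `digits` significant digits,
# imitating the zapsmall() behaviour reported in this thread (R 3.4.0).
zap_like <- function(x, digits = 7L) {
  mx <- max(abs(x))
  # decimal places left after the integer digits, rounded to an integer
  d <- max(0L, as.integer(floor(digits - log10(mx) + 0.5)))
  round(x, d)
}

x <- 1234568.51
step1 <- zap_like(x)    # first rounding: one decimal place is kept
step2 <- round(step1)   # second rounding via round(): the tie goes to the even integer
once  <- round(x)       # a single rounding of the original value

print(c(step1 = step1, step2 = step2, once = once), digits = 10)
```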


MichaelChirico commented 4 years ago

Yes, Peter, you are right. zapsmall() likely causes the difference:

I get (Ubuntu 16.04.2 LTS, 64-bit; details in my previous message):
R> zapsmall(1234568.51)
[1] 1234568

My colleague gets (Windows 7, 64-bit; details below):
R> zapsmall(1234568.51)
[1] 1234569

Why do we get different outputs?

Which one is correct/expected? (I guess that most people would expect 1234569.)

Is there anything that we can do to get the same outputs on our computers?

My colleague's computer:

Sys.info()
   sysname        release        version       nodename 
 "Windows"        "7 x64"   "build 7600"      "USER-PC" 
   machine          login           user effective_user 
  "x86-64"         "user"         "user"         "user" 
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)

Matrix products: default

locale:
[1] LC_COLLATE=C
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] foreign_0.8-67

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0


MichaelChirico commented 4 years ago

The difference is caused by different rounding in snprintf on Linux and Windows. In recent glibc versions, the default rounding mode is round-to-nearest (ties to even, i.e. round-half-to-even), and printf/snprintf honor it, as on Ubuntu 17.04. Round-half-to-even is likewise the default rounding mode in IEEE 754.

R uses snprintf to round floating-point numbers when printing them, and the reported example can be narrowed down to:

x <- 1234568.5 ; x
[1] 1234568   <=== Ubuntu 17.04 (glibc 2.24, round-half-to-even)
[1] 1234569   <=== Windows 10
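To separate the stored value from its printed form, one can print it with enough significant digits. This minimal sketch uses base R's sprintf(), which hands numeric formats to the C library. Only the zero-decimal results are expected to vary by platform for this tie, so their exact strings are not asserted here.

```r
x <- 1234568.5

# With 17 significant digits, %g shows the exact stored value: 1234568.5
exact <- sprintf("%.17g", x)

# Rounding to 0 decimals is done by the C library's snprintf; for an
# exact tie like this one, platforms may disagree
zero_dec <- sprintf("%.0f", x)

# format() at 7 significant digits also needs 0 decimals for this value
shown <- format(x, digits = 7)

cat(exact, zero_dec, shown, sep = "\n")
```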

On Windows, too, one can select a rounding mode using fesetround, and round-half-to-even is the default there as well. However, this setting has no effect on printf (and a number of other functions), which always round half away from zero. One can use, e.g., rint to round values according to the selected rounding mode.
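The two tie-breaking rules can be compared directly in R. Base R's round() sends ties to the even neighbour, like rint() under the default rounding mode. round_half_away() below is a hypothetical helper that imitates the half-away-from-zero behaviour described for printf on Windows. It is only a naive sketch: floor(abs(x) + 0.5) misrounds some values just below one half, e.g. 0.49999999999999994.

```r
# Hypothetical helper: round half away from zero (the tie rule described
# for printf on Windows). Naive sketch; see the caveat above.
round_half_away <- function(x) sign(x) * floor(abs(x) + 0.5)

ties <- c(0.5, 1.5, 2.5, -2.5, 1234568.5)

print(data.frame(
  x         = ties,
  half_even = round(ties),            # R's round(): ties go to the even integer
  half_away = round_half_away(ties)   # ties go away from zero
), digits = 10)
```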


MichaelChirico commented 4 years ago

Thanks for the explanation, Tomas!

I understand your example with 1234568.5, which is equidistant from 1234568 and 1234569. But my example used the number 1234568.51, which is closer to 1234569 than to 1234568 and thus should be rounded to 1234569. Or do the functions round repeatedly?

round(round(1234568.51,1))
[1] 1234568
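This small sketch shows which values are hit by such repeated rounding when the second step sends ties to the even integer, as R's round() does. A value just above one half (here .51 to .54) is rounded up by a single rounding. Rounding it first to one decimal turns it into an exact tie. That tie then goes down if the integer part is even and up if it is odd. The grid of test values is an arbitrary illustration.

```r
# Values just above one half, for an even and an odd integer part
fracs <- c(0.51, 0.52, 0.53, 0.54)
x <- c(1234568 + fracs, 1234569 + fracs)

once  <- round(x)              # single rounding
twice <- round(round(x, 1))    # round to one decimal, then to an integer

print(data.frame(x = x, once = once, twice = twice, differs = once != twice),
      digits = 10)
```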


MichaelChirico commented 4 years ago

Yes, the first rounding happens in zapsmall() and the second when printing. On both Linux and Windows, zapsmall(1234568.51) returns 1234568.5, and this number is rounded differently on Linux and Windows when printed.


MichaelChirico commented 4 years ago

Thanks for your additional explanations, Tomas!

So there are two "issues" (both of which could perhaps be called "bugs"):

a) repeated rounding in print.summaryDefault()

b) different rounding of .5 on different computers

I think that it would be great if both of these "issues" could be fixed.
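For issue (a), the following minimal sketch formats the summary with a single rounding step. format() is applied directly to the unrounded statistics, with no zapsmall() in between. format_once() is a hypothetical helper for illustration, not R's print method, and it assumes summary() stores unrounded values (R 3.4.0 or later). It does not address issue (b): exact ties are still rounded by the platform's C library.

```r
# Hypothetical helper: format the summary statistics with one rounding step
format_once <- function(s, digits = 7L) {
  format(unclass(s), digits = digits)  # no zapsmall() before formatting
}

a <- 1234568.01 + c(0:1)
s <- summary(a)
print(format_once(s), quote = FALSE)
```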

