JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

Inconsistent formatting of floating point numbers #3151

Open robsmith11 opened 1 year ago

robsmith11 commented 1 year ago

I would expect an entire column to either use scientific notation or not. It's especially odd that numbers in descending order flip between notations:

julia> d = DataFrame(x=[717068.92, 506896.01, 472500.85])
3×1 DataFrame
 Row │ x
     │ Float64
─────┼────────────────
   1 │      7.17069e5
   2 │ 506896.0
   3 │      4.72501e5

julia> d.x
3-element Vector{Float64}:
 717068.92
 506896.01
 472500.85

(@v1.9) pkg> st DataFrames
Status `/me/.julia/environments/v1.9/Project.toml`
  [a93c6f00] DataFrames v1.3.5

julia> versioninfo()
Julia Version 1.9.0-DEV.1223
Commit f066855cd5b (2022-08-31 02:40 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 7 4700U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.5 (ORCJIT, znver2)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_DEPOT_PATH = /me/.julia

bkamins commented 1 year ago

@ronisbr - indeed it is pretty bad. Can we fix it?

ronisbr commented 1 year ago

Hi @bkamins and @robsmith11 !

This is not a bug; it is how Julia prints floats with :compact => true in the IOContext. With compact printing disabled, the column is printed consistently:

julia> show(stdout, d; compact_printing = false)
3×1 DataFrame
 Row │ x
     │ Float64
─────┼───────────
   1 │ 717068.92
   2 │ 506896.01
   3 │ 472500.85

Notice:

julia> context = IOContext(stdout, :compact => true);

julia> str = sprint(print, 717068.92; context = context)
"7.17069e5"

julia> context = IOContext(stdout, :compact => false);

julia> str = sprint(print, 717068.92; context = context)
"717068.92"

We decided to use :compact => true by default. We can change this, but there will be side effects.
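As a minimal sketch of the same point applied to the whole column from the report, the snippet below renders each value on its own, once with :compact => true and once with :compact => false. It uses only Base (sprint and show with the context keyword); the variable names xs, compact and full are illustrative. Each element is formatted independently, which is why a single column can end up mixing notations under compact printing.

```julia
# Values taken from the column in the report above.
xs = [717068.92, 506896.01, 472500.85]

# Render every value separately, with and without compact printing.
# `context` accepts a Pair that is used to build the IOContext.
compact = [sprint(show, x; context = :compact => true) for x in xs]
full = [sprint(show, x; context = :compact => false) for x in xs]

# Print both renderings side by side for comparison.
for (c, f) in zip(compact, full)
    println(rpad(c, 12), f)
end
```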

ronisbr commented 1 year ago

By the way, notice that even with :compact => true it can happen:

julia> [717068.92, 506896.01, 4725000.85]
3-element Vector{Float64}:
 717068.92
 506896.01
      4.72500085e6

ronisbr commented 1 year ago

Another possibility is to align at e or .:

julia> show(stdout, d; alignment_anchor_regex = Dict(1 => [r"e", r"\."]))
3×1 DataFrame
 Row │ x
     │ Float64
─────┼───────────
   1 │ 7.17069e5
   2 │  506896.0
   3 │ 4.72501e5

But it will not be consistent with Julia array printing, which always aligns at the decimal point.

bkamins commented 1 year ago

By the way, notice that even with :compact => true it can happen:

You mean :compact => false.

My general question would be how to turn off scientific notation when printing floats, as this is what users usually want.

ronisbr commented 1 year ago

AFAIK, there is no option in Julia to print a number without scientific notation. The only way appears to be applying a formatter using Formatting.jl:

julia> using Formatting

julia> show(stdout, d, formatters = (v, i, j) -> v isa Number ? format(v) : v)
3×1 DataFrame
 Row │ x
     │ Float64
─────┼───────────
   1 │ 717068.92
   2 │ 506896.01
   3 │ 472500.85

Notice that overriding formatters will break some cases in DataFrames.

What we can do is add an option to automatically add this formatter inside DataFrames (Formatting.jl is already used by PrettyTables).
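For illustration only, the formatter idea can also be sketched with the Printf standard library, assuming a fixed number of decimal places (two here) is acceptable. The names fixed_notation and fixed_formatter are hypothetical and are not part of DataFrames.jl or PrettyTables.jl; the (v, i, j) signature simply mirrors the formatter shown above. The snippet exercises the formatter directly rather than through show.

```julia
using Printf

# Hypothetical helper: render a float in fixed notation with two decimals.
fixed_notation(v::AbstractFloat) = @sprintf("%.2f", v)

# Formatter with the same (value, row, column) signature as the one above.
# Cells that are not floats are passed through unchanged.
fixed_formatter = (v, i, j) -> v isa AbstractFloat ? fixed_notation(v) : v

# Apply the formatter to the values from the report, as if they were column 1.
for (i, v) in enumerate([717068.92, 506896.01, 472500.85])
    println(fixed_formatter(v, i, 1))
end
```

Whether such a function can be passed through the formatters keyword without side effects is subject to the same caveat about overriding formatters mentioned above.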

bkamins commented 1 year ago

What we can do is add an option to automatically add this formatter inside DataFrames

My point is that maybe we can add a kwarg to show that would allow the user to disable scientific notation. I am marking it as a decision for the 1.5 release.

ronisbr commented 1 year ago

Yes, this is precisely what I thought. If the user passes no_scientific_notation = true or something similar, we add this formatter together with the one we are currently using.

Btw, this is a very simple change; I think I can do it for the 1.4 release.

bkamins commented 1 year ago

Ah - OK. If you have time then it is better to close issues 😄. Thank you!

ronisbr commented 1 year ago

No problem! Let's just merge the current PR and then I will do this.

bkamins commented 1 year ago

Yes - we are waiting for @nalimilan to have some time to look at the PR, as it is big.