JuliaData / DataTables.jl

(DEPRECATED) A rewrite of DataFrames.jl based on Nullable

describe does not function #60

Closed oxinabox closed 7 years ago

oxinabox commented 7 years ago

The docs define the behaviour of describe as follows:

... If the column's base type derives from Number, compute the minimum, first quantile, median, mean, third quantile, and maximum. Nulls are filtered and reported separately.
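For reference, the numeric summary that the docs promise can be sketched roughly as below. This is only an illustration written for current Julia, not the DataTables.jl implementation: `numeric_summary` is a hypothetical helper name, and it assumes `missing` as the null marker in place of the `Nullable` type used by DataTables on Julia 0.5.

```julia
using Statistics

# Hypothetical helper (not part of DataTables.jl): computes the six
# statistics listed in the docs for one numeric column.
# Assumption: nulls are represented as `missing`, not as `Nullable`.
function numeric_summary(col)
    vals = collect(skipmissing(col))              # filter nulls out first
    q1, med, q3 = quantile(vals, [0.25, 0.5, 0.75])
    return (min = minimum(vals),
            q1 = q1,
            median = med,
            mean = mean(vals),
            q3 = q3,
            max = maximum(vals),
            nmissing = count(ismissing, col))     # nulls reported separately
end

s = numeric_summary([1.0, 2.0, 3.0, 4.0, missing])
println(s)
```

The point of the sketch is that a Float64 column should yield all six statistics plus a separate null count, which is what the output below fails to show.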

But a MWE with columns of type Float64:

dt = DataTable(a=rand(10), b=randn(10))
describe(dt)

Outputs:

a
Summary Stats:
Length:         10
Type:           Nullable{Float64}
Number Unique:  10

b
Summary Stats:
Length:         10
Type:           Nullable{Float64}
Number Unique:  10

This is on the stable version of DataTables, with:

Julia Version 0.5.1
Commit 6445c82 (2017-03-05 13:25 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

Contrast this with the expected behaviour from DataFrames:

dt = DataFrame(a=rand(10), b=randn(10))
describe(dt)

Outputs:

a
Summary Stats:
Mean:           0.554118
Minimum:        0.050461
1st Quartile:   0.361887
Median:         0.530056
3rd Quartile:   0.836832
Maximum:        0.933014
Length:         10
Type:           Float64

b
Summary Stats:
Mean:           0.054508
Minimum:        -1.119196
1st Quartile:   -0.792691
Median:         0.165122
3rd Quartile:   0.749462
Maximum:        1.449910
Length:         10
Type:           Float64

nalimilan commented 7 years ago

Fixed on master (with the latest release of NullableArrays):

julia> dt = DataTable(a=rand(10), b=randn(10));

julia> describe(dt)
a
Summary Stats:
Mean:           0.644547
Minimum:        0.018852
1st Quartile:   0.473012
Median:         0.659144
3rd Quartile:   0.920300
Maximum:        0.967488
Length:         10
Type:           Float64

b
Summary Stats:
Mean:           0.113890
Minimum:        -1.677980
1st Quartile:   -0.954596
Median:         0.265165
3rd Quartile:   1.244493
Maximum:        1.609289
Length:         10
Type:           Float64