JuliaStats / RDatasets.jl

Julia package for loading many of the data sets available in R
GNU General Public License v3.0

huge variance in time to load iris #79

Open: davidbp opened this issue 4 years ago

davidbp commented 4 years ago

Hello

I have observed a 10x difference in the time it takes to load the iris dataset on two different machines.

Loading times are a bit unreasonable. Is there anything I can do to speed this up?

julia> using RDatasets

julia> @time iris = dataset("datasets", "iris"); # a DataFrame
100.068931 seconds (75.23 M allocations: 4.053 GiB, 3.19% gc time)

102.497734 seconds (75.35 M allocations: 4.062 GiB, 3.33% gc time)

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
  JULIA_EDITOR = subl

(v1.2) pkg> status RDatasets
    Status `~/.julia/environments/v1.2/Project.toml`
  [336ed68f] CSV v0.5.14
  [a93c6f00] DataFrames v0.19.4
  [ce6b1742] RDatasets v0.6.4

On the other machine I get:

julia> using RDatasets
[ Info: Recompiling stale cache file /home/david/.julia/compiled/v1.1/RDatasets/JyIbx.ji for RDatasets [ce6b1742-4840-55fa-b093-852dadbb1d8b]

julia> @time iris = dataset("datasets", "iris"); 
 10.544570 seconds (37.27 M allocations: 1.767 GiB, 8.98% gc time)

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)

(v1.1) pkg> status RDatasets
    Status `~/.julia/environments/v1.1/Project.toml`
  [336ed68f] CSV v0.5.14
  [a93c6f00] DataFrames v0.18.4
  [ce6b1742] RDatasets v0.6.1
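Both timings above come from the first `dataset` call in a fresh session, so they likely mix one-off compilation with the actual file read. A minimal sketch for separating the two, using only Base Julia: `first_and_second_call_times` is a hypothetical helper, not part of RDatasets. If the second number is much smaller than the first, most of the time is going into compilation rather than into reading the data.

```julia
# Hypothetical helper (not part of RDatasets): run `f` twice in the same
# session and return both wall-clock times in seconds. The first call
# includes compiling any code `f` touches for the first time; the second
# call reuses that already-compiled code.
function first_and_second_call_times(f)
    t_first = @elapsed f()
    t_second = @elapsed f()
    return (first = t_first, second = t_second)
end

# Example with a cheap stand-in workload. In a session where RDatasets is
# loaded, one would pass `() -> dataset("datasets", "iris")` instead.
times = first_and_second_call_times(() -> sum(abs2, rand(1000)))
println("first call:  ", times.first, " s")
println("second call: ", times.second, " s")
```

`@elapsed` is used instead of `@time` so the two numbers come back as values that can be compared directly.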

ppalmes commented 4 years ago

Same observation. Just loading the iris dataset takes more than 80 seconds on a 2017 Mac running Julia 1.2 and Julia 1.3.

ppalmes commented 4 years ago

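It's way faster to go through R:

using RCall                # requires a working R installation
iris = R"iris" |> rcopy    # evaluate `iris` in R and convert it to a Julia DataFrame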