Closed ExpandingMan closed 6 years ago
Merging #78 into master will decrease coverage by 9.16%. The diff coverage is 74.23%.
@@ Coverage Diff @@
## master #78 +/- ##
==========================================
- Coverage 84.01% 74.85% -9.17%
==========================================
Files 3 4 +1
Lines 269 167 -102
==========================================
- Hits 226 125 -101
+ Misses 43 42 -1
Impacted Files | Coverage Δ | |
---|---|---|
src/source.jl | 55.22% <55.22%> (ø) | |
src/metadata.jl | 80.76% <77.27%> (-19.24%) | :arrow_down: |
src/loadfile.jl | 85.18% <85.18%> (ø) | |
src/sink.jl | 93.61% <93.61%> (ø) | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03a19c4...cbccde1.
Looking good on 0.6!
I have no idea what's going on with appveyor, any ideas?
Quick comments -- will try for a more detailed review within a couple days:
- `Feather.read` a very large file in microseconds and then pay the cost of deserialization only when I access the data.
- `Feather.materialize` appears to undo any categorical variables. I think it makes sense to materialize into `CategoricalArray{ValueType,1,Reftype,...}` instead of `Array{ValueType}`.
I'm excited about this! Let me take some time to do a proper review of your Arrow.jl work, but I like the direction this is headed.
Thanks all! Please see #80, as well as this and this. I plan on making the changes I mention today.
The random access abilities were the whole reason I bothered doing this in the first place! I envision dealing with 1TB tables relatively easily with DataFrames.jl and DataFramesMeta.jl. Note that you can also use DataFramesMeta.jl or Query.jl (if it's performant enough) combined with Feather as sort of a "database replacement".
Hello again. Ok, I've made the following changes:
- `Primitive` is blazing. It really is doing just about the bare minimum that it can possibly do. Contiguous views are about 30 ns always. `List` is of course way slower but still pretty good. If we find we are still too slow with these, it would seem that it would have to be a limitation of Julia `Mmap` or something like that.
- `Nullable` types is still pretty terrible, but this is just because of the performance of `Union` types in 0.6. Just doing `convert(Vector{Union{T,Missing}}, A)` is pretty bad. We've been told this should improve drastically in 0.7, so hopefully we can remedy this then.
- `DictEncoding` now works properly for any `Integer` reference type. `DictEncoding` now will return `CategoricalArray` by default on any use of `materialize` and any time it is indexed with `:`.
- `materialize` methods.
- `Int32` reference types are ever written. This'll happen by default, but it's still possible to e.g. copy a previously existing DataFrame that had non-`Int32` references. Fixing this seems like a surprisingly big pain in the ass, but I think it may be important.

Thanks again for all your feedback. Will do more testing tomorrow!
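For reference, the `convert(Vector{Union{T,Missing}}, A)` step mentioned above can be tried in isolation. This is only a minimal sketch, assuming Julia 0.7 or later, where `Missing` is defined in Base (on 0.6 it comes from the Missings.jl package). The helper name `widen_to_missing` is hypothetical and is not part of Feather.jl or Arrow.jl.

```julia
# Hypothetical helper (not part of Feather.jl or Arrow.jl): widen a plain
# vector so that its element type also admits `missing`.
# Assumes Julia 0.7+, where `Missing` is defined in Base.
widen_to_missing(A::Vector{T}) where {T} = convert(Vector{Union{T,Missing}}, A)

A = [1.5, 2.5, 3.5]
B = widen_to_missing(A)   # copies A into a Vector{Union{Float64,Missing}}
B[2] = missing            # the widened vector can now hold missing values
```

Wrapping the call in `@time` after one warm-up call is one rough way to compare how expensive this conversion is between Julia versions.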
I really think we should be targeting 0.7 here; in a few weeks, 0.7 will be in serious release-candidate mode and soon after, 0.6 will be a thing of the past. 0.7/1.0 will be around for years, so I think all the focus, design decisions, and tradeoffs should be weighted in that direction.
Yeah, I agree. My doing everything on 0.6 is more a practical matter than anything else. I'll compile 0.7 and try messing around with it a bit.
Keep in mind that the pointers should work just fine in 0.7. `reinterpret` may or may not; I still have to figure it out. So at least it's not as if the pointers are somehow specialized to 0.6.
I promise that I'll get everything up for 0.7 soon after release candidates are out, if not sooner. Any objections to deprecating 0.6 pretty much immediately (especially since breaking changes will be over until 2.0)?
It would make sense to me if you want to delay merging this until 0.7 is in full swing and I'm fully compatible.
I fixed a silly bug that was screwing up performance and did a little more performance testing (still in 0.6).
Aha! We are now pretty damn close to Python even in the worst-case scenario. I loaded a 20 million row dataframe with mostly `NullableList` (`Union{Missing,String}`) in about 19.9 seconds including compile time. Python feather and pyarrow took 14.4 seconds on the same task! This is of course still a significant difference but we will probably beat them handily with the `Union` improvements in 0.7. Pretty satisfying considering all their code is written in C++. Again, right now `Primitive` takes just about the minimum amount of time it can possibly take.
Whenever I try using `materialize`, I get a bunch of inference errors on the first two attempts, but then it always works on the third... Sorry for the long dump here, but I'm going to post the output so it can be seen. Any ideas what might trigger something like this?
~|⇒ julia vpn-192-168-100-6
_
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.6.3-pre.0 (2017-12-18 07:11 UTC)
_/ |\__'_|_|_|\__'_| | Commit 93168a6826 (67 days old release-0.6)
|__/ | x86_64-apple-darwin17.3.0
julia> versioninfo()
Julia Version 0.6.3-pre.0
Commit 93168a6826 (2017-12-18 07:11 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin17.3.0)
CPU: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
julia> using Feather
WARNING: Method definition midpoints(Base.Range{T} where T) in module Base at deprecated.jl:56 overwritten in module StatsBase at /Users/sglyon/.julia/v0.6/StatsBase/src/hist.jl:535.
WARNING: Method definition midpoints(AbstractArray{T, 1} where T) in module Base at deprecated.jl:56 overwritten in module StatsBase at /Users/sglyon/.julia/v0.6/StatsBase/src/hist.jl:533.
julia> df = Feather.materialize("/Users/sglyon/Data/kn_data/2501_2015.feather");
ERROR: TypeError: issubtype: expected Type, got TypeVar
Stacktrace:
[1] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2005
[2] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[3] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[4] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[5] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[6] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[7] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[8] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[9] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[10] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[11] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[12] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[13] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[14] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[15] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[16] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[17] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[18] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[19] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[20] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[21] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[22] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[23] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[24] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[25] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[26] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[27] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[28] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[29] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[30] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[31] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[32] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[33] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[34] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[35] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[36] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[37] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[38] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[39] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[40] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[41] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[42] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[43] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[44] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[45] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[46] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[47] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[48] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[49] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[50] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[51] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[52] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[53] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[54] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[55] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[56] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[57] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[58] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2087
[59] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[60] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[61] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[62] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[63] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[64] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[65] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[66] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2084
[67] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[68] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[69] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[70] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[71] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[72] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[73] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[74] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[75] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[76] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[77] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[78] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[79] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[80] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[81] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[82] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[83] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[84] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[85] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[86] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[87] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[88] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[89] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[90] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[91] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[92] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[93] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[94] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[95] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2076
[96] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[97] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[98] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[99] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[100] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[101] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[102] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[103] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[104] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[105] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[106] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[107] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[108] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[109] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[110] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[111] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[112] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[113] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[114] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[115] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[116] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[117] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[118] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[119] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[120] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[121] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[122] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[123] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[124] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[125] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[126] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[127] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[128] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1922
[129] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[130] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[131] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[132] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[133] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[134] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[135] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[136] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[137] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[138] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[139] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[140] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[141] abstract_call(::Any, ::Tuple{}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[142] abstract_iteration(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1510
[143] precise_container_type(::Any, ::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1494
[144] abstract_apply(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1542
[145] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1689
[146] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[147] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[148] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[149] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[150] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[151] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[152] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[153] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[154] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[155] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[156] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[157] typeinf_frame(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2504
[158] typeinf_code(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2583
[159] typeinf_ext(::Core.MethodInstance, ::UInt64) at ./inference.jl:2622
[160] materialize(::String) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:98
julia> df = Feather.materialize("/Users/sglyon/Data/kn_data/2501_2015.feather");
ERROR: TypeError: issubtype: expected Type, got TypeVar
Stacktrace:
[1] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2005
[2] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[3] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[4] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[5] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[6] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[7] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[8] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[9] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[10] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[11] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[12] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[13] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[14] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[15] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[16] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[17] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[18] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[19] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[20] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[21] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[22] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[23] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[24] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[25] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[26] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[27] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[28] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[29] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[30] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[31] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[32] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[33] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[34] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[35] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[36] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[37] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[38] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[39] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[40] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[41] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[42] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[43] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[44] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[45] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[46] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[47] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[48] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[49] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[50] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[51] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[52] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[53] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[54] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[55] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[56] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[57] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[58] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2087
[59] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[60] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[61] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[62] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[63] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[64] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[65] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[66] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2084
[67] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[68] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[69] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[70] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[71] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[72] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[73] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[74] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[75] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[76] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[77] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[78] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[79] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[80] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[81] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[82] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[83] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[84] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[85] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[86] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[87] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[88] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[89] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[90] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[91] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[92] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[93] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[94] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[95] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2076
[96] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[97] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[98] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[99] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[100] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[101] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[102] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[103] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[104] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[105] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[106] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[107] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[108] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[109] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[110] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[111] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[112] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[113] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[114] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[115] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[116] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[117] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[118] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[119] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[120] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[121] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[122] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[123] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[124] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[125] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[126] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[127] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[128] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1922
[129] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[130] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
[131] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
[132] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
[133] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
[134] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
[135] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
[136] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[137] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
[138] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[139] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
[140] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
[141] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
[142] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
[143] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
[144] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2076
[145] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
[146] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
[147] typeinf_frame(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2504
[148] typeinf_code(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2583
[149] typeinf_ext(::Core.MethodInstance, ::UInt64) at ./inference.jl:2622
[150] materialize(::Feather.Source, ::AbstractArray{#s78,1} where #s78<:Integer, ::AbstractArray{T<:(Union{#s79, Symbol} where #s79<:Integer),1}) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:87
[151] materialize(::Feather.Source) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:97
[152] materialize(::String) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:98
julia> df = Feather.materialize("/Users/sglyon/Data/kn_data/2501_2015.feather");
julia>
Yeah, by now I've seen this too. I actually suspect this is a Julia bug, which I'm hoping will go away in 0.7. I want to post a Julia issue or something on Discourse, but I've so far failed miserably at creating an MWE. It has something to do with the type parameter for `Source`, which is a big `Tuple` object, but I think it is really being caused by the same parameter in `Data.Schema`. @quinnj, have you seen anything like this working with `Data.Schema`?
For testing purposes, for the time being `Feather.materialize(df::DataFrame)` works fine after you have loaded a data frame with `Feather.read`.
I hope that this PR to Julia fixes this issue in 0.7, but this is really just a guess.
By the way, I'm open to changing the name of `materialize` to `load` if everyone thinks that's better. `materialize` is more poetic and I'd like to keep it, but `load` may require less explanation.
I'd love to see a julia 0.6 version of this. It seems to be mostly ready and usable on 0.6, and so why not make it available for normal users that can't jump on pre-release stuff? I'd love to see 0.7 soon, but if history is any guide, we should probably take the various schedules that are floating around with a grain of salt ;)
I agree with @davidanthoff here. While the 0.7/1.0 release is upcoming, 0.6 is still the official stable release and I personally would like to see it supported.
Thankfully, things seem to be working fairly smoothly on 0.6 thanks to @ExpandingMan and the 0.7-based decisions/changes are all (I think?) fairly easy to make.
I'm going to check out on Monday how easy it is to get it working on both. I checked the performance of `reinterpret` on 0.7 today and it is fantastic, so it is definitely looking like we will be able to have full safety on 0.7. That's actually a pretty big deal because Arrow.jl is extremely dangerous without it. Even though it doesn't expose any pointers to a user, if the user chooses the wrong location in a buffer, Arrow will cause buffer overruns (although I kept writing safe, so you can't actually write past the end of a buffer). I don't want that thing out there in the wild causing segfaults, as Julia users justifiably are not expecting those as a result of improper indexing, but certainly it is unacceptably slow on 0.6 without pointers. (The only alternative would have been to insert checks everywhere, which may well wind up happening if we support 0.6 for very long.) We have seen Feather.jl master causing segfaults because of unforeseen oddities in Feather files (recent example: #76); it would be nice to be able to guarantee that that will never happen again.
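To make the trade-off above concrete, here is a minimal sketch of the "safe" route: validate the byte range first, then wrap it with `reinterpret` instead of a raw pointer. The helper name `checked_view` and the sample buffer are assumptions made up for illustration; this is not Arrow.jl's actual code, and it assumes a Julia version (0.7 or later) where `reinterpret` accepts a view.
```julia
# Hypothetical helper, not part of Arrow.jl: view `n` values of type `T`
# starting at the 1-based byte position `start` of `buf`.
function checked_view(::Type{T}, buf::Vector{UInt8}, start::Integer, n::Integer) where {T}
    bytes = start:(start + n * sizeof(T) - 1)
    checkbounds(buf, bytes)                  # throws BoundsError instead of overrunning
    return reinterpret(T, view(buf, bytes))  # no copy, no raw pointer
end

buf = collect(reinterpret(UInt8, Int32[10, 20, 30]))  # 12 bytes of sample data
vals = checked_view(Int32, buf, 5, 2)                 # second and third values
```
Asking for a range that runs past the end of `buf` (for example `checked_view(Int32, buf, 9, 2)`) raises a `BoundsError` rather than reading out of bounds, which is the behavior a Julia user would expect from bad indexing.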
I don't know about you, but I'm probably going to feel pretty unconcerned with supporting 0.6 once there actually is a release candidate as obviously 0.7 is rather special.
Can't speak for anyone else, but at least in my groups an alpha (or even a release candidate) will be a non-event. They are trying to get real work done, and so for them it is all stable/released versions of everything (that is painful enough on julia...).
And in mine we haven't worked on 0.6 for a couple months now; 0.7 is actually very stable and provides a ton of improvements, though it does take a bit of work to get over the "compat" hump from 0.6 -> 0.7.
Of course we can still support 0.6; my comment was intended to convey that we should be targeting 0.7 features/design with its imminent release and providing 0.6 compat as needed, instead of the other way around.
Well, it sounds like we'll definitely have to get everything working with both, but I agree with @quinnj's sentiment. It is probably a luxury of mainly having done scientific programming, but having lots of legacy stuff sitting around irks me. In Julia's case, I actually think that the symptoms of that problem are far worse than they usually are. I find that overwhelmingly later stuff is more efficient and stable on Julia. I hang out on the Julia discourse a lot and I see so many problems caused by new and inexperienced users winding up with old packages and old versions of Julia.
By the way, my `reinterpret` bubble was burst when I realized this, so we're not quite there yet.
Anyway yes, we'll get it working on both, don't worry.
Don't worry, I haven't disappeared. I've been updating some dependencies (notably DataFrames) to 0.7 (with 0.6 compat) so that testing this is a bit cleaner.
I am more baffled than ever by the crazy type inference error that has been coming up when we use `Feather.materialize`. See this and this.
Good news is that this PR is looking crazy fast. Note you can still test `Feather.materialize` by first doing `Feather.read` and then doing `Feather.materialize` on the resulting dataframe.
Looks like I may have been causing that bug with something that I thought was clever. Let this be a lesson against vanity.
Ok! Everything seems to be working now on both 0.6 and 0.7, and I have added lots of unit tests in Arrow.jl. Tomorrow I am going to look into if I can prevent that bizarre type inference error. Things also seem noticeably faster in 0.7. We are rapidly approaching a point when this might be merge-able, and performance is looking quite good so far! Benchmarking welcome!
Travis errors seem due to old DataStreams.jl version. @quinnj , any idea when you might be able to tag a new version?
No idea what is with travis MacOS 0.6.
Just merged some things on DataStreams master that I want to let settle for a bit; if you wanted to temporarily add `Pkg.checkout("DataStreams")` to the PR branch here to help test things, I'd actually appreciate seeing if any other DataStreams errors pop up.
Ok. I've tested locally on DataStreams master, and I have not seen any errors or warnings on either 0.6 or 0.7. Note also that the current tagged version of DataStreams is also causing errors in DataFrames master on 0.7 (which I'm pretty sure are resolved in DataStreams master). Lastly, it looks like a lot of your 0.7 stuff in DataStreams master is now covered by Compat, so that may be worth checking out.
Good news, I have fixed the bizarre type inference bug that was being triggered when calling `Feather.materialize` (and also during unit tests of Arrow.jl on 0.6). Also, on the few examples I have looked at so far, the read performance is more than twice as fast on 0.7 as on 0.6. Presumably much of this difference comes from better handling of the `Union{<:Any,Missing}` types. That said, I wouldn't be surprised if there is still a lot of room for improvement for the `AbstractList` types (strings).
At this point everything seems to be working nicely and I have no further plans to work on Arrow.jl or Feather.jl in the immediate future, except for any changes that would need to be made to get this merged. Please keep me informed about any steps that would need to be taken, assuming there is interest in merging this and not creating a separate package. I know that I will have to tag a release of Arrow.jl and add it to METADATA, but I will wait for feedback if there is any. Thanks all.
(Note: test failures on Travis 0.7 seem to be from out-of-date (tagged) version of DataStreams. Again, no clue what's happening on 0.6 in MacOS or AppVeyor... not sure I've ever once actually seen an appveyor not throw an error...)
(Note: I've actually seen some significant performance regressions on 0.7 in the worst case scenarios. Still, for the most part it seems reasonably fast on 0.6 and 0.7. Writing efficient code is hard, we'll get there.)
I've just fixed a critical bug.
The Feather format does not support specifying that columns without nulls are "nullable". Therefore one must check that an `AbstractVector{Union{T,Missing}}` in fact has missings before attempting to write it. Otherwise, for a column with no missings, the Feather file will contain a `NullableList` but the metadata will tell the reader to try to build a `List`. When I get to it I need to add a test case for this. We've lost a little bit of write efficiency, because now an `AbstractVector{Union{T,Missing}}` must always be checked for `missing` before anything is written.
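The check described above could look roughly like the following. This is only a sketch under stated assumptions: `column_for_writing` is a hypothetical name, not a function in Feather.jl or Arrow.jl, and the real writer may organize this differently. It uses `Base.nonmissingtype`, which is only available on Julia 1.x.
```julia
# Hypothetical helper, not Feather.jl's API: pick the column to hand to the writer.
function column_for_writing(v::AbstractVector)
    any(ismissing, v) && return v        # real missings present: keep the nullable column
    T = Base.nonmissingtype(eltype(v))   # strip `Missing` from the element type
    return convert(Vector{T}, v)         # no missings: write it as a plain column
end

a = column_for_writing(Union{Int,Missing}[1, 2, 3])  # declared nullable, but no missings
b = column_for_writing([1, missing, 3])              # genuinely contains a missing
```
The `any(ismissing, v)` scan is the extra pass over the data that costs the write efficiency mentioned above.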
@quinnj , have you had any time to look at this? I still need to add some unit tests to Arrow.jl before I tag a release, but that will come before next week.
I've noticed a big problem with `Feather.materialize`. It's possible, in fact likely, that if you call this function and nothing else, all references to the original data buffer will disappear, so that it gets garbage collected and you segfault. I don't see any really elegant solution to this as things stand. Certainly I can get rid of the method that only takes a filename as argument, but we'd have to tell users "please keep the `Source` around somewhere, it can't get gc'd". The only completely reliable alternative I can think of would be to have `materialize` always do a `deepcopy`, but ugh, that would be absolutely awful.
My hope was that the pointers would be very temporary anyway, but I still haven't been able to get any news from the core devs on how hard it will be to fix `ReinterpretArray`. See here. Suggestions, comments welcome!
GC.@preserve is the standard way to keep an object alive.
Thanks, but I've already thought of doing the equivalent and it doesn't solve the problem does it? Then the data will just never ever get gc'd even if the things referring to it go out of scope. This might be really bad if it's a sufficiently big dataset (though I'm very hazy on how exactly memory mapping works, so I don't know if that can help here).
The data is preserved for the scope of the preserve block. AFAIU there is no other way to guarantee not getting garbage collected.
Yeah, in that case we'd have to have users call it manually. Or, I suppose we could create a macro. Will have to think on it further, thanks for your help.
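For reference, a minimal sketch of what the `GC.@preserve` pattern discussed above looks like when reading through a raw pointer. The function `unsafe_sum` and the sample buffer are made up for illustration and are not taken from Arrow.jl; `GC.@preserve` itself requires Julia 0.7 or later.
```julia
# Sum `n` Float64 values stored in a raw byte buffer by reading through a pointer.
function unsafe_sum(buf::Vector{UInt8}, n::Integer)
    checkbounds(buf, 1:(n * sizeof(Float64)))  # validate before touching the pointer
    total = 0.0
    GC.@preserve buf begin                     # `buf` cannot be collected inside this block
        p = convert(Ptr{Float64}, pointer(buf))
        for i in 1:n
            total += unsafe_load(p, i)
        end
    end
    return total                               # the pointer is never used past this point
end

buf = collect(reinterpret(UInt8, [1.0, 2.0, 3.5]))  # 24 bytes of sample data
total = unsafe_sum(buf, 3)
```
The guarantee only covers the `begin ... end` block, which is why this pattern works for short-lived pointer access but does not by itself help when a returned object has to outlive the call, as in the `materialize` case.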
The way things "should" work is with the `ReinterpretArray`, but I know it is not overhead-free right now :(
Yeah, I know. I'm learning that pointers are actually much scarier in languages that are not designed to use them. Actually, `reinterpret` seems fine in some cases, but it's just so unpredictable. Any idea what the thinking is on it, whether this will be something that will be easy or difficult to fix? I'd feel a lot better about using it if I thought there was a reasonable expectation of it getting better sometime in the foreseeable future. So far I haven't heard any kind of assessment about what might be wrong and what might be required to fix it (and I'm afraid it's way above my expertise to look into myself).
Great thanks.
I'm hoping you'll be very pleased with how isolated the pointer code is in Arrow.jl, I tried really hard to make it safe. Of course, there's nothing I can do about users giving the wrong array indices, but I hope that's all that can really go wrong.
Like I said, I'm still contemplating improving the Arrow.jl constructors, so that may change some things here. I also have to update it so that we can use other data types for the offsets; I think we'll need to use `Int64` by default.
Ok, I've just added a whole lot more unit tests, so this damn thing better not have any mysterious uncaught errors anymore. Still not totally sure how that godawful segfault was getting through those tests. Think 0.6 fails for some reason, will fix eventually.
Absolutely, feel free to push to my PR. I've never had pushes on a PR before; do I just merge them the same way I would PRs to master?
Also keep in mind that when we merge this we don't have to tag it immediately, so we can always make additional PRs immediately after. We'll have to do some documentation PRs before tagging regardless (I intended to wait until this is merged to do that).
I've fixed the broken unit test on 0.6. Note that DataStreams will need to be tagged for us to not get test errors on 0.7.
Thanks @ExpandingMan for all the hard work here; awesome to see all the new arrow/feather functionality progressing.
I have rewritten Feather.jl to use my new Arrow.jl back-end. The Arrow.jl package provides `AbstractVector` objects that provide access to Arrow formatted data. Because the existing Feather.jl mostly deals with accessing Arrow data, this rewrite was very extensive. This PR should maintain all existing functionality and expand on it, with the exception of appending DataFrames (more on this below). What follows is an overview of the overhauled package.
Arrow.jl of course needs a tagged release and complete unit tests for this to be merged, but I wanted to put up this PR so we could start figuring out what would need to be done.
New Default Reading Behavior
Creating a `Feather.Source`, or calling `Feather.read`, will now only construct `ArrowVector` objects. In the case of `Feather.read`, a `DataFrame` will be created with `ArrowVector` columns. `ArrowVector`s simply reference existing data, so, in the case of memory mapping, once the file is memory mapped nothing is actually read in until requested by the user. This allows the user to browse a feather file at their leisure, even performing query operations while only loading in data as necessary. The old default functionality of reading the entire file into memory is now provided by `Feather.materialize`. This method takes care of not only the requested behavior of reading in only particular columns, but any arbitrary subset of the full table.
Better Memory Safety
This has been discussed extensively elsewhere. If `reinterpret` is ever more efficient we will have full memory safety, but that seems a long way off.
Dropped Support for Some Non-Standard Formatting
In particular, categorical arrays now must use `Int32` reference values. This is specified by the Arrow Standard. This also no longer supports the really old version of Feather that didn't use Arrow padding, but as there was a warning saying that that data would be unreadable anyway, this seems fine.
Less Dependent on DataStreams
@davidanthoff was asking if we could split off the core functionality of Feather into a separate FeatherBase.jl that doesn't depend on DataStreams. Since a great deal of the functionality of this package has been moved to Arrow in this PR anyway, I thought it would be really great if we could keep this whole. While retaining all DataStreams functionality and the Source/Sink structure, the only place where the core functionality of this package really relies on DataStreams is now `Data.Schema`, which, to my knowledge, has never changed since DataStreams was created. Hopefully everyone will be sufficiently happy with this that we don't need to bother creating a new package? :wink:
Appropriate Updates to Unit Tests
Mostly they are now organized into `@testset`. In some cases slight adjustments to the tests were needed.
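As a rough illustration of the lazy reading idea under "New Default Reading Behavior", the sketch below memory maps a file and only copies data when asked. It is not how Feather.jl or Arrow.jl are implemented: the file here is just raw `Int64` values written to a temporary path (an assumption for the example, not a real Feather file), and the variable names are made up.
```julia
using Mmap

# Write four Int64 values as raw bytes to a temporary file (sample data only).
path, io = mktemp()
write(io, Int64[10, 20, 30, 40])
close(io)

# Map the file; the OS loads pages on demand rather than reading everything up front.
col = open(path) do f
    Mmap.mmap(f, Vector{Int64}, (4,))
end

lazy = view(col, 2:3)          # still backed by the mapping, nothing copied
materialized = collect(lazy)   # copies the two values into an ordinary Vector
```
Here `lazy` plays the role of a column that merely references the mapped data, while `collect` stands in for the "materialize" step that reads a chosen subset into memory.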