ExpandingMan commented 6 years ago

I have rewritten Feather.jl to use my new Arrow.jl back-end. The Arrow.jl package provides AbstractVector objects that give access to Arrow-formatted data. Because the existing Feather.jl mostly deals with accessing Arrow data, this rewrite was very extensive. This PR should maintain all existing functionality and expand on it, with the exception of appending DataFrames (more on this below). What follows is an overview of the overhauled package.

Arrow.jl of course needs a tagged release and complete unit tests for this to be merged, but I wanted to put up this PR so we could start figuring out what would need to be done.

New Default Reading Behavior

Creating a Feather.Source or calling Feather.read now only constructs ArrowVector objects. In the case of Feather.read, a DataFrame is created with ArrowVector columns. ArrowVectors simply reference existing data, so with memory mapping nothing is actually read in until the user requests it. This allows the user to browse a feather file at their leisure, even performing query operations, while loading data only as necessary. The old default behavior of reading the entire file into memory is now provided by Feather.materialize. This method handles not only the requested behavior of reading in particular columns, but also any arbitrary subset of the full table.
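To illustrate the lazy-access idea in isolation, here is a minimal sketch that uses only Julia's Mmap standard library rather than the Feather or Arrow API. The temporary file stands in for a single column buffer, and the names `col`, `subset`, and `materialized` are made up for this example. It assumes Julia 1.x.

```julia
using Mmap

# Write 1000 Float64 values to a temporary file (a stand-in for a column buffer).
path, io = mktemp()
write(io, collect(1.0:1000.0))
close(io)

# Memory map the file as a vector. Pages are only read from disk when accessed.
col = open(path) do f
    Mmap.mmap(f, Vector{Float64}, 1000)
end

# A view references part of the mapped data without copying anything.
subset = view(col, 10:20)

# Copying the view is the analogue of "materializing" just that subset.
materialized = collect(subset)
```

Only the eleven requested values are copied into ordinary memory here; the rest of the file is touched only if it is indexed.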

Better Memory Safety

This has been discussed extensively elsewhere. If reinterpret is ever more efficient we will have full memory safety, but that seems a long way off.

Dropped Support for Some Non-Standard Formatting

In particular, categorical arrays now must use Int32 reference values. This is specified by the Arrow Standard. This also no longer supports the really old version of Feather that didn't use Arrow padding, but as there was a warning saying that that data would be unreadable anyway this seems fine.
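As a rough illustration of what Int32 reference values mean for a categorical column, the sketch below dictionary-encodes a vector of strings in plain Julia. The helpers `dict_encode` and `decode` are hypothetical and are not part of Feather.jl or Arrow.jl; the zero-based codes follow the Arrow convention for dictionary indices. It assumes Julia 1.x.

```julia
# Hypothetical helper: encode strings as Int32 reference values plus a pool of levels.
function dict_encode(values::Vector{String})
    pool = unique(values)  # levels, in order of first appearance
    # Arrow dictionary indices are zero-based, so level i gets code i - 1.
    lookup = Dict(v => Int32(i - 1) for (i, v) in enumerate(pool))
    refs = Int32[lookup[v] for v in values]
    return refs, pool
end

# Hypothetical helper: recover the original values from codes and pool.
decode(refs::Vector{Int32}, pool::Vector{String}) = [pool[r + 1] for r in refs]

refs, pool = dict_encode(["a", "b", "a", "c"])
roundtrip = decode(refs, pool)
```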

Less Dependent on DataStreams

@davidanthoff was asking if we could split off the core functionality of Feather into a separate FeatherBase.jl that doesn't depend on DataStreams. Since a great deal of the functionality of this package has been moved to Arrow in this PR anyway, I thought it would be really great if we could keep this whole. While retaining all DataStreams functionality and the Source/Sink structure, the only place where the core functionality of this package really relies on DataStreams is now Data.Schema, which, to my knowledge, has never changed since DataStreams was created. Hopefully everyone will be sufficiently happy with this that we don't need to bother creating a new package? :wink:

Appropriate Updates to Unit Tests

Mostly they are now organized into @testset. In some cases slight adjustments to the tests were needed.

codecov-io commented 6 years ago

Codecov Report

Merging #78 into master will decrease coverage by 9.16%. The diff coverage is 74.23%.

@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
- Coverage   84.01%   74.85%   -9.17%     
==========================================
  Files           3        4       +1     
  Lines         269      167     -102     
==========================================
- Hits          226      125     -101     
+ Misses         43       42       -1

Impacted Files	Coverage Δ
src/source.jl	`55.22% <55.22%> (ø)`
src/metadata.jl	`80.76% <77.27%> (-19.24%)`	:arrow_down:
src/loadfile.jl	`85.18% <85.18%> (ø)`
src/sink.jl	`93.61% <93.61%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 03a19c4...cbccde1.

ExpandingMan commented 6 years ago

Looking good on 0.6!

I have no idea what's going on with appveyor, any ideas?

sglyon commented 6 years ago

Quick comments -- will try for a more detailed review within a couple days:

Looks great! I like how clean things are. I think this will be a step forward.
I really like how you leverage the random access abilities of the arrow/feather spec. It is amazing to Feather.read a very large file in microseconds and then pay the cost of deserialization only when I access the data.
I have feather files produced from Python with non-Int32 reference types on categorical variables. That doesn't work here. I have a patch to this branch and Arrow.jl that supports this; if you are interested, let me know and I'll push it somewhere.
Feather.materialize appears to undo any categorical variables. I think it makes sense to materialize into CategoricalArray{ValueType,1,Reftype,...} instead of Array{ValueType}
Performance seems to be lacking. In some informal testing it seems that this branch was 2-2.5x slower than Feather.jl master

quinnj commented 6 years ago

I'm excited about this! Let me take some time to do a proper review of your Arrow.jl work, but I like the direction this is headed.

ExpandingMan commented 6 years ago

Thanks all! Please see #80, as well as this and this. I plan on making the changes I mention today.

The random access abilities were the whole reason I bothered doing this in the first place! I envision dealing with 1TB tables relatively easily with DataFrames.jl and DataFramesMeta.jl. Note that you can also use DataFramesMeta.jl or Query.jl (if it's performant enough) combined with Feather as sort of a "database replacement".

ExpandingMan commented 6 years ago

Hello again. Ok, I've made the following changes:

Back to pointers. Memory safety is no longer guaranteed by Julia itself. It is, however, probably a lot less scary than Feather.jl master is now.
In light of the above, performance of Primitive is blazing. It really is doing just about the bare minimum that it can possibly do. Contiguous views are about 30 ns always. List is of course way slower but still pretty good. If we find we are still too slow with these, it would seem that it would have to be a limitation of Julia Mmap or something like that.
Performance of Nullable types is still pretty terrible, but this is just because of the performance of Union types in 0.6. Just doing convert(Vector{Union{T,Missing}}, A) is pretty bad. We've been told this should improve drastically in 0.7, so hopefully we can remedy this then.
DictEncoding now works properly for any Integer reference type.
DictEncoding now will return CategoricalArray by default on any use of materialize and any time it is indexed with :.
Added a bunch of useful materialize methods.
I still have to do a pass to figure out what to do to make sure that only Int32 reference types are ever written. This'll happen by default, but it's still possible to e.g. copy a previously existing DataFrame that had non-Int32 references. Fixing this seems like a surprisingly big pain in the ass, but I think it may be important.
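For readers unfamiliar with the Union{T,Missing} conversion mentioned in the list above, here is a minimal sketch of combining a values buffer with a validity mask into a Vector{Union{T,Missing}}. The helper `with_missing` is hypothetical and is not the Arrow.jl implementation; it uses a plain Bool vector where Arrow stores a packed bitmap. It assumes Julia 1.x syntax (`undef`), not the 0.6 syntax used elsewhere in this thread.

```julia
# Hypothetical helper: merge values and a validity mask into a vector with missings.
function with_missing(values::Vector{T}, valid::AbstractVector{Bool}) where {T}
    length(values) == length(valid) ||
        throw(ArgumentError("values and valid must have the same length"))
    out = Vector{Union{T,Missing}}(undef, length(values))
    for i in eachindex(values)
        # A false entry in the mask marks the slot as null.
        out[i] = valid[i] ? values[i] : missing
    end
    return out
end

v = with_missing([1.5, 2.5, 3.5], [true, false, true])
```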

Thanks again for all your feedback. Will do more testing tomorrow!

quinnj commented 6 years ago

I really think we should be targeting 0.7 here; in a few weeks, 0.7 will be in serious release-candidate mode, and soon after, 0.6 will be a thing of the past. 0.7/1.0 will be around for years, so I think all the focus, design decisions, and tradeoffs should be weighted in that direction.

ExpandingMan commented 6 years ago

Yeah, I agree. My doing everything on 0.6 is more a practical matter than anything else. I'll compile 0.7 and try messing around with it a bit.

Keep in mind that the pointers should work just fine in 0.7. reinterpret may or may not, I still have to figure it out. So at least it's not as if the pointers are somehow specialized to 0.6.

I promise that I'll get everything up for 0.7 soon after release candidates are out, if not sooner. Any objections to deprecating 0.6 pretty much immediately (especially since breaking changes will be over until 2.0)?

It would make sense to me if you want to delay merging this until 0.7 is in full swing and I'm fully compatible.

ExpandingMan commented 6 years ago

I fixed a silly bug that was screwing up performance and did a little more performance testing (still in 0.6).

Aha! We are now pretty damn close to Python even in worst case scenario. I loaded a 20 million row dataframe with mostly NullableList (Union{Missing,String}) in about 19.9 seconds including compile time. Python feather and pyarrow took 14.4 seconds on the same task! This is of course still a significant difference but we will probably beat them handily with the Union improvements in 0.7. Pretty satisfying considering all their code is written in C++. Again, right now Primitive takes just about the minimum amount of time it can possibly take.

sglyon commented 6 years ago

Whenever I try using materialize I get a bunch of inference errors on the first two attempts, but then it always works on the third... Sorry for the long dump here, but I'm going to post the output so it can be seen. Any ideas what might trigger something like this?

~|⇒ julia                                                                                         vpn-192-168-100-6
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.3-pre.0 (2017-12-18 07:11 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 93168a6826 (67 days old release-0.6)
|__/                   |  x86_64-apple-darwin17.3.0

julia> versioninfo()
Julia Version 0.6.3-pre.0
Commit 93168a6826 (2017-12-18 07:11 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.3.0)
  CPU: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

julia> using Feather
WARNING: Method definition midpoints(Base.Range{T} where T) in module Base at deprecated.jl:56 overwritten in module StatsBase at /Users/sglyon/.julia/v0.6/StatsBase/src/hist.jl:535.
WARNING: Method definition midpoints(AbstractArray{T, 1} where T) in module Base at deprecated.jl:56 overwritten in module StatsBase at /Users/sglyon/.julia/v0.6/StatsBase/src/hist.jl:533.

julia> df = Feather.materialize("/Users/sglyon/Data/kn_data/2501_2015.feather");
ERROR: TypeError: issubtype: expected Type, got TypeVar
Stacktrace:
 [1] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2005
 [2] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [3] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [4] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [5] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [6] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [7] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [8] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [9] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [10] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [11] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [12] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [13] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [14] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [15] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [16] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [17] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [18] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [19] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [20] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [21] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [22] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [23] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [24] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [25] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [26] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [27] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [28] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [29] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [30] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [31] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [32] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [33] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [34] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [35] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [36] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [37] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [38] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [39] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [40] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [41] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [42] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [43] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [44] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [45] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [46] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [47] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [48] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [49] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [50] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [51] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [52] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [53] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [54] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [55] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [56] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [57] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [58] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2087
 [59] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [60] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [61] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [62] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [63] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [64] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [65] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [66] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2084
 [67] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [68] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [69] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [70] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [71] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [72] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [73] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [74] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [75] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [76] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [77] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [78] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [79] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [80] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [81] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [82] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [83] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [84] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [85] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [86] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [87] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [88] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [89] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [90] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [91] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [92] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [93] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [94] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [95] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2076
 [96] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [97] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [98] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [99] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [100] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [101] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [102] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [103] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [104] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [105] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [106] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [107] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [108] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [109] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [110] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [111] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [112] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [113] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [114] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [115] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [116] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [117] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [118] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [119] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [120] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [121] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [122] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [123] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [124] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [125] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [126] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [127] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [128] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1922
 [129] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [130] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [131] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [132] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [133] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [134] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [135] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [136] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [137] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [138] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [139] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [140] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [141] abstract_call(::Any, ::Tuple{}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [142] abstract_iteration(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1510
 [143] precise_container_type(::Any, ::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1494
 [144] abstract_apply(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1542
 [145] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1689
 [146] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [147] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [148] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [149] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [150] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [151] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [152] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [153] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [154] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [155] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [156] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [157] typeinf_frame(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2504
 [158] typeinf_code(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2583
 [159] typeinf_ext(::Core.MethodInstance, ::UInt64) at ./inference.jl:2622
 [160] materialize(::String) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:98

julia> df = Feather.materialize("/Users/sglyon/Data/kn_data/2501_2015.feather");
ERROR: TypeError: issubtype: expected Type, got TypeVar
Stacktrace:
 [1] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2005
 [2] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [3] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [4] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [5] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [6] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [7] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [8] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [9] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [10] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [11] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [12] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [13] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [14] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [15] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [16] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [17] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [18] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [19] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [20] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [21] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [22] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [23] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [24] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [25] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [26] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [27] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [28] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [29] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [30] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [31] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [32] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [33] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [34] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [35] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [36] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [37] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [38] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [39] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [40] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [41] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [42] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [43] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [44] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [45] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [46] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [47] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [48] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [49] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [50] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [51] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [52] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [53] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [54] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [55] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [56] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [57] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [58] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2087
 [59] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [60] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [61] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [62] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [63] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [64] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [65] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [66] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2084
 [67] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [68] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [69] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [70] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [71] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [72] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [73] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [74] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [75] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [76] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [77] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [78] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [79] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [80] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [81] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [82] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [83] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [84] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [85] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [86] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [87] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [88] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [89] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [90] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [91] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [92] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [93] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [94] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [95] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2076
 [96] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [97] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [98] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [99] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [100] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [101] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [102] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [103] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [104] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [105] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [106] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [107] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [108] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [109] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [110] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [111] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [112] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [113] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [114] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [115] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [116] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [117] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [118] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [119] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [120] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [121] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [122] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [123] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [124] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [125] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [126] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [127] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [128] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1922
 [129] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [130] (::Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState})(::Expr) at ./<missing>:0
 [131] next(::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Int64) at ./generator.jl:45
 [132] copy!(::Array{Any,1}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./abstractarray.jl:573
 [133] _collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}, ::Core.Inference.HasShape) at ./array.jl:396
 [134] collect(::Type{Any}, ::Core.Inference.Generator{Array{Any,1},Core.Inference.##189#190{Array{Any,1},Core.Inference.InferenceState}}) at ./array.jl:393
 [135] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1901
 [136] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [137] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2722
 [138] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [139] typeinf_edge(::Method, ::Any, ::SimpleVector, ::Core.Inference.InferenceState) at ./inference.jl:2535
 [140] abstract_call_gf_by_type(::Any, ::Any, ::Core.Inference.InferenceState) at ./inference.jl:1420
 [141] abstract_call(::Any, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1897
 [142] abstract_eval_call(::Expr, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1927
 [143] abstract_eval(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:1950
 [144] abstract_interpret(::Any, ::Array{Any,1}, ::Core.Inference.InferenceState) at ./inference.jl:2076
 [145] typeinf_work(::Core.Inference.InferenceState) at ./inference.jl:2669
 [146] typeinf(::Core.Inference.InferenceState) at ./inference.jl:2787
 [147] typeinf_frame(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2504
 [148] typeinf_code(::Core.MethodInstance, ::Bool, ::Bool, ::Core.Inference.InferenceParams) at ./inference.jl:2583
 [149] typeinf_ext(::Core.MethodInstance, ::UInt64) at ./inference.jl:2622
 [150] materialize(::Feather.Source, ::AbstractArray{#s78,1} where #s78<:Integer, ::AbstractArray{T<:(Union{#s79, Symbol} where #s79<:Integer),1}) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:87
 [151] materialize(::Feather.Source) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:97
 [152] materialize(::String) at /Users/sglyon/.julia/v0.6/Feather/src/source.jl:98

julia> df = Feather.materialize("/Users/sglyon/Data/kn_data/2501_2015.feather");

julia>

ExpandingMan commented 6 years ago

Yeah, by now I've seen this too. I actually suspect this is a Julia bug, which I'm hoping will go away in 0.7. I want to post a Julia issue or something on Discourse, but I've so far failed miserably at creating an MWE. It has something to do with the type parameter for Source, which is a big Tuple object, but I think it is really being caused by the same parameter in Data.Schema. @quinnj, have you seen anything like this working with Data.Schema?

For testing purposes, for the time being Feather.materialize(df::DataFrame) works fine after you have loaded a data frame with Feather.read.

ExpandingMan commented 6 years ago

I hope that this PR to Julia fixes this issue in 0.7, but this is really just a guess.

By the way I'm open to changing the name of materialize to load if everyone thinks that's better. materialize is more poetic and I'd like to keep it, but load may require less explanation.

davidanthoff commented 6 years ago

I'd love to see a julia 0.6 version of this. It seems to be mostly ready and usable on 0.6, and so why not make it available for normal users that can't jump on pre-release stuff? I'd love to see 0.7 soon, but if history is any guide, we should probably take the various schedules that are floating around with a grain of salt ;)

sglyon commented 6 years ago

I agree with @davidanthoff here. While the 0.7/1.0 release is upcoming, 0.6 is still the official stable release and I personally would like to see it supported.

Thankfully, things seem to be working fairly smoothly on 0.6 thanks to @ExpandingMan and the 0.7-based decisions/changes are all (I think?) fairly easy to make.

ExpandingMan commented 6 years ago

I'm going to check out on Monday how easy it is to get it working on both. I checked the performance of reinterpret on 0.7 today and it is fantastic, so it is definitely looking like we will be able to have full safety on 0.7. That's actually a pretty big deal because Arrow.jl is extremely dangerous without it. Even though it doesn't expose any pointers to a user, if the user chooses the wrong location in a buffer Arrow will cause buffer overruns (although I kept writes safe, so you can't actually write past the end of a buffer). I don't want that thing out there in the wild causing segfaults, as Julia users justifiably are not expecting those as a result of improper indexing, but certainly it is unacceptably slow on 0.6 without pointers. (The only alternative would have been to insert checks everywhere, which may well wind up happening if we support 0.6 for very long.) We have seen Feather.jl master causing segfaults because of unforeseen oddities in Feather files (recent example: #76); it would be nice to be able to guarantee that that will never happen again.
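To make the trade-off concrete, here is a minimal sketch using only Base, assuming Julia 0.7 or later. It is not Arrow.jl code: the buffer contents and the helper name `unsafe_int32` are made up for illustration. It contrasts a bounds-checked `reinterpret` view with a raw-pointer read that has to check the index by hand.

```julia
# An 8-byte buffer holding two little-endian Int32 values (1 and 2).
buf = UInt8[0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00]

# Safe path: the reinterpreted array wraps `buf` and bounds-checks each access.
vals = reinterpret(Int32, buf)
decoded = ltoh.(vals)   # little-endian to host byte order

# Indexing past the end raises BoundsError instead of reading stray memory.
overrun_caught = try
    vals[3]
    false
catch err
    err isa BoundsError
end

# Pointer path (hypothetical helper, for contrast only): nothing in
# `unsafe_load` validates the index, so the check is written explicitly.
function unsafe_int32(buf::Vector{UInt8}, i::Integer)
    1 <= i <= length(buf) ÷ sizeof(Int32) || throw(BoundsError(buf, i))
    GC.@preserve buf begin
        ltoh(unsafe_load(convert(Ptr{Int32}, pointer(buf)), i))
    end
end
```

Without the explicit check in `unsafe_int32`, a bad index would read whatever memory follows the buffer, which is the kind of overrun described above.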

I don't know about you, but I'm probably going to feel pretty unconcerned with supporting 0.6 once there actually is a release candidate as obviously 0.7 is rather special.

davidanthoff commented 6 years ago

Can't speak for anyone else, but at least in my groups an alpha (or even a release candidate) will be a non-event. They are trying to get real work done, and so for them it is all stable/released versions of everything (that is painful enough on julia...).

quinnj commented 6 years ago

And in mine we haven't worked on 0.6 for a couple months now; 0.7 is actually very stable and provides a ton of improvements, though it does take a bit of work to get over the "compat" hump from 0.6 -> 0.7.

Of course we can still support 0.6; my comment was intended to convey that we should be targeting 0.7 features/design with its imminent release and providing 0.6 compat as needed, instead of the other way around.

ExpandingMan commented 6 years ago

Well, it sounds like we'll definitely have to get everything working with both, but I agree with @quinnj's sentiment. It is probably a luxury of mainly having done scientific programming, but having lots of legacy stuff sitting around irks me. In Julia's case, I actually think that the symptoms of that problem are far worse than they usually are. I find that overwhelmingly later stuff is more efficient and stable on Julia. I hang out on the Julia discourse a lot and I see so many problems caused by new and inexperienced users winding up with old packages and old versions of Julia.

By the way, my reinterpret bubble was burst when I realized this, so we're not quite there yet.

Anyway yes, we'll get it working on both, don't worry.

ExpandingMan commented 6 years ago

Don't worry, I haven't disappeared. I've been updating some dependencies (notably DataFrames) to 0.7 (with 0.6 compat) so that testing this is a bit cleaner.

ExpandingMan commented 6 years ago

I am more baffled than ever by the crazy type inference error that has been coming up when we use Feather.materialize. See this and this.

Good news is that this PR is looking crazy fast. Note you can still test Feather.materialize by first doing Feather.read and doing Feather.materialize on the resulting dataframe.

Looks like I may have been causing that bug with something that I thought was clever. Let this be a lesson against vanity.

ExpandingMan commented 6 years ago

Ok! Everything seems to be working now on both 0.6 and 0.7, and I have added lots of unit tests in Arrow.jl. Tomorrow I am going to look into whether I can prevent that bizarre type inference error. Things also seem noticeably faster in 0.7. We are rapidly approaching a point when this might be merge-able, and performance is looking quite good so far! Benchmarking welcome!

ExpandingMan commented 6 years ago

Travis errors seem to be due to an old DataStreams.jl version. @quinnj, any idea when you might be able to tag a new version?

No idea what is with travis MacOS 0.6.

quinnj commented 6 years ago

Just merged some things on DataStreams master that I want to let settle for a bit; if you wanted to temporarily add Pkg.checkout("DataStreams") to the PR branch here to help test things, I'd actually appreciate seeing if any other DataStreams errors pop up.

ExpandingMan commented 6 years ago

Ok. I've tested locally on DataStreams master, and I have not seen any errors or warnings on either 0.6 or 0.7. Note also that the current tagged version of DataStreams is also causing errors in DataFrames master on 0.7 (which I'm pretty sure are resolved in DataStreams master). Lastly, it looks like a lot of your 0.7 stuff in DataStreams master is now covered by Compat, so that may be worth checking out.

ExpandingMan commented 6 years ago

Good news, I have fixed the bizarre type inference bug that was being triggered when calling Feather.materialize (and also during unit tests of Arrow.jl on 0.6). Also, on the few examples I have looked at so far the read performance is more than twice as fast on 0.7 as on 0.6. Presumably much of this difference comes from better handling of the Union{<:Any,Missing} types. That said, I wouldn't be surprised if there is still a lot of room for improvement for the AbstractList types (strings).

At this point everything seems to be working nicely and I have no further plans to work on Arrow.jl or Feather.jl in the immediate future, except for any changes that would need to be made to get this merged. Please keep me informed about any steps that would need to be taken, assuming there is interest in merging this and not creating a separate package. I know that I will have to tag a release of Arrow.jl and add it to METADATA, but I will wait for feedback if there is any. Thanks all.

(Note: test failures on Travis 0.7 seem to be from out-of-date (tagged) version of DataStreams. Again, no clue what's happening on 0.6 in MacOS or AppVeyor... not sure I've ever once actually seen an appveyor not throw an error...)

(Note: I've actually seen some significant performance regressions on 0.7 in the worst case scenarios. Still, for the most part it seems reasonably fast on 0.6 and 0.7. Writing efficient code is hard, we'll get there.)

ExpandingMan commented 6 years ago

I've just fixed a critical bug.

The Feather format does not support specifying that columns without nulls are "nullable". Therefore one must check that an AbstractVector{Union{T,Missing}} in fact has missings before attempting to write it. Otherwise, the Feather file will contain a NullableList while the metadata tells the reader to try to build a List. When I get to it I need to add a test case for this. We've lost a little bit of write efficiency, because now every AbstractVector{Union{T,Missing}} must be checked for missing values before anything is written.
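The check described above could look roughly like the following. This is a sketch under assumptions, not the code in this PR: `column_for_write` and the `:plain` / `:nullable` tags are hypothetical names, and `nonmissingtype` requires Julia 1.3 or later.

```julia
# Hypothetical helper: decide how a column should be written and return
# a layout tag together with the values to write.
function column_for_write(v::AbstractVector)
    if Missing <: eltype(v)
        if any(ismissing, v)
            # Missing values are really present: write a nullable column.
            return (:nullable, v)
        end
        # The element type allows missing but none occur, so strip Missing
        # from the element type and write a plain column instead.
        T = nonmissingtype(eltype(v))
        return (:plain, convert(Vector{T}, v))
    end
    return (:plain, v)
end
```

The `any(ismissing, v)` scan is the extra pass over the data that accounts for the small loss in write efficiency.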

@quinnj, have you had any time to look at this? I still need to add some unit tests to Arrow.jl before I tag a release, but that will come before next week.

ExpandingMan commented 6 years ago

I've noticed a big problem with Feather.materialize. It's possible, in fact likely, that if you call this function and nothing else all references to the original data buffer will disappear so that it gets garbage collected and you segfault. I don't see any really elegant solution to this as things stand. Certainly I can get rid of the method that only takes a filename as argument, but we'd have to tell users "please keep the Source around somewhere, it can't get gc'd". The only completely reliable alternative I can think of would be to have materialize always do a deepcopy, but ugh, that would be absolutely awful.

My hope was that the pointers would be very temporary anyway, but I still haven't been able to get any news from the core devs on how hard it will be to fix ReinterpretArray. See here. Suggestions, comments welcome!

KristofferC commented 6 years ago

GC.@preserve is the standard way to keep an object alive.

ExpandingMan commented 6 years ago

Thanks, but I've already thought of doing the equivalent and it doesn't solve the problem, does it? Then the data will just never ever get gc'd even if the things referring to it go out of scope. This might be really bad if it's a sufficiently big dataset (though I'm very hazy on how exactly memory mapping works, so I don't know if that can help here).

KristofferC commented 6 years ago

The data is preserved for the scope of the preserve block. AFAIU there is no other way to guarantee not getting garbage collected.
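As an illustration of that scoping, here is a minimal sketch using only Base (Julia 0.7 or later). `copy_out` is a hypothetical stand-in for materializing one column and is not part of Feather.jl or Arrow.jl.

```julia
# Hypothetical stand-in for materializing a column: copy `n` Float64 values
# out of a raw byte buffer into a freshly allocated Vector.
function copy_out(buf::Vector{UInt8}, n::Integer)
    0 <= n * sizeof(Float64) <= length(buf) ||
        throw(ArgumentError("buffer too small for $n Float64 values"))
    out = Vector{Float64}(undef, n)
    # `buf` and `out` are kept alive for the duration of this block only,
    # which is exactly the window in which raw pointers are in use.
    GC.@preserve buf out begin
        src = convert(Ptr{Float64}, pointer(buf))
        unsafe_copyto!(pointer(out), src, n)
    end
    return out   # owns its own memory; `buf` can be collected afterwards
end

# Build a byte buffer from known values and copy them back out.
buf = collect(reinterpret(UInt8, [1.5, 2.5, 3.5]))
result = copy_out(buf, 3)
```

Once the block ends, `buf` is an ordinary object again and is collected as usual when nothing refers to it, so the preserve does not pin the data forever.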

ExpandingMan commented 6 years ago

Yeah, in that case we'd have to have users call it manually. Or, I suppose we could create a macro. Will have to think on it further, thanks for your help.

KristofferC commented 6 years ago

The way things "should" work is with the ReinterpretedArray, but I know it is not overhead-free right now :(

ExpandingMan commented 6 years ago

Yeah I know. I'm learning that pointers are actually much scarier in languages that are not designed to use them. Actually, reinterpret seems fine in some cases but it's just so unpredictable. Any idea what the thinking is on it, whether this will be something that will be easy or difficult to fix? I'd feel a lot better about using it if I thought there was a reasonable expectation of it getting better sometime in the foreseeable future. So far I haven't heard any kind of assessment about what might be wrong and what might be required to fix it (and I'm afraid it's way above my expertise to look into myself).

ExpandingMan commented 6 years ago

Great thanks.

I'm hoping you'll be very pleased with how isolated the pointer code is in Arrow.jl, I tried really hard to make it safe. Of course, there's nothing I can do about users giving the wrong array indices, but I hope that's all that can really go wrong.

Like I said, I'm still contemplating improving the Arrow.jl constructors, so that may change some things here. I also have to update it so that we can use other data types for the offsets, I think we'll need to do Int64 by default.
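For context on what the offsets are, here is a minimal sketch of an Arrow-style variable-length layout with the offset type as a parameter. The function names `pack_strings` and `get_string` are hypothetical and this is not the Arrow.jl API; it only assumes the usual layout of one contiguous byte buffer plus `n + 1` zero-based offsets.

```julia
# Pack strings into one contiguous byte buffer plus an offsets vector.
# `O` is the integer type used for offsets (for example Int32 or Int64).
function pack_strings(::Type{O}, strs) where {O<:Integer}
    offsets = O[0]
    bytes = UInt8[]
    for s in strs
        append!(bytes, codeunits(s))
        # Converting to `O` throws InexactError if the running length
        # no longer fits, which is the limit a wider offset type avoids.
        push!(offsets, length(bytes))
    end
    return bytes, offsets
end

# Element i spans bytes offsets[i]+1 through offsets[i+1] (offsets are 0-based).
function get_string(bytes, offsets, i)
    return String(bytes[offsets[i] + 1 : offsets[i + 1]])
end

bytes, offsets = pack_strings(Int32, ["ab", "", "cde"])
third = get_string(bytes, offsets, 3)
```

With `Int32` offsets the total byte length of a column is capped at `typemax(Int32)`; passing `Int64` as the first argument lifts that cap at the cost of wider offsets.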

ExpandingMan commented 6 years ago

Ok, I've just added a whole lot more unit tests, so this damn thing better not have any mysterious uncaught errors anymore. Still not totally sure how that godawful segfault was getting through those tests. Think 0.6 fails for some reason, will fix eventually.

ExpandingMan commented 6 years ago

Absolutely, feel free to push to my PR. I've never had pushes on a PR before; do I just merge them the same way I would PRs to master?

Also keep in mind that when we merge this we don't have to tag it immediately, so we can always make additional PRs immediately after. We'll have to do some documentation PRs before tagging regardless (I intended to wait until this is merged to do that).

ExpandingMan commented 6 years ago

I've fixed the broken unit test on 0.6. Note that DataStreams will need to be tagged for us to not get test errors on 0.7.

quinnj commented 6 years ago

Thanks @ExpandingMan for all the hard work here; awesome to see all the new arrow/feather functionality progressing.

JuliaData / Feather.jl

Overhauled to Arrow Back-End and Better Memory Safety #78
