JuliaArrays / ArraysOfArrays.jl

Efficient storage and handling of nested arrays in Julia
Other
43 stars 9 forks source link

Incorrect/unnecessary overloads #11

Closed colinxs closed 4 years ago

colinxs commented 4 years ago

(copied from Slack)

Here's one example:

Statistics.std(X::AbstractVectorOfSimilarArrays{T,M}; corrected::Bool = true) where {T,M} =
     std(flatview(X); corrected = corrected)
which results in the following behavior:
 julia> A=rand(2,3,4);

 julia> B=nestedview(A, Val(2));

 julia> C=[Array(el) for el in B];

 julia> std(B)
 0.32135578668613923

 julia> std(C)
 2×3 Array{Float64,2}:
  0.163392  0.393621  0.343109
  0.290079  0.365516  0.135774

I think the fix for this one is easy: just remove all the overloads for now. If we continue with the plan to merge SpecialArrays/JuliennedArrays here then how dims is handled changes anyways since they both support slicing along arbitrary dimensions and not just the first M.

oschulz commented 4 years ago

just remove all the overloads for now

Please don't, I have packages that critically depend on these things being optimized for VectorOfSimilarVectors (it's doing the right thing there).

oschulz commented 4 years ago

However, I'd be fine with falling back to the default implementation for anything that can't be easily handled by running the stats functions on the flat view.

colinxs commented 4 years ago

I should've clarified, I meant just removing the incorrect overloads for now, rather than patching them, as the proper fix will change things anyways :)

oschulz commented 4 years ago

So, the whole thing was a bit of a mess. I've cleaned this up now, have a look at #12.

I think I need to submit a few PRs to StatsBase, too, at some point, a lot of things that Statistics supports for vectors-of-vectors StatsBase does not (or incorrectly). See comments in new tests.

oschulz commented 4 years ago

Closing this for now - I believe #12 now ensure same semantics as standard nested arrays for statistics functions (except where not defined for standard arrays). We can always revisit this, of course.

oschulz commented 4 years ago

For future reference, this is also related to JuliaStats/StatsBase.jl#518.