JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/
Other
1.71k stars 360 forks source link

Fix eachrow and eachcol indexing with CartesianIndex #3413

Closed bkamins closed 5 months ago

bkamins commented 5 months ago

Fixes https://github.com/JuliaData/DataFrames.jl/issues/3412

@nalimilan - I add it to 1.7 release as probably there is still some time till Julia 1.11 release.

Having said that - I propose we go back to the discussion on which open PRs from https://github.com/JuliaData/DataFrames.jl/pulls we want to merge and make 1.7 release of DataFrames.jl.

The bug is uncovered due to https://github.com/JuliaLang/julia/pull/52736, which starts using CartesianIndexing in broadcasting BitArray.

nalimilan commented 5 months ago

Changes look good but I'm surprised that they are necessary, as the AbstractArray API doesn't require supporting CartesianIndex. Isn't there a bug in https://github.com/JuliaLang/julia/pull/52736? Or is this due to how DataFrames.jl implements broadcasting?

bkamins commented 5 months ago

These were also my thoughts, but having read https://docs.julialang.org/en/v1/manual/interfaces/#Working-with-Broadcasted-objects I concluded that broadcasting explicitly requires objects to support CartesianIndex directly (although getindex API does not require it). Quoting:

Iterating over the CartesianIndices of the axes(::Broadcasted) and using indexing with the resulting CartesianIndex object to compute the result.

And this is what https://github.com/JuliaLang/julia/pull/52736 does (i.e. uses CartesianIndexing internally).

@mbauman - could you please confirm that this is a correct thinking?

The issue we currently have is that without the extra definition when we are passed a CartesianIndex when indexing we fall back to:

Base.@propagate_inbounds Base.getindex(itr::DataFrameRows, idx) =
    eachrow(@view parent(itr)[idx isa AbstractVector && !(eltype(idx) <: Bool) ? copy(idx) : idx, :])

which tries something like df[CartesianIndex(1), :] which is invalid.

nalimilan commented 5 months ago

Ah OK so the failure is "our fault" as the the DataFrameRows method takes precedence over the fallback AbstractArray method.

bkamins commented 5 months ago

Thank you!