JuliaArrays / ShiftedArrays.jl

Lazy shifted arrays for data analysis in Julia
Other
50 stars 10 forks source link

Out of bound #39

Closed matthieugomez closed 3 years ago

matthieugomez commented 4 years ago

Indexing a ShifedArray by an out of bound index should probably throw a BoundsError (rather than returning a missing value).

lag([1])[-1]
#> missing
roflmaostc commented 3 years ago

We also encountered the issue with PaddedViews. Wouldn't it be much more intuitive to throw an error? It is also confusing if the user does a false indexing and receives a wrong result rather than an error? The critical part is probably this line

We should check x if it is inbounds and only then, do the index shifting procedure, right?

Thanks!

RainerHeintzmann commented 3 years ago

To be more precise: accessing a ShiftedArray.circshift array in a place outside the limits of the original size area returns a circularly shifted value independent of the aktual index. This may seem like a good ideal at first but breaks compatibility with PaddedViews which relies on out of bound errors to fill in padding values. For non circular shifts this may be ok as the total size is not altered but for circshift I think index checking on the original size may be a good idea. I tested this in an implementation and it seems to work quite well. Happy to put in a pull request if agreed. As an alternative one could add a keepsize=true argument.

roflmaostc commented 3 years ago

@piever I hope you don't mind pinging :smile:, but I believe it would be worth to discuss a few design (bugs?) decisions:

julia> using PaddedViews, ShiftedArrays

julia> x = [1,2,3,4,5]
5-element Vector{Int64}:
 1
 2
 3
 4
 5

julia> x_s = ShiftedArrays.circshift(x, (2))
5-element CircShiftedVector{Int64, Vector{Int64}}:
 4
 5
 1
 2
 3

julia> x_s[1000000]
3

In my point of view ShiftedArrays.circshift should behave like circshift but with a view, in principle that works. But adding the property that the content is being repeated is really dealbreaker when you want to use ShiftedArrays.jl together with PaddedViews.jl:

julia> PaddedView(42, circshift(x, 2), (10, ))
10-element PaddedView(42, ::Vector{Int64}, (Base.OneTo(10),)) with eltype Int64:
  4
  5
  1
  2
  3
 42
 42
 42
 42
 42

julia> PaddedView(42, ShiftedArrays.circshift(x, 2), (10, ))
10-element PaddedView(42, ::CircShiftedVector{Int64, Vector{Int64}}, (Base.OneTo(10),)) with eltype Int64:
 4
 5
 1
 2
 3
 4
 5
 1
 2
 3

We already have a fork with a fixed version and we would be happy to contribute code and prepare a PR changing that. However, we also see the issue that this might break already existing code which relies on that feature We believe, it would be nice to move the current functionality to something like circshift_replicate which reflects more the meaning of the operation.

What do you think about that? Are you actively using the package currently? We appreciate the effort that has been invested into the package and we believe the easiest (at least for us :laughing:) would be fixing this here instead of creating a new package.

piever commented 3 years ago

No problem about the ping!

I confess I haven't been using this package for a while. The "out of bound" behavior made sense for a usecase I had at the time, but I'd tend to agree that one should rather use OffsetArrays or CircularArrays to use out-of-bound indices. Forbidding out-of-bound indices may also optimize some of the code for CircShiftedArrays, because I imagine this while loop could be replaced by an if statement, which should have less overhead.

I'll be happy to review a PR that proposes this change, but I guess the PR should stay open for a little bit to give time to other JuliaArrays members to give their opinion, because the change is breaking.

RainerHeintzmann commented 3 years ago

The bringwithin is best represented by mod(idx,range) so it does not even need an if statement. Thanks a lot for your positive feeling about a pull request. I changed the index checking in both, circshiftedarray and shiftedarray to reinforce the array boundaries. Then the behaviour is as I would expect:

b=PaddedView(0,CircShiftedArray(reshape(1:25,(5,5)),(1,1)),(9,9),(2,2))
9×9 PaddedView(0, OffsetArray(::CircShiftedArray{Int64, 2, Base.ReshapedArray{Int64, 2, UnitRange{Int64}, Tuple{}}}, 2:6, 2:6), (Base.OneTo(9), Base.OneTo(9))) with eltype Int64:
 0   0  0   0   0   0  0  0  0
 0  25  5  10  15  20  0  0  0
 0  21  1   6  11  16  0  0  0
 0  22  2   7  12  17  0  0  0
 0  23  3   8  13  18  0  0  0
 0  24  4   9  14  19  0  0  0
 0   0  0   0   0   0  0  0  0
 0   0  0   0   0   0  0  0  0
 0   0  0   0   0   0  0  0  0

I will put in the pull request now.

piever commented 3 years ago

Fixed by #44