JuliaArrays / ShiftedArrays.jl

Lazy shifted arrays for data analysis in Julia
Other
50 stars 10 forks source link

Add `ShiftedArrays.diff` for differences between elements in a vector #51

Open pdeffebach opened 2 years ago

pdeffebach commented 2 years ago

Base.diff does not return a vector of the same size as the original. I find this very frustrating, as I express in #41.

This PR adds ShiftedArrays.diff as a function to get the difference between elements, but pre-pends missing at the first element.

I understand that ShiftedArrays.diff is more characters than x .- lag(x), but it's more declarative.

nalimilan commented 2 years ago

I agree we really need this, but it's not specific to ShiftedArrays at all since it returns a plain Array. Maybe we could add an argument like default to diff in Base to enable this behavior? At least it's probably worth trying.

pdeffebach commented 2 years ago

Filed an issue here

piever commented 2 years ago

I also think it's a bit odd that this should live here. IMO, the idea behind ShiftArrays.circshift versus Base.circshift (or fftshift and co.) is to return a lazy object with the same content. It's the same logic underlying Iterators.map versus Base.map.

Returning an eager result whose value differs from Base.diff is a bit odd, it would be more logical to add options to diff. That being said, I agree that the functionality is useful, I'm just unsure it should be called ShiftedArrays.diff.

nalimilan commented 2 years ago

Ah yeah so you mean it would make sense to have ShiftedArrays.diff return a lazy array instead, with arguments to choose whether the first value should be missing or not?

pdeffebach commented 2 years ago

If it's performant to have the object be lazy, and if it takes advantage of the same infrastructure as lag, then it should definitely live here, right?

@piever do you want to experiment with making this be lazy? Or should I modify the PR to make it return something which looks like what lag returns.

piever commented 2 years ago

Ah yeah so you mean it would make sense to have ShiftedArrays.diff return a lazy array instead, with arguments to choose whether the first value should be missing or not?

I think it'd make sense for it to return a lazy object, as for instance

diff(v, n = 1; default) = Broadcast.broadcasted(-, v, lag(v, n; default))

My main concern was that collect(ShiftedArrays.diff(v)) != diff(v), whereas for all the other functions shadowed here (circshift, fftshit, ifftshift) that holds. Maybe that's not so bad with the mandatory keyword argument. I imagine if diff(v, default=missing) were to be added to Base, it would have the same meaning as the definition above but eager.

pdeffebach commented 2 years ago

Okay, I see your point about consistency with Base. Sorry for missing it earlier. If possible can you show your support in the linked issue? Would be good to coalesce these two discussions into one.