Closed LukeWeidenwalker closed 1 year ago
@soxofaan @dthiex Do you know how your implementations handle this?
we currently use this implementation from the legacy openeo-processes-python repo: https://github.com/Open-EO/openeo-processes-python/blob/f57c663e8708061a8222851a8aec2762ac46b0bb/src/openeo_processes/arrays.py#L745-L817
which behaves like the current spec if I understand correctly:
import numpy as np
data = np.array([1, 2, 2, 3])
print(data)
# ---> [1 2 2 3]
# Ascending
print(numpy.argsort(data, kind="mergesort"))
# ---> [0 1 2 3]
# Descending
print(numpy.argsort(-data, kind="mergesort"))
# ---> [3 1 2 0]
The simplest implementation to support the asc parameter is to simply flip the input array before running argsort if descending order is desired. However, the order process currently requires that Ties will be left in their original ordering, regardless of whether asc=True or asc=False. This means that we cannot simply flip the input array, because ties would be flipped too.
maybe I'm misunderstanding, but if you mean flipping the sign of the input data with "flipping the array", I think the descending mode works just fine with numpy.argsort
as illustrated above
Apologies, typo in my description:
I mean after applying argsort
, flipping (aka reversing) the array with the ordered indices with np.flip
to get the opposite ordering.
Flipping the sign only works on numerical arrays, but will fail on string/datetime arrays.
If it works with argsort in the old implementation, why do we need a change here? Can you give some examples, @LukeWeidenwalker?
Adding the minus-sign like in the old implementation only works on numerical arrays:
-data
You cannot do this on e.g. string arrays, it will error out. For those other arrays, you can still get the sorting indices with argsort
and flip that array if you want it descending. However, then the order of ties is flipped too - hence this issue!
Depending on your answer to this comment: Other than changing this in the spec, we can of course just not support anything other than numerical arrays in our order
process and throw descriptive errors on why this isn't possible. Don't think that's the best option for users though, because that will only be reportable at runtime.
Order allows number|null|date-time:string|date:string|time:string
- All the strings are temporal, so I'd assume internally they could be timestamps (i.e. numerical)? Can you use the -
thing on some kind of date/time objects in Python?
Alternatively, just remove the strings from the schema (or whatever you don't support). This is reasonable, I think.
Do you know how your implementations handle this?
In our backend we only support number, null or a valid ISO date string as inputs. We then use the javascript sort function which handles ties as described in the process documentation (= Stefaan's example).
Order allows
number|null|date-time:string|date:string|time:string
- All the strings are temporal, so I'd assume internally they could be timestamps (i.e. numerical)? Can you use the-
thing on some kind of date/time objects in Python?
No, just tried again, this doesn't work!
we currently use this implementation from the legacy openeo-processes-python repo
@soxofaan so how does your process handle datetime strings then?
with strings we don't do date interpretation, but that's usually not a problem with ISO8601
A string array is eventually passed to np.argsort
and:
[500] Internal: Server error: TypeError("bad operand type for unary -: 'str'")
A mixed array (strings and numbers) also fails in ascending mode with:
[500] Internal: Server error: TypeError("'<' not supported between instances of 'str' and 'int'")
I don't think we ever had someone complaining about this and requiring proper string support for the order
process, so it's not high on our priority list currently
It seems it's hard to really come to a conclusion here with the variability of implementations. There are a couple of options:
false
. This is likely the most interoperable (but slowest) solution as it's a two-step approach that I assume everyone could follow.The simpliest part would probably be 5, but this would probably lead to a change in SH, right?
with strings we don't do date interpretation, but that's usually not a problem with ISO8601
@soxofaan fyi: The spec says: "Temporal strings can not be compared based on their string representation due to the time zone/time-offset representations."
with strings we don't do date interpretation, but that's usually not a problem with ISO8601 @soxofaan fyi: The spec says: "Temporal strings can not be compared based on their string representation due to the time zone/time-offset representations."
yes I know, I just mean that is practically not a problem in VITO backend because we usually only work at date resolution (and if there is a time component we'd always use UTC)
Proposal: Replace "Ties will be left in their original ordering." with "The ordering of ties is implementation-dependent."
Process ID: order
Describe the issue: The simplest implementation to support the
asc
parameter is to simply flip the input array before running argsort if descending order is desired. However, theorder
process currently requires thatTies will be left in their original ordering
, regardless of whetherasc=True
orasc=False
. This means that we cannot simply flip the ~input array~ array with the ordered indices, because ties would be flipped too. This makes the implementation more unwieldy than it has to be.Proposed solution: Change this requirement such that ties will be in the original ordering if asc=True and flipped otherwise.
Additional context: See here for initial discussion