Closed schnirz closed 4 years ago
It's worth noting that other constructors don't have overloads for missing
Int(missing)
>ERROR: MethodError: Cannot `convert` an object of type Missings.Missing to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
String(missing)
> ERROR: MethodError: Cannot `convert` an object of type Missings.Missing to an object of type String
This may have arisen from a call to the constructor String(...),
since type constructors fall back to convert methods.
So it is likely that the fix is to add an overload just to the -
function and whatever other unary operators are defined for Dates, but leave the parsing up to the user.
I think that there is a case to be made for overloading Date
and DateTime
constructors.
As an example, consider a user is trying to parse a an array or a column of a data frame containing strings as well as missing values to dates. Asking the user to explicitly handle missing values for this particular case while all other operations handle missing values quietly without the user having to explicitly deal with them would constitute an inconsistent user experience.
Could you clarify which operations handle missing values quietly? For constructors. If, for example, you have an array ["4", "5", missing, "7"]
, you wouldn't have anything work automatically with parsing missings.
Deliberate handling of missings is something Julia makes you more sensitive to than in other languages. For instance this is (probably) recommended behavior.
df2 = @linq df |>
transform(date = [ismissing(x) ? missing : Date(x) for x in :a])
This might be a pain, but there is a lot to be said for this really being a feature because it encourages people to be aware of how missing values propogate.
R constructors do propagate missing values however, so this is new for people coming over from there.
But we should still work on a PR for the Unary operators!
Interestingly, it looks like someone snuck in a bunch of broadcasting behavior that should probably throw errors in 1.0
(-)(x::AbstractArray{T}, y::T) where {T<:TimeType} = x .- y
(-)(y::T, x::AbstractArray{T}) where {T<:TimeType} = y .- x
We might want to learn more about that before a PR to avoid having to write out so many overloadings of unary operators.
Posted on discourse here
I don't think we should have a special case for dates: either all constructors should accept missing
, or none of them. Since it would be weird to have a constructor return an object not of the requested type (as @pdeffebach) noted, I don't think it would be appropriate to accept missing
.
However, we should make it easier to enable functions/constructors to accept missing
and return missing
, an operation which is often called "lifting". See https://github.com/JuliaLang/julia/pull/26661.
Could you clarify more about the functions issue? Should I file a distinct issue for +
and -
with dates, even if we don't touch the constructor? Or should we start idiomatically using lift
?
It seems like a lift
statement everywhere is at odds with the idea of missing
s propagating easily through basic operators.
Yes, +
and -
should (most probably) propagate missing
for all types. Reopening so that we track this part of the original description.
I'm still not 100% convinced that accepting missing
values for the Date
and DateTime
constructors is problematic, since these constructors are used differently from most other constructors (for most types using the parse
function seems to be the way to go from my experience).
I am, however, ok with using a lift
statement/functor. I have, in fact, done so myself recently for dealing with missing
input to DateTime
and it worked just fine. It would seem necessary to provide a generic lift
statement via Base.missing
and to provide some documentation around this approach to explain to new users that this is the way you are supposed to handle missing
values when a function/constructor doesn't explicitly handle them itself.
I also agree with pdeffebach that this approach, while feasible, is at odds with the promises made when missing
was announced as the new de facto missing value type in Julia. As a matter of fact, the need to lift values was levelled as a complaint against Nullable
or Option
types and was presented as a motivation for the new missing
type! (link)
As far as I recall, the argument was that a few basic operations would propagate missing
but for anything other than that one would manually lift.
As far as I recall, the argument was that a few basic operations would propagate missing
Without the convert
and constructor change we might have been able to include parse
and convert
in our "list of basic functions" that propagate missing. However I think that overloading parse
and convert
wouldn't propagate seemlessly now, right?
I think lift is a good idea. Given that missing
is in base, a ?
syntax would be nice.
Without the convert and constructor change we might have been able to include parse and convert in our "list of basic functions" that propagate missing. However I think that overloading parse and convert wouldn't propagate seemlessly now, right?
I don't think the removal of the constructor fallback on convert
changes anything. convert(Union{T, Missing}, x)
should work fine, what we need is a short syntax to write Union{T,Missing}
. Regarding parse
, we could also define methods so that parse(Union{T,Missing}, x)
works, possibly with an argument (keyword?) to specify which string(s) should be considered as missing
. See discussion at https://github.com/JuliaLang/julia/pull/25131 and linked issue.
Thanks for the feedback.
Taking Dates as the example here. Overloading parse
only works If Date()
exclusively calls parse
and convert
, and not _tryparse
or other custom functions. Or we establish best practices where one should always use parse when creating dates. But that seems less than ideal.
But we can't expect to track down every intermediate function that a constructor calls if it isn't just parse
all the way down. We would have to use lift
or something similar.
If lift
is the way to go, then I suppose only the binary operators in Dates
need to be overloaded in order to be consistent with binary operators defined in Base
, correct?
Also, +1 for the ?
syntax for lift
!
Taking Dates as the example here. Overloading parse only works If Date() exclusively calls parse and convert, and not _tryparse or other custom functions. Or we establish best practices where one should always use parse when creating dates. But that seems less than ideal.
But we can't expect to track down every intermediate function that a constructor calls if it isn't just parse all the way down. We would have to use lift or something similar.
Yes, my idea is that people will need to use parse(Union{Date,Missing}, x)
or lift(Date)(x)
(depending on whether missing values are represented as "missing"
or missing
, respectively), not that the Date
constructor would propagate missingness.
I personally prefer lift(Date)(x)
over parse(Union{Date,Missing}, x)
. I have never used parse
for Date
orDateTime
before. Also, the documentation only discusses parsing Date
and DateTime
via their respective constructors, so expecting people to parse Date
and DateTime
using the parse
function when missing values are a concern seems odd.
So are we going to do anything here? Define -
and +
? Or leave things to lifting?
This seems to demand a more general approach to me, rather than adding more and more methods for missing all over the place? When would it end? Never.
As a matter of fact, the need to lift values was levelled as a complaint against
Nullable
orOption
types and was presented as a motivation for the newmissing
type!
This is a slight confusion of terms --- lifting values in the case of Nullable
or Option
types means you can't just use e.g. an Int. You have to use Nullable(2)
instead of 2
. Lifting functions, on the other hand, refers to wrapping a function so that it propagates missing values. The need for that is common to both Nullable/Option and the Missing approaches.
Since +
and -
support missing for other types, it seems reasonable to define them for Dates too.
I agree, once we have decided that a given operator should propagate missing values, we should follow this behavior for all types that implement it. The question of which operators should propagate missing
is a separate one, and indeed something like lift
(https://github.com/JuliaLang/julia/pull/26661) would be useful for those which don't propagate.
This seems to demand a more general approach to me, rather than adding more and more methods for missing all over the place? When would it end? Never.
Take a look at an issue here where I (misguidedly) bring up missings in NaNMath.jl. The package maintainer fairly says that they don't anticipate their package being used in a context where missing values would be a concern.
That is to say, the space of "values that should always propagate missings" might be quite small. But of course means we would inevitably have gotchas where we have to say "No, that type doesn't support missings, use lift." The inconsistency would be tiring.
AFAICT NaNMath is a different question, as NaNMath.mean
is a completely separate function from Statistics.mean
. Better discuss that in the relevant issue.
I've just rather unexpectedly run into the lack of methods for +
and -
involving TimeType
.
Is there any consensus around this? I can make a PR if everyone agrees to support it for +
and -
. I'm certainly open to arguments about why we should not, but as the lack of these methods came as a surprise to me today, I suspect they'll come as a surprise to many other people as well.
@ExpandingMan I think there are some good methods using + or -. Only you need to make whatever type you are adding is a date type such as now()+Day(1)
. I think the reasoning is sound since you might want to add a second or a minute to the date/time. Whether it should be implied what the user is trying to do when you have a date object plus an integer might be a reasonable suggestion. Date(now()) + 1
Of course it is easy to add such methods: Base.:+(x::Date, y::Integer) = x + Day(y)
I'm in the same boat as @ExpandingMan here - I've actually been using passmissing(i)(x, y)
for a while, but it seems uncontroversial (and largely agreed in this thread) that Date(2020, 8, 26) + missing
should return missing
?
Are there any reasons not to add this?
Currently, there is no support for
missing
values when it comes to working withDate
andDateTime
.For example, the
Date
constructor throws an error when called on amissing
input value:Similarly, computing the difference between two dates fails of one of them is missing:
Presumably, all unary and binary operators need to be overloaded, either in
Base
or inDates
. Any ideas and suggestions for how to address this issue would be more than welcome.