JuliaData / DataAPI.jl

A data-focused namespace for packages to share functions
Other
33 stars 13 forks source link

Don't define unwrap(x::Any) #59

Open jariji opened 1 year ago

jariji commented 1 year ago

If someone has a special need for a function that disregards types, they can make their own wrapper function. The function in DataAPI that's the public function for getting values out of CategoricalValues should only take CategoricalValue. It is too easy to accidentally call unwrap on a type that is not CategoricalValue, such as Vector{CategoricalValue}.

See also https://github.com/JuliaData/CategoricalArrays.jl/issues/399 This change is breaking so should be considered for DataAPI 2.0.

bkamins commented 1 year ago

Summarizing the discussion on Slack, we should decide if we should drop the default method definition of unwrap and just keep the function definition.

nalimilan commented 1 year ago

In what situations can it be a problem to call unwrap on a value which isn't a CategoricalValue or another "wrapper"? At worst I would imagine that you get an error a bit later.

bkamins commented 1 year ago

Note that we need unwrap(::Missing) in general also. The only problem I can see is the one that you mention: when you call unwrap on a value of a wrong type and do not get an error.

However, the idea of unwrap was that when you call unwrap.(collection) the collection should allow values of mixed types and not error.

In practice the only issue I can see it the OP problem when you call unwrap(collection) instead of unwrap.(collection).

jariji commented 1 year ago

I don't have a specific code example where this would do the wrong thing. I'm making an argument on principle.

I don't understand when it would be okay to call a function on arbitrary values with unknown semantics. This may have some "pragmatic" use case in plotting or whatever but I just don't think it makes sense.

This is one of those functions like something(::Any) that drops a layer of wrapping whether or not it was intended, which is only useful when you've already failed to distinguish between success and failure.

nalimilan commented 1 year ago

I see your point, and maybe if we had to do it again we would avoid adding the fallback method. But now that it's there I'm reluctant to add value to CategoricalArrays just because of principles. Maybe we can drop the fallback in DataAPI 2.0.

I suspect @quinnj defined unwrap(::Any) at https://github.com/JuliaData/DataAPI.jl/pull/35 because the previous method defined in Tables.jl had to work both on DataValue and on any other kind of value. That's not great and ideally we would get rid of DataValue. What do you think @quinnj?

quinnj commented 1 year ago

I think @jariji makes fair points, but I would just say in response that this was only ever intended as a very "pragmatic" sort of interface, and not a very principled one, so that's why things are the way they are. I don't think it's worth getting rid of DataValue support for slightly better principled APIs. I think it's probably more practical that we rename it to something more specific so people don't get so hung up on its current behavior.

But it was definitely an API (like most in DataAPI) born out of necessity/solving a specific issue rather than some over-arching CS concept or something.

aplavin commented 1 year ago

Potentially relevant - another case of a fallback that is "too generic": https://github.com/JuliaData/DataAPI.jl/issues/54