Open ablaom opened 3 years ago
Of course, this comment does not address the other "fly in the ointment" which is tables, requiring trait dispatch.
Sorry, I'm probably not fully following how this suggestion differs from what is there now. Could you show what it would give in the case of Sampleable{S}, before and after your suggestion?
I have also been thinking down this line and I like this idea. It makes the codebase simpler.
So I guess this would imply that scitype(Any[1,2,3]) should return Array{Unknown, 1}, since Scitype would operate on the type information alone.
This would be more efficient than the current implementation, which peeks into the type of each element in the array at runtime.
Well, no, I'm not suggesting we change the definition of scitype for arrays (see Property 3), only the implementation. According to the definition, we will need to look inside to determine the scitype in the case of an Any eltype, and of any other eltype T for which Scitype(T) returns Unknown.
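To make the distinction concrete, here is a minimal sketch of an array method that keeps the existing definition but only inspects elements when the type-level map is uninformative. The scientific types below are toy stand-ins defined just for this example, and the method bodies are an assumption about how this could be written, not the ScientificTypes.jl implementation.

```julia
# Toy stand-ins for the scientific types (assumptions, not the real definitions).
abstract type Unknown end
abstract type Continuous end
abstract type Count end

# Type-level map: the fallback says nothing is known about the type.
Scitype(::Type) = Unknown
Scitype(::Type{<:AbstractFloat}) = Continuous
Scitype(::Type{<:Integer}) = Count

# Object-level map: by default, defer to the type-level map.
scitype(x) = Scitype(typeof(x))

# Arrays: use the eltype when it is informative; otherwise look inside
# and take the union of the element scitypes.
function scitype(A::AbstractArray{T,N}) where {T,N}
    S = Scitype(T)
    S === Unknown || return AbstractArray{S,N}   # fast path, no element inspection
    U = mapreduce(scitype, (a, b) -> Union{a, b}, A; init=Union{})
    return AbstractArray{U,N}
end

scitype([1.0, 2.0, 3.0])      # fast path via Scitype(Float64)
scitype(Any[1.0, 2.0, 3.0])   # slow path, still AbstractArray{Continuous,1}
```

Under this sketch the result for Any[1.0, 2.0, 3.0] is unchanged; only the route taken to compute it differs.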
We could change the definition, but this would pose some problems. For example, if scitype(Any[1.0, 2.0, 3.0]) = AbstractVector{Unknown}, then it forces users to tighten their element types when this is not strictly necessary to use the algorithms (an objection raised by @tlienart, if I remember correctly). Also, what should the new definition be that still allows us to recover scitype(categorical(1, 2, 3)) = AbstractVector{Multiclass{3}}?
Perhaps you have a simple replacement for Property 3 that allows us to get everything we want?
I have often lamented the fact that scitype cannot be a map of machine type to type, instead of object to type, because of the infamous CategoricalValue fly in the ointment. As a workaround to performance problems with arrays, we introduced Scitype, which is a map from type to type. Wouldn't it be simpler if implementing Scitype is the "fallback" responsibility of a convention, and we only overload scitype for problematic cases like CategoricalValue? What am I missing here?

So, something like (ignoring convention distinctions):
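A minimal sketch of that arrangement, assuming toy stand-in types rather than the real ScientificTypes.jl and CategoricalArrays.jl definitions (ToyCategoricalValue and its fields are hypothetical, introduced only for illustration):

```julia
# Toy scientific types (assumptions for this sketch).
abstract type Unknown end
abstract type Continuous end
abstract type Count end
abstract type Multiclass{N} end

# A convention implements the type-level map Scitype ...
Scitype(::Type) = Unknown
Scitype(::Type{<:AbstractFloat}) = Continuous
Scitype(::Type{<:Integer}) = Count

# ... and the object-level map falls back to it.
scitype(x) = Scitype(typeof(x))

# Hypothetical stand-in for CategoricalValue: the number of levels lives
# in the instance, not in the machine type, so Scitype cannot see it.
struct ToyCategoricalValue
    level::Int
    nlevels::Int
end

# Only the problematic case overloads scitype directly.
scitype(x::ToyCategoricalValue) = Multiclass{x.nlevels}

scitype(1.5)                         # Continuous, via Scitype(Float64)
scitype(ToyCategoricalValue(2, 3))   # Multiclass{3}, via the object-level overload
```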
To be clear, I'm not suggesting a change in the definition of scitype, only how it is implemented, although Scitype is something we may want to make part of the public interface.

What got me thinking about this is the case of parametric types like Sampleable{S} and a type I'd like to introduce, called Iterator{S}, for lazy loaded data structures. Here S is the scitype of the objects sampled, or the scitype of the objects iterated, respectively. How do we implement scitypes for these? This is tricky because we may not have an object from which to extract the parameter S, only its machine type. So in this case we are limited to using Scitype.

Thoughts @OkonSamuel @tlienart
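As an illustration of the parametric case, the sketch below computes the scitype of a lazy container purely from its machine type. LazyStream is a hypothetical container invented for this example, and Iterator{S} is modelled as a toy abstract type; neither is taken from an existing package.

```julia
# Toy scientific types (assumptions for this sketch).
abstract type Unknown end
abstract type Continuous end
abstract type Iterator{S} end   # the proposed scitype, modelled as a toy type

Scitype(::Type) = Unknown
Scitype(::Type{<:AbstractFloat}) = Continuous

# Hypothetical lazy container: T is the machine type of the items it yields.
struct LazyStream{T}
    next::Function
end

# The parameter S is derived from the type parameter T; no item is ever loaded.
Scitype(::Type{<:LazyStream{T}}) where {T} = Iterator{Scitype(T)}

# The object-level map needs no special case here.
scitype(x) = Scitype(typeof(x))

Scitype(LazyStream{Float64})         # Iterator{Continuous}
scitype(LazyStream{Float64}(rand))   # same answer, obtained through the type
```

The limitation discussed above shows up immediately: for LazyStream{Any}, this sketch can only report Iterator{Unknown}, since there is no element to inspect.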