Open scheidan opened 6 years ago
I'm not sure if it is related to #58, but it definitely is something I wasn't aware of an that needs some attention. The type stability issue still hasn't been resolved, it is a pretty tough one.
It looks like the culprit is similar(n::NamedArray)
, which has to build the index dicts. It doesn't know the key types in advance, since the dimensions of the similar array may be different from n
, in which the default type (String) is to be generated.
In that sense it is related to #58, because the current declaration
dicts = Array{OrderedDict{Any,Int}}(nd)
helps to fix #58, but causes this #60.
So far the conclusion is that I am not a very good type stability programmer.
It should be much easier to fix than #58, though. You can just define a Base.similar{T,N}(n::NamedArray{T,N}, t::Type)
method which always uses the same dimension types.
Thanks, that idea crossed my mind, too. That is probably easiest.
The zeros
and ones
, are fixed now, but cat
c.s. is non-trivial. Consider
julia> n = NamedArrays.NamedArray(rand(3), [DataStructures.OrderedDict(:a => 1, :b =>2, :c =>3)])
3-element Named Array{Float64,1}
A │
───┼──────────
:a │ 0.0794262
:b │ 0.28769
:c │ 0.123761
Both hcat(n, n)
and vcat(n, n)
have to re-invent names. This make the keytype go from Symbol
to String
in the case of vcat
. Both situations break #58.
What do you mean by "reinvent names"? Why couldn't Symbol
names be created?
So currently, a vcat(n, n)
generates names "1"
..."6"
for dim A
in the case above. Of course in this particular case they could have been named Symbol("1")
etc., to keep the keytype the same, but I don't really see the point there. The keys along the dimension that is concatenated can be of any type, and only in the case the names of the constituent arrays don't intersect the types and names could carry on to the concatenated array. Currently there is no check for that case---but I don't even want to start imagining the trouble needed to do that checking and joining in typestable code.
I would have expected the names generated by vcat(n, n)
to be e.g. [:a, :b, :c, :a1, :b1, :c1]
or something similar. Anyway the type should be preserved, and if types differ promote
should be used to choose a common type.
BTW, when writing inferrable code is too hard, generated function can help, since they allow doing whatever complex operations you need to compute manually the return type based on the input types, and then the compiler doesn't have to guess that type.
The original names can be anything, not just symbol or string, so I don't see how a general sensible naming scheme can be devised. promote_type(String, Symbol)
is Any
, I can't imagine this is going to help a lot.
Yes, Any
would be used in the worst case, but then you can't do better anyway and users should take care of using compatible names (at least if they care about performance).
Then indeed for custom name types it might not be possible to create unique names as for strings, but that doesn't sound like a sufficient reason to avoid reusing names when possible: better optimize for the most common case where either names are unique, or they are strings and can be made unique. Maybe just throw an error when duplicated names are found and they are neither strings nor symbols?
A very useful package, thanks a lot for your work!
Some functions seem to change the type of the keys of the name dictionaries to
Any
. For example:Maybe it is related to #58 but I hope it is still useful to report.
As workaround for
zeros
andones
I use currently this: