JuliaArrays / AxisArrays.jl

Performant arrays where each dimension can have a named axis with values
http://JuliaArrays.github.io/AxisArrays.jl/latest/

Roadmap #7

Open mbauman opened 9 years ago

mbauman commented 9 years ago

We're getting to the point where the core API is stabilizing. We can now start making things fancier and building out interfaces to base methods and other packages (like #5). Here's a current brain-dump of some of my thoughts. Additions, critiques and comments are very welcome.

Remaining core infrastructure

tshort commented 9 years ago

Great ideas, Matt!

timholy commented 8 years ago

Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup.

Julia could definitely use a perfect-hashing "dictionary" type. If anyone tackles this, please make it a standalone package rather than burying it in some other package. There would be many users.

phaverty commented 7 years ago

Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup.

NamedArrays now uses an OrderedDict for its axis names. Profiling showed that when subsetting such a NamedArray, much of the time went into constructing a new OrderedDict for the new NamedArray. I have a PR (#211) over at DataStructures that speeds this up quite a bit, but it is on hold until I can make it backwards compatible with Julia 0.4. (Any suggestions would be most welcome.)
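
To illustrate why subsetting pays this cost, here is a minimal sketch. It is not the NamedArrays or DataStructures code: `subset_index` is a hypothetical helper, and a plain Base `Dict` stands in for `OrderedDict` so the example has no dependencies. Positions are renumbered from 1 in the subset, so the name-to-position map has to be rebuilt on every subsetting operation.

```julia
# Hypothetical sketch; not the NamedArrays or DataStructures implementation.
# Positions in the subset are renumbered from 1, so the map from
# name to position cannot be reused and must be rebuilt.
function subset_index(names::AbstractVector, keep::AbstractVector{<:Integer})
    newnames = names[keep]
    # One hash insertion per kept name: O(length(keep)) work plus allocation.
    return Dict(n => i for (i, n) in enumerate(newnames))
end

names = ["a", "b", "c", "d"]
idx = subset_index(names, [2, 4])
@assert idx["b"] == 1 && idx["d"] == 2
```

Under these assumptions, each subset costs at least one hash insertion per retained name, which is consistent with the profiling result described above.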

gajomi commented 5 years ago

Allow (or maybe even encourage) the use of an ordered hashmap for categorical axes to enable O(1) index lookup.

+1

I'm sure the linear search will outperform hashing for small N (particularly with symbols), but what's the cutoff? 10? 100? What about strings? Chances are that folks won't be using categorical vectors to enumerate more than 100 elements.

FWIW, in biology it is not uncommon to work with categories of thousands of species or tens of thousands of gene families. In medicine, there are O(10^5) identifiers for diagnoses, and similar numbers for patients.
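
The trade-off between linear search and hashing can be sketched as follows. This is only an illustration, not AxisArrays' implementation: `linear_lookup` and `build_index` are hypothetical names, and a plain Base `Dict` stands in for an ordered hashmap.

```julia
# Hypothetical helpers for illustration; neither is part of AxisArrays.

# O(N) lookup: scan the axis labels until one matches.
function linear_lookup(labels::AbstractVector, key)
    i = findfirst(isequal(key), labels)
    i === nothing && throw(KeyError(key))
    return i
end

# O(1) average lookup: build a label => position map once, then reuse it.
build_index(labels::AbstractVector) = Dict(l => i for (i, l) in enumerate(labels))

# An axis on the scale mentioned above (tens of thousands of categories).
labels = [Symbol("gene", i) for i in 1:50_000]
index = build_index(labels)

key = labels[end]  # worst case for the linear scan
@assert linear_lookup(labels, key) == index[key] == 50_000
```

Timing both lookups (for example with `@time`) at several axis lengths is one way to find where the cutoff falls on a given machine. The answer may differ between symbols and strings.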

nickrobinson251 commented 5 years ago

See also the work on new packages inspired by AxisArrays: https://github.com/JuliaCollections/AxisArraysFuture/issues/1