JuliaStats / StatsBase.jl

Basic statistics for Julia
Other
584 stars 194 forks source link

A more performant countmap #311

Closed xiaodaigh closed 6 years ago

xiaodaigh commented 7 years ago

A faster implementation of countmap exists in FastGroupBy.jl in the form of sumby that uses Radixsort instead of a dictionary method. It can be made to return a dictionary though. Happy to make a PR that require no dependencies.

See this companion post for the research backing the faster claim.

nalimilan commented 7 years ago

A faster implementation would certainly be welcome if it's as general as the existing one. Note that radix sort won't work for all types, though, and it will even return incorrect results if some isbits types implement a custom ordering which is inconsistent with the bit patterns. So it should probably only be used for types known to be safe.

See also the implementation in FreqTables.

xiaodaigh commented 7 years ago

It won't be as general as the current one. It will simply be countmaps for isbits types and for other types it will still use the existing Dict based method

nalimilan commented 7 years ago

Sure, feel free to make a PR which would use a faster approach for types known to be safe.

xiaodaigh commented 6 years ago

319 is trying to solve this one