JuliaData / TypedTables.jl

Simple, fast, column-based storage for data analysis in Julia
Other
145 stars 25 forks source link

`Base.getindex` over allocating: use generator for getindex #103

Closed m-wells closed 1 year ago

m-wells commented 1 year ago

I came across this while developing my package LazyTables.jl

This should greatly improve performance of TypedTables (although for complex row by row operations LazyTables should still outperform).

As you can see for a simply indexing operation TypedTables makes 24 allocations when it only needs to make one.

(@v1.8) pkg> free TypedTables
   Resolving package versions...
    Updating `~/.julia/environments/v1.8/Project.toml`
  [9d95f2ec] ~ TypedTables v1.4.1 `~/.julia/dev/TypedTables` ⇒ v1.4.1
    Updating `~/.julia/environments/v1.8/Manifest.toml`
  [9d95f2ec] ~ TypedTables v1.4.1 `~/.julia/dev/TypedTables` ⇒ v1.4.1

julia> using TypedTables
[ Info: Precompiling TypedTables [9d95f2ec-7b3d-5a63-8d20-e2491e220bb9]
WARNING: method definition for broadcasted at /home/mark/.julia/packages/TypedTables/dycVq/src/DictTable.jl:301 declares type variable N but does not use it.
WARNING: method definition for broadcasted at /home/mark/.julia/packages/TypedTables/dycVq/src/DictTable.jl:305 declares type variable N but does not use it.
WARNING: method definition for broadcasted at /home/mark/.julia/packages/TypedTables/dycVq/src/DictTable.jl:309 declares type variable N but does not use it.
WARNING: method definition for broadcasted at /home/mark/.julia/packages/TypedTables/dycVq/src/DictTable.jl:313 declares type variable N but does not use it.
WARNING: method definition for broadcasted at /home/mark/.julia/packages/TypedTables/dycVq/src/DictTable.jl:317 declares type variable N but does not use it.

julia> table = Table(NamedTuple(Symbol(s) => rand(rand((Bool, Float32, Int, Float64)), 1000) for s in Char.(97:122)));

julia> table[3]; @time table[3];
  0.000011 seconds (24 allocations: 1.250 KiB)
(@v1.8) pkg> dev TypedTables
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.8/Project.toml`
  No Changes to `~/.julia/environments/v1.8/Manifest.toml`

julia> using TypedTables

julia> table = Table(NamedTuple(Symbol(s) => rand(rand((Bool, Float32, Int, Float64)), 1000) for s in Char.(97:122)));

julia> table[3]; @time table[3];
  0.000004 seconds (1 allocation: 144 bytes)
andyferris commented 1 year ago

Thanks @m-wells.

This is cool. I haven't done performance work on this package in a long while. I wonder if the generator would have worked efficiently all the way back in Julia 1.0? Have you confirmed it works OK for large numbers of columns and/or with large number of union-type columns (like Union{T, Missing})

codecov[bot] commented 1 year ago

Codecov Report

Base: 69.41% // Head: 69.08% // Decreases project coverage by -0.33% :warning:

Coverage data is based on head (ace9728) compared to base (0b86a6a). Patch coverage: 50.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #103 +/- ## ========================================== - Coverage 69.41% 69.08% -0.34% ========================================== Files 7 7 Lines 909 909 ========================================== - Hits 631 628 -3 - Misses 278 281 +3 ``` | [Impacted Files](https://codecov.io/gh/JuliaData/TypedTables.jl/pull/103?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData) | Coverage Δ | | |---|---|---| | [src/Table.jl](https://codecov.io/gh/JuliaData/TypedTables.jl/pull/103/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData#diff-c3JjL1RhYmxlLmps) | `73.41% <50.00%> (ø)` | | | [src/TypedTables.jl](https://codecov.io/gh/JuliaData/TypedTables.jl/pull/103/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData#diff-c3JjL1R5cGVkVGFibGVzLmps) | `90.00% <0.00%> (-10.00%)` | :arrow_down: | | [src/properties.jl](https://codecov.io/gh/JuliaData/TypedTables.jl/pull/103/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData#diff-c3JjL3Byb3BlcnRpZXMuamw=) | `78.19% <0.00%> (-0.76%)` | :arrow_down: | | [src/DictTable.jl](https://codecov.io/gh/JuliaData/TypedTables.jl/pull/103/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData#diff-c3JjL0RpY3RUYWJsZS5qbA==) | `57.28% <0.00%> (-0.51%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaData)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

m-wells commented 1 year ago

Have you confirmed it works OK for large numbers of columns and/or with large number of union-type columns (like Union{T, Missing})

I have not done extensive performance analysis on union-type columns.