JuliaData / TypedTables.jl

Simple, fast, column-based storage for data analysis in Julia

Differences from StructArrays #106

Open aplavin opened 1 year ago

aplavin commented 1 year ago

The interface seems very similar to a StructArray, so I wonder what the main differences are. Are they highlighted somewhere? A cursory look suggests that a Table and a StructArray are basically drop-in replacements for each other.

sairus7 commented 1 year ago

I'm asking myself the same question. StructArray also has richer functionality, since it can wrap not only NamedTuples but custom structs as well.

andyferris commented 1 year ago

Historically, TypedTables came just before StructArrays, and they have a lot in common. I'd say TypedTables comes from the idea that "dataframes in Julia should just be a strongly typed AbstractVector", whereas StructArrays implements a struct-of-arrays (SoA) to array-of-structs (AoS) wrapper type. These end up being somewhat equivalent ideas.

> StructArray also has richer functionality, since it can wrap not only NamedTuples but custom structs as well.

One difference that follows from this viewpoint: in data systems (like, say, SQL), a row (an element of a relation) is just a named tuple and is structurally typed, whereas the SoA-AoS transformation is naturally useful for arbitrary Julia structs, which are nominally typed. That explains this distinction.
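
To make the SoA-to-AoS idea concrete, here is a minimal, language-neutral sketch in Python. It is not how TypedTables or StructArrays is implemented; `ColumnBackedRows`, `row_type`, and `Point` are hypothetical names invented for illustration. The data is stored as columns, while indexing yields one row at a time, either as a plain record (roughly the "row is just a named tuple" view) or as a user-defined type (roughly the "wrap a custom struct" view).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


class ColumnBackedRows:
    """Hypothetical SoA-to-AoS wrapper: stores columns, presents rows."""

    def __init__(self, columns: Dict[str, List[Any]],
                 row_type: Optional[Callable[..., Any]] = None):
        # Every column must have the same number of entries.
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns    # struct-of-arrays storage
        self.row_type = row_type  # None means "plain record"

    def __len__(self) -> int:
        return len(next(iter(self.columns.values()), []))

    def __getitem__(self, i: int) -> Any:
        # Assemble row i on demand from the i-th entry of each column.
        fields = {name: col[i] for name, col in self.columns.items()}
        if self.row_type is None:
            return fields                # identified only by its field names
        return self.row_type(**fields)   # identified by a declared type


@dataclass
class Point:
    x: float
    y: float


columns = {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}
as_records = ColumnBackedRows(columns)                 # rows as plain records
as_points = ColumnBackedRows(columns, row_type=Point)  # rows as a custom type
```

Both wrappers share the same column storage, so `as_records[1]` and `as_points[1]` read the same underlying values; only the type of the row they hand back differs.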

aplavin commented 1 year ago

@andyferris is there any functionality in TypedTables.Table not provided by StructArrays? I.e., in what circumstances should one use the former?

kpa28-git commented 1 year ago

TypedTables, IndexedTables, and StructArrays seem to overlap in functionality for the end user. Note that IndexedTables stores its data in a StructArray.