Closed: ericpan64 closed this issue 2 months ago.
Note: the API is mostly nouns and adjectives, since the result is a DataFrame (i.e. thinking in terms of results). This will also help separate the mental model from SQL: a similarly declarative style, but framed around declarative properties of the result.
E.g. consider `selection` over `select`, `rename`, etc. for this function (i.e. this gives a purpose to `selection(df, "*")`...).
Relational-algebra-like API interface to consider:
- `union`: append rows, adding null columns where the schemas differ
- `difference`: a "left anti-join" (rows of the first dataframe that are not in the second)
- `cross_product`: for two dataframes, create all tuple combinations (`a,b` x `c,d` -> `(a,c), (a,d), (b,c), (b,d)`)
- `complementary`: relational algebra "division". Given the full_db (`full`) and a single column of values (`val_col`), return all matches where a complement row (`full` minus `val_col`) has a matching row for every value in `val_col`
- `projection`: this can be the "filter" function, i.e. only keep rows satisfying the given property (in relational algebra proper, row filtering is *selection* (σ), while *projection* (π) keeps a subset of columns)
- `sample`: get a random sample / split of the data
Now more SQL-like conventions:
- `inner_join`
- `outer_join`: a left join
- `anti_join`: the opposite of `inner_join`; returns two different dataframes
- `grouped_by`: aggregator function; returns another dataframe (using groupby objects under the hood)
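A minimal sketch of the SQL-like functions above, assuming they are thin wrappers over pandas `DataFrame.merge` and `DataFrame.groupby` (the issue does not name a backend). The parameters `on`, `by` and `aggs` and the helper `_unmatched` are hypothetical, and `anti_join` assumes that the "two different dataframes" are the unmatched rows from each side.

```python
import pandas as pd

# Hypothetical pandas-backed sketches; signatures are assumptions, not an existing API.


def inner_join(left: pd.DataFrame, right: pd.DataFrame, on: list) -> pd.DataFrame:
    """Keep only rows whose `on` keys appear in both frames."""
    return left.merge(right, on=on, how="inner")


def outer_join(left: pd.DataFrame, right: pd.DataFrame, on: list) -> pd.DataFrame:
    """Left join: every row of `left`, with NaN where `right` has no match."""
    return left.merge(right, on=on, how="left")


def _unmatched(a: pd.DataFrame, b: pd.DataFrame, on: list) -> pd.DataFrame:
    """Rows of `a` whose `on` keys never occur in `b` (hypothetical helper)."""
    keys = b[on].drop_duplicates()  # unique keys, so the left merge cannot duplicate rows of `a`
    m = a.merge(keys, on=on, how="left", indicator=True)
    return m[m["_merge"] == "left_only"].drop(columns="_merge").reset_index(drop=True)


def anti_join(left: pd.DataFrame, right: pd.DataFrame, on: list):
    """Opposite of inner_join (assumed): (unmatched rows of left, unmatched rows of right)."""
    return _unmatched(left, right, on), _unmatched(right, left, on)


def grouped_by(df: pd.DataFrame, by, aggs: dict) -> pd.DataFrame:
    """Aggregate per group using pandas groupby; returns a new DataFrame."""
    return df.groupby(by, as_index=False).agg(aggs)
```

For example, `grouped_by(orders, "cust", {"amount": "sum"})` would return one row per customer with the summed amount, so the caller only ever sees DataFrames and never the intermediate groupby object.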
Problem
Requested feature
Do a pass and see if relational-algebra concepts can make the API more usable!
https://en.wikipedia.org/wiki/Relational_algebra#Introduction
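As a possible starting point for such a pass, the relational-algebra-like functions listed earlier could be prototyped as thin wrappers over pandas. This is a minimal sketch assuming pandas DataFrames: the function names come from the list, but the parameters (`predicate`, `frac`, `seed`) are hypothetical, `difference` assumes `b` has all of `a`'s columns, `complementary` assumes `val_col` is a one-column DataFrame whose column also exists in `full`, and `projection` implements the row filter as proposed.

```python
import pandas as pd

# Hypothetical pandas-backed sketches of the relational-algebra-like functions.


def union(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Append the rows of `b` to `a`; columns missing on either side become NaN."""
    return pd.concat([a, b], ignore_index=True)


def difference(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Rows of `a` absent from `b`: a left anti-join on all of `a`'s columns."""
    keys = b[list(a.columns)].drop_duplicates()
    m = a.merge(keys, how="left", indicator=True)  # joins on the shared columns
    return m[m["_merge"] == "left_only"].drop(columns="_merge").reset_index(drop=True)


def cross_product(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Every (row of a, row of b) combination; how="cross" needs pandas >= 1.2."""
    return a.merge(b, how="cross")


def complementary(full: pd.DataFrame, val_col: pd.DataFrame) -> pd.DataFrame:
    """Relational division: rows over the remaining columns of `full` that are
    paired with every value in the single-column frame `val_col`."""
    col = val_col.columns[0]
    rest = [c for c in full.columns if c != col]
    needed = val_col[col].unique()
    hits = full.loc[full[col].isin(needed), rest + [col]].drop_duplicates()
    counts = hits.groupby(rest).size()  # distinct required values matched per group
    return counts[counts == len(needed)].reset_index()[rest]


def projection(df: pd.DataFrame, predicate) -> pd.DataFrame:
    """Row filter as proposed (classically called selection); `predicate` returns a boolean Series."""
    return df[predicate(df)].reset_index(drop=True)


def sample(df: pd.DataFrame, frac: float, seed=None):
    """Random split into (sampled rows, remaining rows); assumes a unique index."""
    picked = df.sample(frac=frac, random_state=seed)
    return picked, df.drop(index=picked.index)
```

`complementary` is probably the least familiar: with `full` holding (student, course) pairs and `val_col` holding the required courses, it returns the students enrolled in every required course.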
Alternatives considered
-
Additional context
-