ericpan64 / pydian

Python framework for developer-friendly data interchange
MIT License

DataFrame module updates (DSL improvements) #9

Open ericpan64 opened 3 weeks ago

ericpan64 commented 3 weeks ago

(original title: Think of nicer syntax for join and union DataFrame operations)

Problem

Right now, `select` and `group_by` are useful, but `join` and `union` don't really do much (i.e. they're just wrappers around the base API). It feels odd that they don't also provide some ergonomic string syntax.

Requested feature

For `join`, maybe something like:

```python
join([first, second], "A <- B")  # inner left-join; either use symbols `A...Z` in order, or by variable name
join([first, second], "A <+ B")  # outer left-join
join([first, second], "A <> B")  # inner join (is this ambiguous?)
...  # etc.
```
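A rough sketch of how the string form could be resolved before handing off to the underlying join. This is a minimal, hypothetical parser: `parse_join`, `JOIN_OPS`, and the `names` parameter are not pydian API. The operator-to-join-kind mapping is an assumption, since the exact semantics of `<-` vs `<+` are still open above. The returned `how` string could then be passed to whatever join the module wraps (e.g. something like pandas' `DataFrame.merge(..., how=...)`).

```python
import re
from string import ascii_uppercase

# Hypothetical operator -> join-kind table. The issue leaves the exact
# semantics of `<-` vs `<+` open, so this mapping is only an assumption.
JOIN_OPS = {"<-": "left", "<+": "outer", "<>": "inner"}

# "<lhs> <op> <rhs>", where lhs/rhs are symbols (A, B, ...) or variable names
_JOIN_RE = re.compile(r"^\s*(\w+)\s*(<-|<\+|<>)\s*(\w+)\s*$")


def parse_join(expr, frames, names=None):
    """Resolve a join string like "A <- B" to (left, right, how).

    Positional symbols A, B, C, ... refer to `frames` in order; `names`
    optionally maps variable names to frames as well.
    """
    match = _JOIN_RE.match(expr)
    if match is None:
        raise ValueError(f"Unrecognized join expression: {expr!r}")
    lhs, op, rhs = match.groups()
    # Positional symbols first, then any explicit variable names
    lookup = dict(zip(ascii_uppercase, frames))
    lookup.update(names or {})
    for sym in (lhs, rhs):
        if sym not in lookup:
            raise ValueError(f"Unknown operand {sym!r} in {expr!r}")
    return lookup[lhs], lookup[rhs], JOIN_OPS[op]


frames = ["df1", "df2"]  # placeholders standing in for real DataFrames
print(parse_join("A <- B", frames))  # ('df1', 'df2', 'left')
```

Resolving symbols to objects up front keeps the DSL independent of the DataFrame backend; only the final `how` string needs translating.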

For `union`: maybe make it easier to append rows in different formats (e.g. as labeled dicts, as tuples, or sparsely, with default values filling in the missing fields).
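A minimal sketch of the row-normalization step such a `union` could do before appending, assuming rows are plain Python dicts/tuples and the target schema is a list of column names. `normalize_rows`, `union_rows`, and the `defaults` parameter are hypothetical names for illustration, not existing pydian functions; a real version would hand the normalized rows to the wrapped DataFrame library.

```python
from collections.abc import Mapping, Sequence


def normalize_rows(rows, columns, defaults=None):
    """Coerce mixed-format rows into dicts keyed by `columns`.

    Accepts labeled dicts (possibly sparse) and positional tuples/lists
    (possibly short); missing values come from `defaults`, else None.
    """
    defaults = dict(defaults or {})
    normalized = []
    for row in rows:
        if isinstance(row, Mapping):
            unknown = set(row) - set(columns)
            if unknown:
                raise ValueError(f"Unknown columns: {sorted(unknown)}")
            normalized.append({c: row.get(c, defaults.get(c)) for c in columns})
        elif isinstance(row, Sequence) and not isinstance(row, (str, bytes)):
            if len(row) > len(columns):
                raise ValueError(f"Too many values in row: {row!r}")
            # Pad short positional rows with the per-column defaults
            tail = [defaults.get(c) for c in columns[len(row):]]
            normalized.append(dict(zip(columns, list(row) + tail)))
        else:
            raise TypeError(f"Unsupported row type: {type(row).__name__}")
    return normalized


def union_rows(existing, new_rows, columns, defaults=None):
    """Append normalized `new_rows` to an existing list of row dicts."""
    return list(existing) + normalize_rows(new_rows, columns, defaults)


cols = ["a", "b"]
rows = union_rows([{"a": 0, "b": 0}], [{"a": 1, "b": 2}, (3,), {"b": 5}], cols, defaults={"b": 0})
print(rows)  # [{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 3, 'b': 0}, {'a': None, 'b': 5}]
```

Rejecting unknown dict keys and over-long tuples is a design choice here; a looser version could silently drop extras instead.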

Alternatives considered

-

Additional context

-

ericpan64 commented 2 weeks ago

Another idea (grouping it here, though it might be worth splitting into a separate issue): why not encapsulate everything in the `select` DSL? E.g. `"(a, b).count()"` to express a group-by on `a` and `b`, followed by one of a set of supported operations. Basically, get more creative with the DSL string.
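To make the `"(a, b).count()"` idea concrete, here is a hedged sketch of parsing that form and evaluating it over a list of row dicts. `parse_group_expr`, `run_group_expr`, and `GROUP_OPS` are hypothetical names, and only `count` is wired up; this is not how pydian's `select` DSL currently works.

```python
import re
from collections import defaultdict

# Hypothetical table of supported group operations; each receives the
# list of rows belonging to one group.
GROUP_OPS = {"count": len}

# "(key1, key2, ...).op()"
_GROUP_RE = re.compile(r"^\s*\(\s*([\w\s,]+?)\s*\)\s*\.\s*(\w+)\(\)\s*$")


def parse_group_expr(expr):
    """Split "(a, b).count()" into (("a", "b"), "count")."""
    match = _GROUP_RE.match(expr)
    if match is None:
        raise ValueError(f"Unrecognized group expression: {expr!r}")
    keys = tuple(k.strip() for k in match.group(1).split(",") if k.strip())
    op = match.group(2)
    if op not in GROUP_OPS:
        raise ValueError(f"Unsupported operation: {op!r}")
    return keys, op


def run_group_expr(expr, rows):
    """Group row dicts by the parsed keys and apply the parsed operation."""
    keys, op = parse_group_expr(expr)
    groups = defaultdict(list)  # preserves first-seen group order
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    # One output row per group: key columns plus the aggregated value
    return [{**dict(zip(keys, key)), op: GROUP_OPS[op](members)}
            for key, members in groups.items()]


rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
print(run_group_expr("(a, b).count()", rows))
# [{'a': 1, 'b': 'x', 'count': 2}, {'a': 2, 'b': 'y', 'count': 1}]
```

Aggregations that need a target column (e.g. a hypothetical `sum`) would require extending the grammar to accept arguments inside the parentheses.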

ericpan64 commented 1 day ago

Some other ideas: