blaze / blaze

NumPy and Pandas interface to Big Data
blaze.pydata.org
BSD 3-Clause "New" or "Revised" License

Case Expressions #1138

Open cpcloud opened 9 years ago

cpcloud commented 9 years ago

Recap from the mailing list:

It'd be nice to be able to select values in columns based on values in other columns. SQL does this with case expressions:

select case
    when a = 1 then b + 1
    when c = 2 then a
    else a
end
from table

SQLAlchemy implements this as well: http://docs.sqlalchemy.org/en/latest/core/sqlelement.html#sqlalchemy.sql.expression.case

NumPy has a special case of this (pun intended), namely np.where.
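For reference, a minimal sketch of how np.where covers the two-branch case, and how nesting it approximates the SQL above. The arrays a, b, and c are made-up sample data, not taken from any blaze table.

```python
import numpy as np

# Hypothetical sample columns, chosen only for illustration.
a = np.array([1, 2, 1, 3])
b = np.array([10.0, 20.0, 30.0, 40.0])
c = np.array([2, 2, 0, 2])

# Two-branch case: where a == 1 take b + 1, otherwise take a.
two_way = np.where(a == 1, b + 1, a)

# More branches require nesting, with the outermost condition checked first.
# This mirrors the SQL case expression above, including its "else a" fallback.
nested = np.where(a == 1, b + 1, np.where(c == 2, a, a))
```

The nesting grows by one level per extra branch, which is part of the motivation for a dedicated case expression.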

I propose that we do something similar in blaze. We need to preserve the order of the cases so that, when conditions overlap, the first matching case wins (as in SQL).
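To make the ordering requirement concrete, here is a rough row-wise sketch in plain Python. The helper eval_case and the sample rows are hypothetical and only illustrate first-match-wins evaluation; they are not part of blaze.

```python
def eval_case(row, branches, otherwise):
    """Return the value of the first branch whose condition holds for row."""
    for condition, value in branches:
        if condition(row):
            return value(row)
    # No condition matched, so fall back to the default.
    return otherwise(row)


# Branches are kept in a list so that their order is preserved.
branches = [
    (lambda r: r["a"] == 1, lambda r: r["b"] + 1),
    (lambda r: r["c"] == 2, lambda r: r["a"]),
]
otherwise = lambda r: r["a"]

rows = [
    {"a": 1, "b": 10.0, "c": 2},  # both conditions hold; the first one wins
    {"a": 5, "b": 20.0, "c": 2},  # only the second condition holds
    {"a": 7, "b": 30.0, "c": 0},  # no condition holds; use the default
]
results = [eval_case(r, branches, otherwise) for r in rows]
```

In the first row both conditions are true; evaluating the branches in the opposite order would give 1 instead of 11.0.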

Proposed syntax:

d = Data(..., dshape='var {a: int64, b: float64, c: int8}')
cased = case(
    (d.a == 1, d.b + 1),
    (d.c == 2, d.a),
    otherwise=d.a
)

llllllllll commented 9 years ago

In the otherwise case, should the = be a comma, and is otherwise an alias for True?

cpcloud commented 9 years ago

No, I was thinking otherwise should be the only allowed keyword argument.
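One possible shape for such a signature, as a sketch only: the Case node and this case function are hypothetical, the strings stand in for blaze expressions, and making otherwise optional (defaulting to None) is an assumption that the thread does not settle.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Case:
    """Hypothetical expression node holding the branches in call order."""
    branches: Tuple[Tuple[Any, Any], ...]
    otherwise: Any


def case(*branches, otherwise=None):
    # Positional arguments are (condition, value) pairs; `otherwise` is the
    # only keyword accepted, so any other keyword raises TypeError.
    for branch in branches:
        if len(branch) != 2:
            raise TypeError("each branch must be a (condition, value) pair")
    return Case(branches=tuple(tuple(b) for b in branches), otherwise=otherwise)


# Placeholder strings are used here instead of real column expressions.
expr = case(("a == 1", "b + 1"), ("c == 2", "a"), otherwise="a")
```

Because the branches arrive as positional arguments, their order is preserved without any extra bookkeeping.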

llllllllll commented 9 years ago

Ah, that is pretty nice.