davidagold / StructuredQueries.jl

Query representations for Julia

Collect graphs produced by at-query #7

Closed davidagold closed 8 years ago

davidagold commented 8 years ago

This PR renames run to collect and exposes an extension of Base.collect that users can call on graphs generated by @query. (The internal interface is also consolidated and moved to src/collect.jl.) For now the only supported data source is DataFrames, and that support is also implemented in this PR.

julia> q = @query iris |>
           filter(PetalLength > 1.5, Species == "setosa") |>
           select(SepalLength)
# output suppressed because I'm still dragging my feet on implementing show

julia> collect(q)
13×1 DataFrames.DataFrame
│ Row │ SepalLength │
├─────┼─────────────┤
│ 1   │ 5.4         │
│ 2   │ 4.8         │
│ 3   │ 5.7         │
│ 4   │ 5.4         │
│ 5   │ 5.1         │
│ 6   │ 4.8         │
│ 7   │ 5.0         │
│ 8   │ 5.0         │
│ 9   │ 4.7         │
│ 10  │ 4.8         │
│ 11  │ 5.0         │
│ 12  │ 5.1         │
│ 13  │ 5.1         │
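
For readers unfamiliar with the pattern, here is a minimal sketch of what extending Base.collect for a lazily built query can look like. ToyQuery and its fields are hypothetical stand-ins invented for illustration; they are not the types this PR defines, and the real method operates on the graph produced by @query rather than on a list of closures.

```julia
# Hypothetical stand-in for a query graph: a data source plus the steps
# that still have to be applied to it. Not the package's actual type.
struct ToyQuery{S}
    source::S
    steps::Vector{Function}
end

# Adding a method to Base.collect means users keep calling plain `collect`.
# Nothing is evaluated until this method runs the steps in order.
function Base.collect(q::ToyQuery)
    return foldl((data, step) -> step(data), q.steps; init = q.source)
end

q = ToyQuery([1, 2, 3, 4],
             Function[v -> filter(>(2), v),   # keep elements greater than 2
                      v -> map(x -> 2x, v)])  # then double each of them
result = collect(q)
```

Dispatching on the query type leaves every existing Base.collect method untouched, which is presumably why the PR extends the Base function instead of exporting a separate run.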

The main work in providing this support is generating the expressions that define the filtering kernels and storing the resulting FilterHelper objects in the corresponding FilterNodes in the graph.
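
To make the kernel generation concrete, here is a rough sketch of the idea, not the code in this PR. The helpers referenced_columns!, make_kernel and apply_filter are hypothetical names, and a NamedTuple of vectors stands in for a DataFrame so the example has no dependencies. It assumes that every symbol in a predicate matching a column name refers to that column.

```julia
# Hypothetical helper: record which column names a predicate expression uses.
function referenced_columns!(found::Vector{Symbol}, ex, columns)
    if ex isa Symbol
        (ex in columns) && !(ex in found) && push!(found, ex)
    elseif ex isa Expr
        # In a call such as `PetalLength > 1.5`, skip the function name itself.
        args = ex.head == :call ? ex.args[2:end] : ex.args
        for arg in args
            referenced_columns!(found, arg, columns)
        end
    end
    return found
end

# Hypothetical helper: turn predicates into one row-wise kernel,
# e.g. (PetalLength, Species) -> PetalLength > 1.5 && Species == "setosa".
function make_kernel(predicates::Vector, columns)
    fields = Symbol[]
    for p in predicates
        referenced_columns!(fields, p, columns)
    end
    body = foldl((a, b) -> Expr(:&&, a, b), predicates)
    kernel = eval(Expr(:->, Expr(:tuple, fields...), body))
    return fields, kernel
end

# Apply the kernel row by row and keep the rows for which it returns true.
function apply_filter(data::NamedTuple, fields, kernel)
    cols = [getfield(data, f) for f in fields]
    n = length(first(data))
    # invokelatest avoids world-age errors when calling a freshly eval'd function.
    keep = Bool[Base.invokelatest(kernel, (c[i] for c in cols)...) for i in 1:n]
    return map(col -> col[keep], data)
end

# Toy data standing in for the iris DataFrame (values made up for illustration).
iris_toy = (PetalLength = [1.4, 1.7, 4.5, 1.6],
            Species = ["setosa", "setosa", "versicolor", "setosa"],
            SepalLength = [5.1, 5.4, 6.4, 4.8])

fields, kernel = make_kernel([:(PetalLength > 1.5), :(Species == "setosa")],
                             keys(iris_toy))
filtered = apply_filter(iris_toy, fields, kernel)
```

In this sketch the pair (fields, kernel) plays roughly the role the description gives to a FilterHelper: it is what a filter node would need to hold in order to be executed later by collect.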

Right now, we don't do anything to optimize in the case of multiple filter calls, e.g.

q = @query iris |>
    filter(PetalLength > 1.5, Species == "setosa") |>
    filter(SepalLength > 5.0)

But in the future we will hopefully find ways to be smart about how we generate kernel definitions for such cases.
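
One possible direction, sketched here purely as an assumption about how such an optimization could work, is to fuse adjacent filter nodes before any kernels are generated so that a single kernel covers all of their predicates. ToySource, ToyFilterNode and fuse are hypothetical names, not part of the package.

```julia
# Hypothetical stand-ins for graph nodes; not the package's real types.
abstract type ToyNode end

struct ToySource <: ToyNode
    name::Symbol
end

struct ToyFilterNode <: ToyNode
    input::ToyNode
    predicates::Vector{Expr}
end

# A source has nothing to fuse.
fuse(node::ToySource) = node

# Merge a filter whose input is also a filter into a single node.
# Earlier predicates stay first, so evaluation order is preserved.
function fuse(node::ToyFilterNode)
    input = fuse(node.input)
    if input isa ToyFilterNode
        return ToyFilterNode(input.input, vcat(input.predicates, node.predicates))
    end
    return ToyFilterNode(input, node.predicates)
end

# Mirrors the two chained filter calls in the query above.
inner = ToyFilterNode(ToySource(:iris),
                      [:(PetalLength > 1.5), :(Species == "setosa")])
outer = ToyFilterNode(inner, [:(SepalLength > 5.0)])
fused = fuse(outer)
```

After fusing, only one kernel would need to be defined and the data would be scanned once instead of twice.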