davidagold / StructuredQueries.jl

Query representations for Julia

Refactor #6

Closed davidagold closed 8 years ago

davidagold commented 8 years ago

This refactor is motivated primarily by the realization that, for graphs produced by @query to be runnable, the base DataNode cannot just store a symbol -- it must wrap the data source object itself (be it a DataFrame, a database connection, etc.). Otherwise, running the graph has no way to access the source (apart from eval, which isn't a viable option for reasons mentioned elsewhere). Thus, at least some of the graph generation must occur at runtime. However, as noted in #1, for the graph to be useful for, say, producing a filtering kernel, graph generation must occur largely at macroexpand time. This is an issue for both piped-to and non-piped-to manipulations wrapped in an @query. To summarize: all of the information we wish to store in the non-base QueryNodes (i.e. everything except the DataNode) is present at macroexpand time, which is when we would like to generate an appropriate graph, but the information to be stored in the base node (the DataNode) is only available at runtime.
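To make the macroexpand-time/runtime split concrete, here is a minimal sketch in current Julia syntax. The macro name `@argtype` is hypothetical and is not part of this package. It only shows that a macro receives the symbol or expression it was given, never the runtime value bound to it.

```julia
# Hypothetical helper macro, for illustration only.
# It returns, as a string literal, the type of what the macro itself receives.
macro argtype(x)
    return string(typeof(x))
end

df = [1, 2, 3]                    # stand-in for a data source

seen_source = @argtype(df)        # the macro sees the Symbol :df, not the vector
seen_filter = @argtype(a > 1)     # the macro sees the Expr :(a > 1)
```

Here `seen_source` is `"Symbol"` and `seen_filter` is `"Expr"`: the filtering expression is fully visible to the macro, while the source is only a name.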

To strike the balance that these experiments are showing to be necessary, this PR makes DataNodes mutable -- they can be incompletely initialized, and a general graph can be given a data source post-generation via set_src!. However, this means that the DataNode type cannot convey information about the type of its wrapped data source, since that information isn't available at macroexpand time, when the DataNode is constructed. It's not clear whether this will present an issue, since the content of a DataNode will pass through function barriers in the course of running the query.
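A minimal standalone sketch of this design, in current Julia syntax. The names `DataNode`, `QueryNode` and `set_src!` come from the description above; the field name `src` and the helpers `run_source` and `_run` are assumptions made for illustration, not the PR's actual code.

```julia
# Sketch only: definitions are assumed, not copied from the PR.
abstract type QueryNode end

# Mutable so that a source can be attached after the graph is built.
# `src` is untyped (Any) because the source type is unknown at macroexpand time.
mutable struct DataNode <: QueryNode
    src::Any
    DataNode() = new()            # incompletely initialized: `src` is left undefined
end

# Attach a data source to an existing node and return the node.
function set_src!(node::DataNode, src)
    node.src = src
    return node
end

# Function barrier (hypothetical helpers): `_run` dispatches on the concrete
# type of the source, even though the field `node.src` is untyped.
run_source(node::DataNode) = _run(node.src)
_run(src::Vector{Int}) = sum(src) # stand-in for real query execution

node = DataNode()                 # could be constructed at macroexpand time
had_src = isdefined(node, :src)   # false: no source has been set yet
set_src!(node, [1, 2, 3])         # the source is supplied at runtime
total = run_source(node)
```

The untyped field costs one dynamic dispatch at the call to `_run`; code behind that barrier is compiled for the concrete source type.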

This PR also introduces the FilterHelper type, which contains the information necessary to perform filtering on a DataFrame -- i.e., the filtering kernel and the relevant fields (as symbols). FilterHelpers are produced at runtime (they have to be, since they wrap the filtering kernel) and are contained within FilterNodes. The latter in turn are now mutable -- they can be incompletely initialized without FilterHelpers. This decision is made with an eye towards how filtering kernels will be produced for complex graphs returned by @query.
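A rough standalone sketch of how a FilterHelper might sit inside a mutable FilterNode, in current Julia syntax. The field names (`kernel`, `fields`, `input`, `args`, `helper`) and the function `apply_filter` are assumptions. A NamedTuple of columns stands in for a DataFrame so the example needs no packages.

```julia
# Sketch only: field names and layout are assumed, not taken from the PR.
struct FilterHelper{F}
    kernel::F                     # predicate applied row by row
    fields::Vector{Symbol}        # columns the kernel reads, in argument order
end

mutable struct FilterNode
    input::Any                    # upstream node; left untyped for brevity
    args::Vector{Expr}            # filtering expressions captured at macroexpand time
    helper::FilterHelper
    FilterNode(input, args) = new(input, args)  # `helper` is left undefined
end

# Hypothetical runner: apply the kernel to each row of a column table.
function apply_filter(helper::FilterHelper, tbl::NamedTuple)
    cols = map(f -> getfield(tbl, f), helper.fields)
    n = length(first(cols))
    keep = [helper.kernel((c[i] for c in cols)...) for i in 1:n]
    return map(col -> col[keep], tbl)           # filter every column by the mask
end

tbl = (a = [1, 2, 3], b = [10, 20, 30])         # stand-in for a DataFrame
node = FilterNode(nothing, [:(a > 1)])          # built without a helper
node.helper = FilterHelper(a -> a > 1, [:a])    # helper attached at runtime
result = apply_filter(node.helper, tbl)
```

With this input, `result` keeps the rows where `a > 1`, so `result.a` is `[2, 3]` and `result.b` is `[20, 30]`.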

This PR introduces the empty CurryNode type whose sole purpose is to clarify the dispatch pattern that produces lambdas for accepting piped data sources in the one-off macros.
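One way such a dispatch pattern could look, as a standalone sketch in current Julia syntax. Only the name `CurryNode` and the idea of returning a lambda come from the description above; the function `build` and this simplified immutable `FilterNode` are hypothetical, and in the package the logic lives in the one-off macros rather than in a plain function.

```julia
# Sketch only: `build` and this FilterNode layout are assumptions.
struct CurryNode end              # empty marker type: "no source supplied yet"

struct FilterNode{T}
    input::T
    args::Vector{Expr}
end

# A source was supplied: construct the node directly.
build(input, args) = FilterNode(input, args)

# No source was supplied: return a lambda that waits for a piped source.
build(::CurryNode, args) = src -> FilterNode(src, args)

direct = build([1, 2, 3], [:(a > 1)])
curried = build(CurryNode(), [:(a > 1)])
piped = [1, 2, 3] |> curried      # the lambda receives the piped source
```

Both paths end in the same kind of node; the marker type only selects which method runs.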

This PR includes other minor changes for the sake of efficiency. For instance, the _command methods (e.g. _filter) have been removed and instead the relevant QueryNode constructors are called directly.

I apologize to the git deities for doing this all in a single commit.