Closed ericbusboom closed 8 years ago
Everything done, except 'With no argument, return a Sqlalchemy query or select object '. Didn't understand what to return exactly. Can you give an example of the return value?
For Table is the Sqlalchemy metadata for a partition data table in the warehouse, and warehouse.session
is a Sqlalchemy session, return warehouse.session.query(Table)
or warehouse.session.select(Table)
.
The goal is that the caller can then add Sqlalchemy .filter()
, .all()
, etc, to run a custom query.
eric.
Implement the Ambry Warehouse, using the Sqlite Virtual Tables or the Postgres FDW.
The warehouse provides SQL access to datasets in the library, allowing users to issue SQL queries, either as SQL or via Sqlalchemy, to return datasets.
The warehouse
query()
call parses the SQL to determine which virtual tables to load, then ensures those tables are loaded before issuing the query.Users can also materialize a table, to ensure that the data in the table is native, rather than being loaded from a virtual table.
Users can refer to a partition name or number, either versioned or unversioned. If the name is unversioned, the warehouse will load the one with the highest version number. Callers can also reference a versioned or unversioned table name or number. If the table is implemented by more than one partition, the warehouse will load all of the partitions and create a UNION query that will return the results from all of the partitions.
Specifically, if a table is references:
Databases
The warehouse database can be specified with a DSN in the library.warehouse configuration. If that config does not exist, the library database will be used. For postgres, the warehouse tables will be in a schema named 'warehouse'
Interface
This description of the interface is just a suggestion. Do whatever seems to work best.
l.warehouse
. Return reference to the warehousewarehouse.query(<sql>)
. Execute the SQL code and return a cursor to iterate through the results.warehouse.query()
. With no argument, return a Sqlalchemy query or select objectwarehouse.install(ref)
. Create the FDW or Virtual tables from a partition reference. The FDW / Virtual table is named after the partition vid.warehouse.index(ref, columns)
. Install a partition ref, if it isn't already, and create an index on the columns.warehouse.materialize(ref)
. Install a partition if it isn't already, but give it a name that ends in '_v'. Create a table, named after the partition vid.Example
In this example, all of the
w.install()
statements install the same partitionimport ambry l = ambry.get_library() w = l.warehouse w.warehouse.install('ingest.example.com-basic-simple') # Unversioned name w.warehouse.install('ingest.example.com-basic-simple-0.0.1') # Versioned name w.warehouse.install('pLMQfCPN4s004001') # Vid w.warehouse.install('pLMQfCPN4s004') # Id w.warehouse.install('tLMQfCPN4s02001') # Versioned table w.warehouse.install('tLMQfCPN4s02') # Unversioned table
The last two will install the partition, then create the UNION view.
Auto Installation
When SQL code is provided in warehouse.query(), the code is parsed and the referenced partitions are installed, as described in #140