CivicSpleen / ambry

A comprehensive data package manager
BSD 2-Clause "Simplified" License
4 stars 5 forks source link

Implement Warehouse #139

Closed ericbusboom closed 8 years ago

ericbusboom commented 8 years ago

Implement the Ambry Warehouse, using the Sqlite Virtual Tables or the Postgres FDW.

The warehouse provides SQL access to datasets in the library, allowing users to issue SQL queries, either as SQL or via Sqlalchemy, to return datasets.

import ambry
l = ambry.get_library()
for row in l.warehouse.query('SELECT * FROM ... '):
    print row

The warehouse query() call parses the SQL to determine which virtual tables to load, then ensures those tables are loaded before issuing the query.

Users can also materialize a table, to ensure that the data in the table is native, rather than being loaded from a virtual table.

Users can refer to a partition name or number, either versioned or unversioned. If the name is unversioned, the warehouse will load the one with the highest version number. Callers can also reference a versioned or unversioned table name or number. If the table is implemented by more than one partition, the warehouse will load all of the partitions and create a UNION query that will return the results from all of the partitions.

Specifically, if a table is references:

The warehouse database can be specified with a DSN in the library.warehouse configuration. If that config does not exist, the library database will be used. For postgres, the warehouse tables will be in a schema named 'warehouse'

Interface

This description of the interface is just a suggestion. Do whatever seems to work best.

In this example, all of the w.install() statements install the same partition

import ambry l = ambry.get_library() w = l.warehouse w.warehouse.install('ingest.example.com-basic-simple') # Unversioned name w.warehouse.install('ingest.example.com-basic-simple-0.0.1') # Versioned name w.warehouse.install('pLMQfCPN4s004001') # Vid w.warehouse.install('pLMQfCPN4s004') # Id w.warehouse.install('tLMQfCPN4s02001') # Versioned table w.warehouse.install('tLMQfCPN4s02') # Unversioned table

The last two will install the partition, then create the UNION view.

Auto Installation

When SQL code is provided in warehouse.query(), the code is parsed and the referenced partitions are installed, as described in #140

nmb10 commented 8 years ago

Everything done, except 'With no argument, return a Sqlalchemy query or select object '. Didn't understand what to return exactly. Can you give an example of the return value?

ericbusboom commented 8 years ago

For Table is the Sqlalchemy metadata for a partition data table in the warehouse, and warehouse.session is a Sqlalchemy session, return warehouse.session.query(Table) or warehouse.session.select(Table).

The goal is that the caller can then add Sqlalchemy .filter(), .all(), etc, to run a custom query.

eric.