Tweak syntax for multi-tenant tables

jtaylor-sfdc commented 10 years ago

The current syntax relies on a BASE_TABLE property, which is somewhat outside of the SQL syntax. Here's an example of the current syntax:

CREATE TABLE my_base_schema.base_table (
    tenant_id CHAR(15), 
    type_id CHAR(3), 
    entity_id CHAR(15));
CREATE TABLE my_schema.my_table (
    additional_col VARCHAR)
    BASE_TABLE='my_base_schema.base_table',
    TENANT_TYPE_ID='abc';

Here's an example of the proposed new syntax:

CREATE TABLE my_base_schema.base_table (
    tenant_id CHAR(15),
    type_id CHAR(3),
    entity_id CHAR(15)) MULTI_TENANT=true;
CREATE TABLE my_schema.my_table (column1 VARCHAR)
LIKE my_base_schema.base_table AS 'abc';

This would also be allowed

ALTER TABLE my_base_schema.base_table SET MULTI_TENANT=true;

as would this, but only if the table has no tenant-specific tables:

ALTER TABLE my_base_schema.base_table SET MULTI_TENANT=false;

Changes include:

Explicitly declaring a table as multi-tenant. This will allow us to do any validation required for the base table and create a second HBase table where the index rows would be stored (see #491).
Don't allow a tenant connection to create a table from a base table that is not multi-tenant.
The tenant type ID is optionally specified in the AS clause. If not supplied, then the automatic filtering we do on tenant-specific tables would be restricted to just the first column (i.e. tenant_id) instead of two columns (i.e. tenant_id and type_id). The column definitions should also be optional and if left off would just create a tenant-specific table without any additional columns being defined.
Support using LIKE for non-tenant-specific use cases, as I believe the code would be the same. I think in this case, we wouldn't want to restrict what you can/can't do on the base table, though. Maybe the easiest way to handle this is by just not tracking the base table for these cases - we'd essentially just be making a copy of the metadata to create a table with a new name (that's what Postgres and MySQL support).

elilevine commented 10 years ago

I like the explicit MULTI_TENANT='true' on the base table. We should consider surfacing that in DatabaseMetaData in some form.

jtaylor-sfdc commented 10 years ago

Agreed. Just need to return that column (at the end of the select) in the ResultSet for getTables()

On Thu, Dec 19, 2013 at 12:30 PM, Eli Levine notifications@github.comwrote:

I like the explicit MULTI_TENANT='true' on the base table. We should consider surfacing that in DatabaseMetaData in some form.

— Reply to this email directly or view it on GitHubhttps://github.com/forcedotcom/phoenix/issues/592#issuecomment-30963643 .

AakashPradeep commented 10 years ago

Just curious to know that why we have not considered to call this create table as "CREATE TENANT TABLE" or DERIVED TABLE for example:

CREATE [TENANT | DERIVED ] TABLE my_schema.my_table (column1 VARCHAR) LIKE my_base_schema.base_table AS 'abc';

Since CREATE TABLE and LIKE give me the impression that we are creating another table in HBase having the same schema as BASE table.

jtaylor-sfdc commented 10 years ago

Good points, @AakashPradeep. Perhaps using LIKE would be confusing. I'm considering the following syntax:

-- Declare base table as both multi-tenant and multi-type. It will be possible to
-- just declare one or the other as well. For example, for the non multi-tenant
-- scenario, you may want to use a base table to represent multiple types.
CREATE bar (tenant_id CHAR(15), type_id CHAR(3), k INTEGER)
MULTI_TENANT=true, MULTI-TYPE=true

-- Instead of using CREATE ... LIKE, use DERIVE ... FROM which expresses
-- the relationship a bit better. We could still support CREATE ... LIKE pretty
-- easily, but in that case, it'd just make a copy of the source table.
DERIVE foo (aaa VARCHAR)
FROM bar AS 'abc'

jtaylor-sfdc commented 10 years ago

I've made the above changes, but the more I think about it, the more I think these concepts fit into the SQL concept of a VIEW. The multi-tenant aspects put a bit of a wrinkle in things, but we can infer a VIEW is multi-tenant based on the connection being a tenant-specific connection.

Here's an example:

CREATE  TABLE product_metrics (
    tenant_id CHAR(15),
    user_id CHAR(15), 
    created_date DATE, 
    metric_id INTEGER,
    category_id VARCHAR,
    CONSTRAINT pk PRIMARY KEY (tenant_id, user_id, created_date, metric_id));

Note that we could potentially still require the MULT_TENANT=true property above to be more explicit about allowing the creation of tenant-specific tables from a given table.

To create a multi-tenant table, you could connect through a tenant-specific connection and do the following:

CREATE VIEW my_product_metrics AS
SELECT * FROM product_metrics;

Since you've connected through a tenant-specific connection, the WHERE tenant_id=? clause would automatically be tacked on. You could optionally add additional columns during the CREATE like this:

CREATE VIEW my_product_metrics(product_name VARCHAR) AS
SELECT * FROM product_metrics;

To create a multi-tenant, multi-type table, you could add a WHERE clause like this:

CREATE VIEW mobile_product_metrics(phone_type TINYINT)  AS
SELECT * FROM product_metrics
WHERE category_id = 'MOBILE';

You could also create a VIEW from a non tenant-specific connection.

You could use any WHERE clause, but for your VIEW to be updateable, you could only use equality expressions. These expressions would then be applied automatically for UPSERT statements.

Initially, we might only support SELECT *, but we could support a list of columns pretty easily and eventually when we have derived tables, even expressions.

elilevine commented 10 years ago

This is really cool. If I read this right, you are proposing:

Use CREATE VIEW statement to create multi-tenant and/or multi-type tables.
Keep multi-tenant logic mostly the same, with the exception of possibly adding MULTI_TENANT=true property when creating base tables that support multi-tenant tables on top of them.
Make multi-type tables more generic by allowing users to specify a WHERE clause in CREATE VIEW. This WHERE clause would automatically be applied to SELECTs and UPSERTs for the view.

I like CREATE VIEW more than DERIVE. The term "view" represents well the fact that these multi-tenant and/or multi-type tables are a sliver of data in a physical HBase table. I also like the explicit MULTI_TENANT=true property on base tables. We might want to consider @AakashPradeep's suggestion of adding the keyword TENANT in there to make things more explicit and allow Phoenix to fail if a tenant-specific table was created over a non-tenant-specific connection and visa versa:

CREATE TENANT VIEW my_product_metrics(product_name VARCHAR) AS
SELECT * FROM product_metrics;

The flexibility of the WHERE clause in defining how multi-type is great. One thing of concern is that currently typeId is required to be in the row key behind tenantId for tenant-specific tables, making for fast scans. Example in previous comment uses category_id, a key-value column.

jtaylor-sfdc commented 10 years ago

Having typeId after tenantId is not a requirement. It would perform better, true, but it shouldn't be a requirement. It's more of a design decision.

I'll start down this path and let folks know if I hit any roadblocks.

jtaylor-sfdc commented 10 years ago

I will write this up on the wiki too, but the above is now implemented:

multi-tenant tables are views on regular tables.
regular/non multi-tenant tables can have views too.
views can have views. You can specify a WHERE clause on your CREATE VIEW statement. It uses standard SQL syntax, with the restriction that you can't yet project a subset of your table columns into your view - you have to project them all. However, I also just added a check that disallows a tenant-specific connection from referring to the tenant id column.

Phoenix figures out if the view is "updatable" based on the WHERE clause in the CREATE VIEW statement. The basic rule is that it's updatable if you only have equality expressions with a column reference and a literal separated by ANDs, like this:

CREATE TABLE t (
    id INTEGER NOT NULL PRIMARY KEY,
    a VARCHAR,
    b INTEGER,
    c DATE);

CREATE VIEW v AS
SELECT * FROM t
WHERE a='foo' AND b=1

In these cases, Phoenix can infer the value that the columns have, so it can allow you to update through the view. For example, you could do this:

UPSERT INTO v(id,c) VALUES(100,CURRENT_DATE());

You get an error, though, if you try to update through a view and change anything that would make the WHERE clause not be satisfied. For example, this would throw an exception, since you're trying to set a to something other than 'foo':

UPSERT INTO v(id,a) VALUES(100,'zzz');

If your view isn't updatable, you get an error if you try to do an UPSERT or DELETE on it. For example, the following would not be updatable:

CREATE VIEW v AS
SELECT * FROM t
WHERE a = 'foo' AND c < CURRENT_DATE() - 7

It's still valid, though, and you can query through it and even derive other views from it, but you just can't use it in an UPSERT or DELETE statement. Most RDBMS have the concept of an updatable view. The restrictions on what is and what isn't updatable is not standardized, though. I thought we'd start simple on this - we could potentially make more views updatable down the road.

forcedotcom / phoenix

Tweak syntax for multi-tenant tables #592