forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License
558 stars 227 forks source link

Metadata is not getting updated for dynamically created columns #678

Open rtvt123 opened 10 years ago

rtvt123 commented 10 years ago

After creating columns dynamically with upsert, users need to keep track of the newly created metadata (column names/types) themselves, as the metadata for dynamically added columns is not retrievable via standard JDBC DatabaseMetaData calls.

for example: create table with one column: CREATE TABLE IF NOT EXISTS my_test_table (rowkey VARCHAR NOT NULL PRIMARY KEY)

add two more columns dynamically: UPSERT INTO my_test_table (rowkey, col1 VARCHAR, col2 VARCHAR) VALUES ('rk', 'value1', 'value2');

Now the only way to see the new column values is by defining column names/types every time at runtime: select * from my_test_table (rowkey VARCHAR, col1 VARCHAR, col2 VARCHAR);

i.e. simple "select * from my_test_table;" would only show the values of the columns defined using stanard ddl (such as "rowkey" column defined in "create table" statment above, or "alter table"). The col1,col2 metadata is not accessible via DatabaseMetaData calls (such as getColumns).

While dynamically defined columns seem to go beyond standard SQL functionality , this is one of the key features of hbase/phoenix that makes it useful for us.

current workarounds are ugly - e.g either spawn a separate thread updating metadata via "ALTER TABLE" asynchronously (decoupled from the thread doing upserts) in order not to slow down the upserts; or use a local metadata cache which introduces consistency issues when syncing multiple phoenix clients.

jtaylor-sfdc commented 10 years ago

Your best bet is to use multi-tenant tables. There's no need to spawn a new thread to do an ALTER TABLE (on a multi-tenant table or a regular table). In Phoenix, an ALTER TABLE is not a heavy weight operation as it is with relational databases. It's more or less the same as a row update. Each time you encounter a column, do an ALTER TABLE ADD IF NOT EXISTS for that tenant id. This will essentially be doing the "bookkeeping" that you're asking for.

FWIW, there's an open issue (#243) for being able to access dynamic columns when you do a SELECT *.