ddf-project / ddf-flink

DDF with Flink Implementation
Apache License 2.0

SQLHandler does not support drop table command #44

Open Shiti opened 9 years ago

Shiti commented 9 years ago

Is it possible to delete the specified DDF when a drop table command is executed on it?

ctn commented 9 years ago

ddf-core itself does not/should not specify what happens upon receiving a "DROP TABLE" command, but rather should pass it through to the underlying implementation. For example, I would expect a DDF-fronted SQL server to proceed to drop the underlying table as a result of receiving this command. @binhmop thoughts?
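A minimal sketch of the pass-through policy ctn describes, purely illustrative and not the actual ddf-core SQLHandler code (the `SqlEngine` and `PassThroughHandler` names are stand-ins):

```scala
// Illustrative only: a core-level handler that does not interpret the
// statement itself; it forwards the SQL and lets the underlying engine
// decide what DROP TABLE means.
trait SqlEngine {
  def sql(command: String): String
}

class PassThroughHandler(engine: SqlEngine) {
  def sql(command: String): String = engine.sql(command)
}
```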

trulite commented 9 years ago

Flink SQL support is a parser/translator to the Flink Table API. Since we own the parser, this is possible to do: I can delete the DDF because the parser is part of the SQL support itself. However, I am not sure whether the user expects those semantics.
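A self-contained sketch of what such interception could look like; all names below (`SqlStatement`, `DropTable`, `parseStatement`, `DdfManagerLike`) are illustrative stand-ins, not the actual ddf-flink parser classes, and `removeDDF` merely echoes the DDFManager method discussed in this thread:

```scala
// Hypothetical sketch: a handler that owns the parser can special-case
// DROP TABLE and discard the matching in-memory DDF.
sealed trait SqlStatement
case class DropTable(tableName: String) extends SqlStatement
case class Other(sql: String) extends SqlStatement

trait DdfManagerLike {
  def removeDDF(tableName: String): Unit // stands in for DDFManager.removeDDF
  def execute(sql: String): Unit         // pass-through to the Flink engine
}

// Crude recognizer; the real Flink SQL support has a full parser, which is
// exactly why interception is cheap there.
def parseStatement(sql: String): SqlStatement = {
  val Drop = """(?i)\s*drop\s+table\s+(\w+)\s*;?\s*""".r
  sql match {
    case Drop(name) => DropTable(name)
    case other      => Other(other)
  }
}

def runSql(manager: DdfManagerLike, sql: String): Unit =
  parseStatement(sql) match {
    case DropTable(name) =>
      manager.execute(sql)    // drop the underlying table first...
      manager.removeDDF(name) // ...then discard the now-invalid DDF
    case Other(s) =>
      manager.execute(s)      // everything else passes straight through
  }
```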

binhmop commented 9 years ago

This raises an issue we've discussed before: how to handle the 'drop table' command in particular. In general, the underlying query engine executes any SQL passed through by SQLHandler. However, when 'drop table' is executed on a DDF, currently only the underlying table is deleted; DDFManager still keeps a pointer to the associated DDF, even though that DDF is now invalid. So do we expect the user to explicitly call DDFManager.removeDDF after running 'drop table'? Or do we detect the 'drop table' command and implicitly remove the DDF, which could be the approach for the Flink implementation in this case?
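Reusing the illustrative `DdfManagerLike` stub from the sketch above, the two contracts look like this from the caller's side (still purely hypothetical usage, with an assumed table name):

```scala
// Explicit contract: the caller must clean up after a raw DROP TABLE.
manager.execute("DROP TABLE airline") // underlying table gone; DDF pointer now stale
manager.removeDDF("airline")          // user must remember this extra step

// Implicit contract: runSql (sketched above) detects DROP TABLE and does both.
runSql(manager, "DROP TABLE airline")
```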

ctn commented 9 years ago

The DDF here is an io.ddf.flink DDF, just as a Spark DDF is an io.ddf.spark DDF. These DDFs are distinct entities from whatever source they were loaded from. There is no reason to expect that a DROP TABLE, if passed along to the on-disk source, should also result in deletion of the in-memory DDF.

I realize that if there is close coupling between the in-memory DDF and the original data source, certain semantics may fail after such a DROP TABLE. But consider that such a table may be mutated by other actors anyway, so there is no strong reason to expect that coupling to be safe in the first place.

In a near-future implementation, we would want to put the DDF API/facade on top of these "underlying" data sources, too, so that it's clear we're reading from one DDF into another DDF, and this question will be less likely to come up. The basic idea is that DDF should be the common facade for all data/compute engines.