cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars 623 forks source link

Decouple the DataTable pointer from the plan objects #1370

Open linmagit opened 6 years ago

linmagit commented 6 years ago

We now have the storage::DataTable pointer inside many plan objects, such as AbstractScanPlan, AnalyzePlan, DeletePlan, InsertPlan, and UpdatePlan. This prevents us from shipping the plan objects around or using the "what-if" API on the brain side.

I'll take SeqScanPlan for example. Instead of getting the table pointer and storing it in the plan object here: https://github.com/cmu-db/peloton/blob/master/src/optimizer/plan_generator.cpp#L82 We should directly store the database oid and table oid. And then at the query execution/compilation time, we should get the TableCatalogObject using the `oid's with this API: https://github.com/cmu-db/peloton/blob/master/src/include/catalog/catalog.h#L224 Then we can get the table name, database name and the schema name. Finally we can use this API to get the table object pointer: https://github.com/cmu-db/peloton/blob/master/src/include/catalog/catalog.h#L203 Everything should go through the catalog.

apavlo commented 6 years ago

I will assign this to @Zeninma once he joins the group on Github.

Zeninma commented 6 years ago

I am having a problem while doing the decoupling. It would be great if I can get some help on this.

My approach for decoupling

In ordert to do the decoupling, I substitute storage::DataTable target_table in some plan objects with the corresponding database and table oid_t. The storage::DataTale *GetTable() is replaced with storage::DataTable *GetTable(concurrency::TransactionContext *txnt). The new GetTable method uses the database and tabe oid_t to first get DatabaseCatalogObject and GetTableObject via (https://github.com/cmu-db/peloton/blob/master/src/include/catalog/catalog.h#L218), (https://github.com/cmu-db/peloton/blob/master/src/include/catalog/catalog.h#L228) in Catalog. Then uses GetTableWithName (https://github.com/cmu-db/peloton/blob/master/src/include/catalog/catalog.h#L207) to get the storage::DataTable object.

My current Problem

The GetDatabaseObject, GetTableObject and GetTableWithName all take in the concurrency::TransactionContext as their arguments. However, under some circumstances, the transaction argument cannot be derived. Here is one such circumstance for instance:

In the PostgresProtocolHandler::ExecBindMessage method it calls a plan object's SetParameterValues function (https://github.com/cmu-db/peloton/blob/master/src/network/postgres_protocol_handler.cpp#L526). If the plan object is an IndexScanPlan, it will result in calling GetTable method(https://github.com/cmu-db/peloton/blob/master/src/planner/index_scan_plan.cpp#L88) and needs a concurrency::TransactionContext argument. However, the transaction object cannot be derived. I am not sure how to bypass this issue.

apavlo commented 6 years ago

The storage::DataTale GetTable() is replaced with storage::DataTable GetTable(concurrency::TransactionContext *txnt)

I don't think that the plan objects should support GetTable anymore. That should be done from outside of the plan.

If the plan object is an IndexScanPlan, it will result in calling GetTable method and needs a concurrency::TransactionContext argument.

@tli2 It looks IndexScanPlan just needs the value type for a particular column in the index:

https://github.com/cmu-db/peloton/blob/master/src/planner/index_scan_plan.cpp#L88

Can we modify the binder to record this when it constructs the IndexScanPlan? The plan objects should never need to reference the catalog internally. All of that should be done on the outside.