cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars 623 forks source link

Mechanism to detect and determine cached statement replan in protocol handler #789

Open ChTimTsubasa opened 7 years ago

ChTimTsubasa commented 7 years ago

1. Background and Problems:

1.1 Postgres Extended Query Protocol

Compared to Postgres Simple Query Protocol, this extended one has a major change to support statement[1] caching, which is trying to cache every query's plan and reuse the plan once the same query appears again.

A typical example would be a client sending two same queries to the server:

For the first query, the server would receive a series of packets including PARSE->BIND[2]->DESCRIBE->EXECUTE. A new statement with a plan generated by optimizer would be stored at PARSE stage.

For the second query, instead of receiving 4 packets again, the server would only get BIND->DESCRIBE->EXECUTE packets. The server later would try to find the generated plan in the cache at BIND stage therefore to skip optimizing.

To support caching, current peloton uses a map with statement name string as the key and statement object as the value.

1.2 Problem of re-planning for cached statement

Using the example in 1.1 for a better illustration: After the first query and before the second query, it is possible that the table's schema has changed by either:

The above situations would make the cached statement invalid and could potentially crash the system. The current implementation does not take this into consideration.

2. Proposed Solution

Basically, this solution relies on the recently implemented transactional catalog to store a "PlanVersion" field on each table tuple. This field is retrieved at PARSE stage and retrieved again and compare at BIND stage (or possibly other stages?) The protocol_handler saves a copy of the "PlanVersion" number in the cache, and compare this number when it comes to BIND to decide whether it should re-plan statement.

2.1 On catalog side

For each pg_table tuple, there should be a field storing the "PlanVersion" number.

An function that could get() the "PlanVersion" given the table_oid(s).

Some places either in catalog or other parts in the system would update that "PlanVersion" of a table if the table schema is changed(ABORT, CREAT or ALTER).

@mengranwo @AngLi-Leon @camellyx

2.2 On protocol_handler side

At PARSE stage, protocol_handler would access the PlanVersion field and stores it in its cache with the statement.

At BIND stage, it should access the PlanVersion field again and compare the version to its previously cached PlanVersion, if they are different, it would invoke the replanStatement() function.

Protocol handler should be able to access the most current TXN which is stored in traffic cop, since catalog's get() method is transactional and the TXN object need to be passed in as an argument.

[1] statement is a data structure that contains the query string and the generated query plan tree. [2] Note that the BIND here is different from what peloton binder is doing. The BIND refers to a stage where a statement is searched from the cache and the parameters of a query are provided. ("SELECT * FROM foo WHERE $1 > 0", the actual value of $1 is provided at BIND stage.)

apavlo commented 7 years ago

I talked with @ChTimTsubasa today about this. I like his plan.

@hzxa21 @chenboy @luochenUmich Can you help him with this when you guys get back next week?

apavlo commented 7 years ago

@malin1993ml I just realized that you will need this feature to run your index tuning experiments on Peloton. You should keep track of it and manage everyone.

linmagit commented 7 years ago

Will do. On Fri, Aug 18, 2017 at 13:19 Andy Pavlo notifications@github.com wrote:

Assigned #789 https://github.com/cmu-db/peloton/issues/789 to @malin1993ml https://github.com/malin1993ml.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/cmu-db/peloton/issues/789#event-1212259706, or mute the thread https://github.com/notifications/unsubscribe-auth/AFojzYSBFp4-m_Eqy_jzhlnu6LazfKhrks5sZceigaJpZM4O5lGx .