cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars 623 forks source link

Prepared Statement Type Binding #1387

Closed apavlo closed 6 years ago

apavlo commented 6 years ago

The codegen engine expects that all input parameters for prepared statements are properly cast before they arrive. This causes a problem as shown in the example #1382 where we pass in VARCHAR value for a query that expects an INT parameter. The reason why we don't have this problem in the interpreted engine because we run each parameter value through the cast function with a giant switch statement to put the argument into the correct form. We obviously don't want to do this for the LLVM engine because that would defeat the purpose of compiling.

@pmenon asserted (correctly IMO) that this casting should be performed in the binding phase before we get to the LLVM engine.

There are two types of prepared statements:

  1. The first is where all of the parameter types are specified with the PREPARE command:
    PREPARE xxx (int, int) AS SELECT $1 + $2;
  2. The second is where we have to infer the parameter from the PREPARE command:
    PREPARE xxx (int) AS SELECT $1 + $2;

Handling the second one is obviously more complicated, but it should be doable if we have the right information and utility methods.

There are two parts to this issue:

  1. First we need to determine whether JDBC provides us the parameters when the invoking the prepared statement. This will tell us whether we need to automatically infer types. Note that in JDBC, you don't call PREPARE. You should define the queries with question marks for parameter placeholders: https://github.com/oltpbenchmark/oltpbench/blob/master/src/com/oltpbenchmark/benchmarks/tpcc/procedures/NewOrder.java#L37-L42

  2. Next, we need to add support in the part of the code (TrafficCop?) where we bind parameters to the prepared statement invocation using the type information attached to the prepared statement object (do we even have this?). It is here that we will then cast the type appropriately.

We need this for TPC-C. We can't run the benchmark even if we have index scans in the LLVM engine.

tli2 commented 6 years ago

After investigation, it turns out we already do the proper casting in postgres_protocol_handler.cpp, if the type of parameters are known. I am also having problems reproducing the failure when running TPC-C (@pervazea is sending me his config file).

AFAIK, two things can cause failure on this front at this moment:

Overall, it looks like we just have a bug somewhere in the protocol handler to fix, and not a feature to write in order to support TPC-C. Declaring the type explicitly fixes the test case (We have to use PREPARE in SQL instead of pqxx). To fix this completely, somebody in the class can probably come and implement the type inference system for this (or somebody taking a PL class and needs a project, idk) to fix this once and for all.

tli2 commented 6 years ago

Pervaze and I wasn't able to reproduce this error on TPC-C or YCSB. The failure is probably not due to this. The test case can be fixed by itself. Pervaze will incorporate the fix in his other PR. Closing this issue unless something else comes up.