TPC-H benchmark issues

jfeser commented 4 years ago

Hi,

I've been trying to get the TPC-H benchmarks to run, and I'm hitting a few issues. I can get chestnut to generate code and to build the backing DB, but I haven't been able to run the generated code.

The issue that I'm seeing right now is that the generated code exits while loading data from the DB for TPC-H query 1. It looks to me like there is an off-by-one error in the VarChar template: when LENGTH=1, the calls to memcpy copy zero bytes. Something in the data loading code notices this and calls exit.
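To make the off-by-one concrete, here is a minimal sketch of a fixed-capacity string template. It is not Chestnut's actual VarChar template; the member names (`s`, `str`) and the layout are assumptions for illustration only. The point is that a copy length of `LENGTH - 1` becomes zero when LENGTH=1, while reserving `LENGTH + 1` bytes of storage lets all LENGTH characters plus the terminator fit.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fixed-capacity string, NOT the actual Chestnut template.
template <std::size_t LENGTH>
struct VarChar {
    char s[LENGTH + 1];  // LENGTH characters plus the terminating '\0'

    VarChar() { s[0] = '\0'; }

    explicit VarChar(const char* src) {
        // An off-by-one variant would clamp to LENGTH - 1 here, which
        // copies zero bytes when LENGTH == 1.
        std::size_t n = std::strlen(src);
        if (n > LENGTH) n = LENGTH;  // truncate to capacity
        std::memcpy(s, src, n);
        s[n] = '\0';
    }

    std::string str() const { return std::string(s); }
};
```

With this shape, `VarChar<1>("R")` keeps its single character, which matters for single-character columns such as the flag columns that TPC-H query 1 groups by.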

If I patch this issue, I run into two more problems. The first is that the generated code is missing some return statements in non-void functions. Clang helpfully inserts an illegal instruction here, which crashes the program. Fixing this leads to a segmentation fault in some other code.
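For context, flowing off the end of a non-void function is undefined behavior in C++, which is why the compiler is free to emit a trap there. The sketch below shows two ways to avoid it. `Table`, `insert_row`, and `try_insert_row` are hypothetical names, not taken from the generated code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated data structure.
struct Table {
    std::vector<int> rows;
};

// Problematic shape (left commented out so this file compiles cleanly):
//   int insert_row(Table& t, int v) { t.rows.push_back(v); }  // no return

// Option 1: the result is never used, so declare the function void.
void insert_row(Table& t, int v) {
    t.rows.push_back(v);
}

// Option 2: keep a status result and return on every path.
bool try_insert_row(Table& t, int v, std::size_t max_rows) {
    if (t.rows.size() >= max_rows) {
        return false;  // table is full, report failure
    }
    t.rows.push_back(v);
    return true;
}
```

Compiling the generated code with `-Werror=return-type` turns a missing return into a build error instead of a crash at run time.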

I'm pretty sure that I'm doing something wrong in the codegen process. This is happening with the tip of the partition branch. Any pointers to the right branch or the right process for generating code would be great!

congy commented 4 years ago

Hi!

Sorry for the late reply, I just saw this. I pushed a change that fixes the missing return statements by changing the non-void functions to void. The return value was meant to catch errors during data insertion, but it's not really used for now. I also fixed the off-by-one error in copying VarChar.

To test an individual query, you can use the test_codegen_one_query function, which generates data structures and query code for only that query. I added an example in benchmark/tpch/tpch_python.py.

Thanks for pointing out the issue! Things should work for TPC-H Q1 now. Let me know if there are any other bugs! (I usually reply much faster by email...)

jfeser commented 4 years ago

Thanks! I'll give it another shot and send an email if there are any other issues.

congy / chestnut

TPC-H benchmark issues #2