dase / CLAIMS

Claims is an in-memory parallel database prototype, which runs on clusters of shared-nothing servers and provides efficient and scalable data analysis.
http://dase.ecnu.edu.cn/claims

TPC-H: Query 1 #23

Open wangli1426 opened 10 years ago

wangli1426 commented 10 years ago

In the TPC-H benchmark, Query 1 is as follows.

select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price, 
sum(l_extendedprice*(1-l_discount)) as sum_disc_price, 
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, 
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from 
lineitem
where
l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3)
group by l_returnflag, l_linestatus 
order by l_returnflag, l_linestatus;

The challenges in supporting this SQL are summarized as follows.

@egraldlo: please confirm whether all the expressions inside the aggregation functions are supported now.
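For reference, the expressions that the aggregation functions need to evaluate can be sketched as below. This is only a minimal illustration in Python, not Claims code: the sample rows are made up, and `query1_aggregates` is a hypothetical helper name. It groups by (l_returnflag, l_linestatus) and evaluates the same eight aggregates as the query, using `Decimal` so the arithmetic stays exact.

```python
from collections import defaultdict
from decimal import Decimal as D

# Hypothetical sample rows (not real TPC-H data):
# (l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_tax)
ROWS = [
    ("A", "F", D("10"), D("100.00"), D("0.10"), D("0.05")),
    ("A", "F", D("20"), D("200.00"), D("0.00"), D("0.10")),
    ("N", "O", D("5"), D("50.00"), D("0.20"), D("0.00")),
]


def query1_aggregates(rows):
    """Group rows by (returnflag, linestatus) and evaluate Query 1's aggregates."""
    groups = defaultdict(list)
    for flag, status, qty, price, disc, tax in rows:
        groups[(flag, status)].append((qty, price, disc, tax))

    result = {}
    for key in sorted(groups):  # order by l_returnflag, l_linestatus
        items = groups[key]
        n = len(items)
        sum_qty = sum(q for q, _, _, _ in items)
        sum_base = sum(p for _, p, _, _ in items)
        result[key] = {
            "sum_qty": sum_qty,
            "sum_base_price": sum_base,
            # sum(l_extendedprice*(1-l_discount))
            "sum_disc_price": sum(p * (1 - d) for _, p, d, _ in items),
            # sum(l_extendedprice*(1-l_discount)*(1+l_tax))
            "sum_charge": sum(p * (1 - d) * (1 + t) for _, p, d, t in items),
            "avg_qty": sum_qty / n,
            "avg_price": sum_base / n,
            "avg_disc": sum(d for _, _, d, _ in items) / n,
            "count_order": n,
        }
    return result
```

The two nested arithmetic expressions (sum_disc_price and sum_charge) are the ones worth checking first, since they combine subtraction, addition, and multiplication inside a single aggregate.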

egraldlo commented 10 years ago

All the expressions in this query are now supported, but the expression implementation has a lot of redundancy. Maybe we can simplify it.

wangli1426 commented 10 years ago

@egraldlo: Currently, the performance of expression evaluation is not our first concern. I'm trying to leverage LLVM to accelerate expression calculation. However, it will very likely take a long time (at least a month) before I know how to use LLVM in Claims.

egraldlo commented 10 years ago

OK, I will try to make it support the current requirements of Query 1 this week. BTW, has the type "t_date" passed testing? I ran into a problem with it and just want to make sure it works.

egraldlo commented 10 years ago

The type "t_date" works well.

scdong commented 10 years ago

In TPC-H, DELTA is randomly selected within [60, 120]. The intent is to choose DELTA so that between 95% and 97% of the rows in the table are scanned. The condition "l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3)" means that l_shipdate <= (1998-12-01 - DELTA days). I think the subtraction could be done in the SQL parser @fzh
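As a rough sketch of that subtraction (plain Python, not the Claims parser; `shipdate_cutoff` and `BASE_DATE` are hypothetical names), the right-hand side of the predicate can be folded into a single date constant before the scan starts:

```python
from datetime import date, timedelta

# Fixed date that appears in the Query 1 predicate.
BASE_DATE = date(1998, 12, 1)


def shipdate_cutoff(delta_days):
    """Fold date '1998-12-01' - interval 'DELTA' day into one constant date."""
    if not 60 <= delta_days <= 120:
        raise ValueError("DELTA must be within [60, 120]")
    return BASE_DATE - timedelta(days=delta_days)


# With DELTA = 90 the predicate becomes l_shipdate <= 1998-09-02.
print(shipdate_cutoff(90))  # prints 1998-09-02
```

Folding the constant once means the filter only has to compare l_shipdate against a fixed date for each row.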

egraldlo commented 10 years ago

All the expressions in this query's projection can now be supported. I can merge this into your version, @fzhedu, tonight. Next, we can implement "AS" in it. BTW, @wangli1426, I put the makefiles of all modules of the old version here: https://github.com/egraldlo/Claims/tree/master/doc. This is a simple version, and I made a few changes, so maybe we can discuss the details. These makefiles work well but are somewhat ugly; we can clean them up later.