antonmks / Alenka

GPU database engine
Other
1.17k stars 120 forks source link

Why I can not use a decimal number in the summation of numbers - sum aggregate function? #27

Closed AlexeyAB closed 11 years ago

AlexeyAB commented 11 years ago

Hi Anton! Why I can not use a decimal number in the summation of numbers - sum aggregate function?

For example take a table "partsupp" from TPC-H 10GB - table that used in q2m.sql. If I calculate sum of the decimal numbers, grouped by integers:

PS := LOAD 'partsupp' BINARY AS 
    ( ps_partkey{1}:int, ps_suppkey{2}:int, ps_supplycost{4}:decimal);

RES := SELECT sum(ps_supplycost) as sum_ps_supplycost
 FROM PS
 GROUP BY ps_suppkey;

STORE RES INTO 'res_q2m.txt' USING ('|') LIMIT 100;

I got a message:

No group by on float/decimal columns

If I use it so:

PS := LOAD 'partsupp' BINARY AS ( ps_partkey{1}:int, ps_suppkey{2}:int, ps_supplycost{4}:decimal);

RES := SELECT ps_suppkey as ps_suppkey, sum(ps_supplycost) as sum_ps_supplycost
 FROM PS
 GROUP BY ps_suppkey;

STORE RES INTO 'res_q2m.txt' USING ('|') LIMIT 100;

then get an exception.

But if I calculate sum of the integer numbers, grouped by decimals, then all goes well:

PS := LOAD 'partsupp' BINARY AS 
    ( ps_partkey{1}:int, ps_suppkey{2}:int, ps_supplycost{4}:decimal);

RES := SELECT sum(ps_suppkey) as sum_ps_suppkey
 FROM PS
 GROUP BY ps_supplycost;

STORE RES INTO 'res_q2m.txt' USING ('|') LIMIT 100;

I have result:

3900654|
1981506|
2939347|
3569270|
2453065|
3689991|
...

It may should be vice versa? (Can calculate sum of the decimal numbers, and cann't opposite)

An additional. As I know, Alenka-integer is a 64-bit signed value. What precision of the Alenka-decimal number and what ranges of values it can have?

Alexey.

antonmks commented 11 years ago

Well, I was hoping that people wouldn't run queries like that :-). But for now if you use an aggregate functions like SUM, MIN, MAX etc it is required that you list a grouping column too :

RES := SELECT ps_suppkey AS key, sum(ps_supplycost) as sum_ps_supplycost FROM PS GROUP BY ps_suppkey;

If it is absolutely critical to be able to run this kind of queries, let me know, I will make the changes as soon as I have the time.

About decimals - they have 2 digit precision. If you want more, use float - it is stored internally as 64bit double.

Best regards, Anton

AlexeyAB commented 11 years ago

Ok :) But, what is surprising, if it grouped by integer numbers, then doesn't require list a grouping column too in select list.

And main, if I list a grouping column, then often I get an exception, probably lacking GPU RAM. It happens very often in my test cases. I have 1GB GPU RAM, and 8 GB (5 GB free) CPU RAM.

May be somewhere specify (in #define or get from command line) the maximum of using of GPU RAM, or/and allocate and run in the GPU RAM in try-catch block, and if it were failure, then use GROUP BY in host? If it will done, it will make sense and I will think about implement hybrid calculating of SORT/JOIN/FILTERING/GROUP BY on GPU-CPU by parts.

Also it may be present memory leak. Regards, Alexey