antonmks / Alenka

GPU database engine

I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0. #33

Closed AlexeyAB closed 11 years ago

AlexeyAB commented 11 years ago

Hi Anton! I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0.

Can you show the results on your GPU, and on the other GPUs you tested on?

I have such results:


TPC-H on GPU SM2.0, GTX 460SE 1GB, 288 cores:

| Query  | 10GB, 1st launch | 10GB, 2nd launch | 100GB, 1st launch | 100GB, 2nd launch |
|--------|------------------|------------------|-------------------|-------------------|
| q1.sql | 5.3              | 3.8              | 90                | 90                |
| q2.sql | 1                | 0.36             | 9.3               | 2.3               |
| q3.sql | 1.8              | 1.7              | 71                | 67                |
| q4.sql | 4.25             | 1.3              | 45                | 13                |
| q5.sql | -                | -                | -                 | -                 |
| q6.sql | 1.77             | 0.9              | 39                | 8                 |
| q7.sql | 8.7              | 2                | -                 | -                 |

Regards, Alexey

AlexeyAB commented 11 years ago

Additionally, I can't run q5; I get an error.

For 100 GB TPC-H:

F:\GPU_test_100GB_new>AlenkaDB.exe q5.sql
GeForce GTX 460 SE : 1296.000 Mhz   (Ordinal 0)
6 SMs enabled. Compute Capability sm_21
FreeMem:    867MB   TotalMem:   1023MB.
Mem Clock: 1700.000 Mhz x 256 bits   (108.800 GB/s)
ECC Disabled

Process count = 6200000
BINARY LOAD: R region
Reading 5 records
...
JOIN J1 l_orderkey o_orderkey 829939712
segment 96 560193536
join count 18210128
join time 56.555 801366016
JOIN J2 l_suppkey s_suppkey 829939712
Cuda error in file 'bison.y' in line 1461 : out of memory.

And for TPC-H 10GB i get an exception:

F:\GPU_test_10GB_new>AlenkaDB.exe q5.sql
GeForce GTX 460 SE : 1296.000 Mhz   (Ordinal 0)
6 SMs enabled. Compute Capability sm_21
FreeMem:    858MB   TotalMem:   1023MB.
Mem Clock: 1700.000 Mhz x 256 bits   (108.800 GB/s)
ECC Disabled

Process count = 6200000
BINARY LOAD: R region
Reading 5 records
FILTER RF R 900055040
MAP CHECK
segment 0 R
filter is finished 1 899006464
filter time 0.004 900055040
BINARY LOAD: N nation
Reading 25 records
BINARY LOAD: O orders
Reading 15000000 records
FILTER OFI O 900055040
MAP CHECK
segment 2 R
filter is finished 2275919 861650944
filter time 0.065 886554624
BINARY LOAD: S supplier
Reading 100000 records
BINARY LOAD: C customer
Reading 1500000 records
BINARY LOAD: L lineitem
Reading 59986052 records
JOIN J11 n_regionkey r_regionkey 886554624
segment 0 886554624
join count 5
join time 0.007 886554624
JOIN J12 c_nationkey n_nationkey 886554624
segment 0 874496000
join count 300270
join time 0.035 876593152
JOIN J13 o_custkey c_custkey 886554624
segment 2 801226752
join count 456771
join time 0.154 822460416
JOIN J1 l_orderkey o_orderkey 828882944
segment 9 723501056
join count 1825856
join time 1.253 793100288
JOIN J2 l_suppkey s_suppkey 820625408
segment 0 803848192

<- got an exception

antonmks commented 11 years ago

Hi Alexey, I do not have a 100GB database right now, only 300GB, so I can tell you my 300x results. I do not count the disk read time because it depends only on your disk subsystem, so these are run-from-host-memory ("second run") results. For 300GB (in seconds):

Q1 - 33
Q2 - 4
Q3 - 16
Q5 -
Q6 - 2
Q7 - 30

Q5 doesn't quite work at this scale because of a large intermediate data set. It is easy to fix; I just need to add a partitioning routine to the join, like the one in groupby.

My setup: Pentium G620, GTX 580 with 3GB of GDDR5, 16GB of DDR3.
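The partitioned join Anton describes can be sketched as a Grace-style hash join: split both inputs by a hash of the key so that each partition's intermediate result fits in memory, then join partition by partition. This is an illustrative sketch, not Alenka's actual code; all names are hypothetical.

```python
# Sketch of a partitioned (Grace-style) hash join: rows with equal keys
# always hash to the same partition, so each partition can be joined
# independently with a bounded intermediate result. Illustrative only.

def partitioned_join(left, right, key_left, key_right, n_parts=4):
    parts_l = [[] for _ in range(n_parts)]
    parts_r = [[] for _ in range(n_parts)]
    for row in left:
        parts_l[hash(row[key_left]) % n_parts].append(row)
    for row in right:
        parts_r[hash(row[key_right]) % n_parts].append(row)

    out = []
    # Matching keys always land in the same partition pair.
    for pl, pr in zip(parts_l, parts_r):
        index = {}
        for row in pr:
            index.setdefault(row[key_right], []).append(row)
        for row in pl:
            for match in index.get(row[key_left], []):
                out.append({**row, **match})
    return out
```

In a GPU engine the same idea bounds the size of each per-partition join so it fits device memory, which is exactly the failure mode in the q5 log above.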

Before creating the data files I presorted lineitem.tbl on the date_ordered column; this way Alenka can skip entire segments in filter operations. I should be able to run the tests on a GTX Titan next week; I expect the results to be significantly faster.

Regards, Anton

AlexeyAB commented 11 years ago

So Alenka now uses segment-level (storage) indexes in its DSM layout, like Oracle Exadata V2 does with PAX?

Will you get access to a Titan at some university, in a cloud, or will you buy one?

And why did you exclude q4.sql?

Regards, Alexey

antonmks commented 11 years ago

Alenka doesn't use any indexes. It stores the min and max values of every column for every segment, so when running operations it is often possible to determine that none of the values in a segment match, without reading the segment itself.

GTX Titan - I got it as a present from Israeli guys who use Alenka :-)

Q4 - I exclude a lot of queries; supporting all of TPC-H would require implementing many additional features...
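The segment-skipping scheme Anton describes (often called zone maps) can be sketched as follows. This is a minimal illustration of the technique, not Alenka's implementation; the function names are made up.

```python
# Minimal sketch of segment-level min/max skipping ("zone maps"):
# keep (min, max) per segment, and skip any segment whose range
# proves no row can pass the filter. Illustrative names only.

def build_segment_stats(column, segment_size):
    """Store (min, max) for each fixed-size segment of a column."""
    stats = []
    for start in range(0, len(column), segment_size):
        seg = column[start:start + segment_size]
        stats.append((min(seg), max(seg)))
    return stats

def filter_gt(column, segment_size, stats, threshold):
    """Return values > threshold, counting segments skipped via stats."""
    out, skipped = [], 0
    for i, (lo, hi) in enumerate(stats):
        if hi <= threshold:      # no value in this segment can match
            skipped += 1
            continue
        start = i * segment_size
        out.extend(v for v in column[start:start + segment_size] if v > threshold)
    return out, skipped

# On data presorted by the filtered column, whole runs of segments are skipped:
data = sorted(range(100))
stats = build_segment_stats(data, 10)
result, skipped = filter_gt(data, 10, stats, 79)  # skips segments 0..7
```

This is also why presorting lineitem.tbl on the date column helps: sorting makes each segment's min/max range narrow and disjoint, so date filters can rule out most segments outright.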

AlexeyAB commented 11 years ago

This method in Oracle is called the Storage Index, and it is used in Smart Scans :) http://www.oracle.com/technetwork/issue-archive/2011/11-may/o31exadata-354069.html

Storage Index

Figure 1 shows how the data inside a storage index is maintained.
...
Each storage index entry covers a physical region of the table, contains minimum and maximum values 
of the columns in that region, and also indicates whether any of the rows in that region contain nulls.

And if you're interested in the mechanics of HCC, there are three paragraphs (about half a page) on the structure of Hybrid Columnar Compression in Oracle Exadata V2, which sounds a whole lot like PAX. They sort the data first to get an effective compression ratio. http://books.google.ru/books?id=1Aqw6mrxz5AC&lpg=PA104&ots=T9pNopWGoA&hl=ru&pg=PA72&redir_esc=y#v=onepage&q&f=false

Discussion, including what Kevin Closson said about HCC: http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/

antonmks commented 11 years ago

Yeah, hybrid compression is becoming popular. If you are into this, take a look at

http://www.dbms2.com/2013/05/27/ibm-blu

It's a pretty interesting system, and it executes queries on compressed data too!

AlexeyAB commented 11 years ago

Thanks!

About presorting lineitem.tbl on the date_ordered column - can you write that SQL?

This has no effect:

A := LOAD 'lineitem.tbl' USING ('|') AS (orderkey{1}:int,  l_suppkey{3}:int, qty{5}:int, 
 price{6}:decimal, discount{7}:decimal, tax{8}:decimal, rf{9}:varchar(1),
 lf{10}:varchar(1), shipdate{11}:int, commitdate{12}:int, receiptdate{13}:int);

OA := ORDER A BY shipdate ASC;

STORE OA INTO 'lineitem' BINARY;

Output:

Process count = 6200000
LOAD: A lineitem.tbl 11 |
order: OA A
STORE: OA lineitem
LOADING lineitem.tbl |
SQL scan parse worked
cycle time 0.81

And no files were created.

antonmks commented 11 years ago

Well, you need to presort the entire file:

sort -t "|" -k11 lineitem.tbl > lineitems.tbl

Regards, Anton

AlexeyAB commented 11 years ago

Thanks! This is a bash command, but I'm on Windows, where sort can't do this :) I'll have to run a virtual machine.

antonmks commented 11 years ago

Just install Cygwin, it is really convenient.