Closed bmanola closed 11 years ago
Hello! Sorry for the bugs; I spent the last few days working on fixing the problem. I will post the update on GitHub today, let me know if it works for you. Also, I should probably add to the manual that you always need a COUNT(field) in a GROUP BY statement. So try it like this:

D := SELECT sccode AS sccode1, product AS product1, SUM(sale) AS sale_sum, COUNT(product) AS cnt FROM A GROUP BY sccode, product;
If it doesn't work let me know and I will try to reproduce the problem.
Regards,
Anton
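For anyone wanting to verify the expected result independently, the same aggregation can be run against any standard SQL engine and compared with Alenka's output. A minimal sketch using Python's built-in sqlite3, with synthetic data (the table and column names follow the query above; the row values are made up for illustration):

```python
# Cross-check of the grouped aggregation in standard SQL using sqlite3.
# Table/column names follow the thread; the data here is synthetic.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE A (sccode TEXT, product INTEGER, sale REAL)")
rows = [("S1", 1, 10.0), ("S1", 1, 5.0), ("S1", 2, 7.5), ("S2", 1, 3.0)]
cur.executemany("INSERT INTO A VALUES (?, ?, ?)", rows)

# Equivalent of:
# D := SELECT sccode AS sccode1, product AS product1,
#             SUM(sale) AS sale_sum, COUNT(product) AS cnt
#      FROM A GROUP BY sccode, product;
cur.execute("""
    SELECT sccode AS sccode1, product AS product1,
           SUM(sale) AS sale_sum, COUNT(product) AS cnt
    FROM A
    GROUP BY sccode, product
    ORDER BY sccode, product
""")
result = cur.fetchall()
print(result)
# One output row per distinct (sccode, product) pair; with this synthetic
# data that is three groups.
```

The number of rows returned here is the count the Alenka query should match once the GROUP BY bug is fixed.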
On Sun, Jan 27, 2013 at 8:08 PM, bmanola <notifications@github.com> wrote:
Dear Anton,
First, I want to say this is a remarkable project. Analyzing large datasets using GPUs is a great idea.
I have a problem getting a SELECT statement to work correctly. I load a file (around 30M rows) into a BINARY structure. When I load this binary structure and run a basic SQL query:

D := SELECT sccode AS sccode1, product AS product1, SUM(sale) AS sale_sum FROM A GROUP BY sccode, product;

I get fewer rows than the same SQL run on a database: 76K instead of 500K. Some products are missing which I know for sure are in bigtable in the DB.
Even stranger, when I use FILTER product <= 100000 (the exact number is not important; the max product code is 40000) I get around 160K.
Can you tell me what's wrong with my SQL statement?
For loading the data I use:

A := LOAD 'bigtable.csv' USING (',') AS (uniqueid{1}:int, ccode{2}:varchar(10), acode{3}:varchar(10), sccode{4}:varchar(10), supplier{5}:int, product{6}:int, sale{15}:decimal);
STORE A INTO 'bigtable' BINARY;

Best Regards,
— Reply to this email directly or view it on GitHub: https://github.com/antonmks/Alenka/issues/3
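One way to sanity-check a row-count mismatch like the one reported above is to count the distinct (sccode, product) groups straight from the CSV, using the same 1-based column indices as the LOAD statement (sccode{4}, product{6}, sale{15}). A sketch in plain Python; the helper name group_totals is hypothetical, not part of Alenka:

```python
import csv
from collections import defaultdict

# 1-based column indices, mirroring the LOAD spec in the thread:
# sccode{4}, product{6}, sale{15}
SCCODE_COL, PRODUCT_COL, SALE_COL = 4, 6, 15

def group_totals(path):
    """Sum sale per (sccode, product) directly from the CSV, so the
    number of rows the GROUP BY should return is known in advance."""
    sums = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            key = (row[SCCODE_COL - 1], int(row[PRODUCT_COL - 1]))
            sums[key] += float(row[SALE_COL - 1])
    return sums

# Usage (bigtable.csv as in the thread):
# totals = group_totals("bigtable.csv")
# print(len(totals))  # expected row count of the grouped result
```

If this count says ~500K groups but Alenka returns 76K, the problem is in the engine's grouping, not in the query.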
Hi Anton,
I am sending you the output, in case it helps:
Process count = 6200000 BINARY LOAD: A bigtable SELECT D A cycle 0 select mem 851771392 final select 81233 select time 3.22 STORE: D mytest.txt | SQL scan parse worked cycle time 3.534
I noticed that the process makes only one cycle instead of 4 (or 5).
The SQL for the select is the same as you suggested.
Regards,
I think I fixed it now. Sorry for the bug, just not enough testing on my part! I generated 30 million records and the query takes exactly 2 seconds on my GTX 580.