antonmks / Alenka

GPU database engine

Out of Memory with large intermediate result set #17

Closed georgezhlw closed 11 years ago

georgezhlw commented 11 years ago

Suppose I have the following query:

```
A := LOAD 'member_skill' BINARY AS (a_memberid{1}:int, skill{2}:varchar2(40));
B := LOAD 'mlt_k133B' BINARY AS (b_memberid{1}:int);
J := SELECT skill AS sk FROM A JOIN B on a_memberid = b_memberid;
G := SELECT sk AS skill_name, COUNT(sk) AS cnt FROM J GROUP BY sk;
R := ORDER G BY cnt DESC;
STORE R INTO 'member_skill_top10.txt' USING ('|') LIMIT 10;
```

A is 500 million rows and B is 12 million; J will be 12 million rows of varchar2(40). When the GROUP BY runs on J, it fails with an out-of-memory error.

If I store J as binary and load it again as follows, it looks like Alenka will not work:

```
STORE J INTO 'member_skill_temp' BINARY;
J2 := LOAD 'member_skill_temp' BINARY AS (skill_name{1}:varchar2(40));
```

Can the intermediate result be partitioned automatically when it is too large for device memory? If not, what is the suggested workaround?
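(For reference, the manual workaround George is asking about amounts to out-of-core aggregation. Since COUNT is decomposable, the oversized intermediate J can be processed in chunks and the partial counts merged exactly. A minimal CPU-side sketch in Python — not Alenka code, and `partitioned_group_count` is a hypothetical helper name:)

```python
from collections import Counter

def partitioned_group_count(rows, chunk_size):
    """COUNT(sk) GROUP BY sk over an input too large to hold at once:
    count each fixed-size chunk separately, then merge the partial
    counts. COUNT is decomposable, so the merged result is exact."""
    total = Counter()
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            total.update(Counter(chunk))  # fold this chunk's counts in
            chunk = []
    if chunk:                             # flush the final partial chunk
        total.update(Counter(chunk))
    return total

# Mirrors ORDER G BY cnt DESC ... LIMIT 10 on a toy skill column.
skills = ["java", "sql", "java", "cuda", "sql", "java"]
top10 = partitioned_group_count(skills, chunk_size=2).most_common(10)
```

On a GPU engine the same idea applies with each chunk aggregated on-device and the partial results combined on the host.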

thanks, George

antonmks commented 11 years ago

Yes, I know about this issue. I plan to add code that will partition the datasets and process them correctly later this month. Both the GROUP BY and JOIN operations need to be modified to work with intermediate results that do not fit into GPU memory.
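(The standard way to make a join work when neither the build table nor the result fits in memory is Grace-style hash partitioning: bucket both inputs by a hash of the join key so matching keys land in the same bucket, then join one bucket at a time. A hedged Python sketch of the idea, assuming a hypothetical `partitioned_join` over (key, value) rows — not the actual Alenka implementation:)

```python
def partitioned_join(left, right, num_parts):
    """Grace-style partitioned hash join: hash both inputs on the join
    key into num_parts buckets, then join bucket-by-bucket so only one
    partition's build table must fit in (GPU) memory at a time.
    left:  iterable of (key, value) pairs (e.g. (memberid, skill))
    right: iterable of keys (e.g. memberid)
    Returns the joined values (the `skill AS sk` projection)."""
    left_parts = [[] for _ in range(num_parts)]
    right_parts = [[] for _ in range(num_parts)]
    for key, val in left:
        left_parts[hash(key) % num_parts].append((key, val))
    for key in right:
        right_parts[hash(key) % num_parts].append(key)

    out = []
    for lp, rp in zip(left_parts, right_parts):
        build = {}                      # per-partition hash table only
        for key, val in lp:
            build.setdefault(key, []).append(val)
        for key in rp:                  # probe side of this partition
            out.extend(build.get(key, []))
    return out
```

Because matching keys always hash to the same bucket, the bucket-by-bucket joins together produce exactly the full join result.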

Anton

georgezhlw commented 11 years ago

Thanks Anton. Please update this thread so that I get notified of any progress. Regards, George

antonmks commented 11 years ago

Ok, you might wanna try this with a new build.

Anton