adrpar / paqu

A parallel query engine for MySQL + Spider Engine built on a fork of shard-query
GNU General Public License v2.0
10 stars 0 forks source link

Query with joins takes too long #13

Open kristinriebe opened 11 years ago

kristinriebe commented 11 years ago

I have following query: select p.fofTreeId, p.treeSnapnum, p.x,p.y,p.z from MDR1.FOFMtree as m, MDR1.FOFMtree as p where m.fofTreeId < 100000000+5 and p.fofTreeId between m.fofTreeId and m.mainLeafId

It could be much faster, if first m.fofTreeId < 100000000+5 is used to get a smaller subset of the table before joining. (I tried to enforce this with a subquery, but got a parse error). This is what PaQu is trying, which is very inefficient:

CALL paquExec('SELECT p.fofTreeId AS p.fofTreeId,p.treeSnapnum AS p.treeSnapnum,p.x AS p.x,p.y AS p.y,p.z AS p.z FROM MDR1.FOFMtree AS m JOIN MDR1.FOFMtree AS p ORDER BY NULL ', 'aggregation_tmp_89562445'); CALL paquExec('SELECT p.p.fofTreeId AS p.fofTreeId,p.p.treeSnapnum AS p.treeSnapnum,p.p.x AS p.x,p.p.y AS p.y,p.p.z AS p.z FROM MDR1.FOFMtree AS m JOIN MDR1.FOFMtree AS p JOIN ( SELECT p.fofTreeId,p.treeSnapnum,p.x,p.y,p.z FROM aggregation_tmp_89562445 ) AS p WHERE ( ( m.fofTreeId < 100000000 + 5 ) and ( p.p.fofTreeId between m.fofTreeId and m.mainLeafId ) ) ', 'aggregation_tmp_75999751'); CALL paquDropTmp('aggregation_tmp_89562445'); USE spider_tmp_shard; SET @i=0; CREATE TABLE multidark_user_kristin./*@GEN_RES_TABLE_HERE*/ ENGINE=MyISAM SELECT @i:=@i+1 AS row_id, distinct_res_table.* FROM ( SELECT DISTINCT p.fofTreeId,p.treeSnapnum,p.x,p.y,p.z FROM aggregation_tmp_75999751 ) as distinct_res_table; CALL paquDropTmp('aggregation_tmp_75999751');

adrpar commented 11 years ago

partially fixed in ca2a2623729be642a128cf4dfd1dd9849fc60926

however a mess remains with ANDs and ORs in WHERE conditions. The whole implicit JOIN part needs rethinking on how to handle compilcated cases of ANDs and ORs

i'm not feeling comfortable here...