adrpar / paqu

A parallel query engine for MySQL + Spider Engine built on a fork of shard-query
GNU General Public License v2.0
10 stars 0 forks source link

Functions as arguments of functions incorrectly treated as column names #37

Open kristinriebe opened 9 years ago

kristinriebe commented 9 years ago

Query

SELECT log10(COUNT(*)), snapnum
FROM MDR1.FOF 
GROUP BY snapnum

Query plan

CALL paquExec('SELECT log10( COUNT ) AS `_log10_COUNT_*_`,`snapnum` AS `snapnum` FROM MDR1.FOF GROUP BY snapnum ', 'aggregation_tmp_21796363');
USE spider_tmp_shard; SET @i=0; CREATE TABLE cosmosim_user_kristin.`/*@GEN_RES_TABLE_HERE*/` ENGINE=MyISAM SELECT @i:=@i+1 AS `row_id`,  `_log10_COUNT_*_`,`snapnum`
FROM `aggregation_tmp_21796363`  GROUP BY `snapnum`  ;
CALL paquDropTmp('aggregation_tmp_21796363');

log10(count(*)) is not treated correctly in the query plan. PaQu seems to assume that the argument of a function, e.g. log10, is a column name and cannot be a function itself.

Current work around Reformulate the query and apply the outer function only at the end:

SELECT log10(num), snapnum FROM 
(SELECT COUNT(*) as num, snapnum
FROM MDR1.FOF 
GROUP BY snapnum) as s
ORDER BY snapnum

which results in the (correct) query plan:

CALL paquExec('SELECT COUNT(*) AS `num`,`snapnum` AS `snapnum` FROM MDR1.FOF GROUP BY snapnum ', 'aggregation_tmp_39612613');
CREATE DATABASE IF NOT EXISTS spider_tmp_shard;
USE spider_tmp_shard;
CREATE TABLE spider_tmp_shard.aggregation_tmp_13972201 ENGINE=MyISAM SELECT log10( `num` ) AS `_log10_num_`,`snapnum` AS `snapnum` FROM ( SELECT SUM(`num`) AS `num`,`snapnum` FROM `aggregation_tmp_39612613`  GROUP BY `snapnum`  ) AS `s`  LIMIT 0;
USE spider_tmp_shard;
INSERT INTO spider_tmp_shard.aggregation_tmp_13972201 SELECT log10( `num` ) AS `_log10_num_`,`snapnum` AS `snapnum` FROM ( SELECT SUM(`num`) AS `num`,`snapnum` FROM `aggregation_tmp_39612613`  GROUP BY `snapnum`  ) AS `s` ;
CALL paquLinkTmp('aggregation_tmp_13972201');
CALL paquDropTmp('aggregation_tmp_39612613');
USE spider_tmp_shard;
SET @i=0;
CREATE TABLE cosmosim_user_kristin.`2015-06-04-12-29-29-1140` ENGINE=MyISAM SELECT @i:=@i+1 AS `row_id`,  `_log10_num_`,`snapnum` FROM `aggregation_tmp_13972201`  ORDER BY `snapnum` ASC ;
CALL paquDropTmp('aggregation_tmp_13972201')

Note: The query above as it is does not make that much sense. What I actually intended was to get the mass function for logarithmic mass bins and additionally with logarithmic counts for quick and easy plotting. But there are for sure some other cases where functions of functions will occur.