adrpar / paqu

A parallel query engine for MySQL + Spider Engine built on a fork of shard-query
GNU General Public License v2.0

Error when trying to join tables #28

Open surfcast23 opened 9 years ago

surfcast23 commented 9 years ago

When I try to select distinct columns from the tables Redshifts, FOF, and FOFMtree, I get this error:

Unknown column 'r.r.' in 'field list'

The query is as follows:

SELECT DISTINCT mt.*, f.*, r.* FROM MDR1.FOFMtree mt, MDR1.FOF f, MDR1.Redshifts r WHERE mt.fofId = f.fofId AND r.snapnum = f.snapnum AND f.mass = mt.mass

QUERY PLAN

-- The query plan used to run this query:
-- --------------------------------------------------
-- CALL paquExec('SELECT DISTINCT r.snapnum AS rsnapnum,r.aexp AS raexp,r.zred AS rzred FROM MDR1.Redshifts AS r', 'aggregation_tmp_73042841')
-- CALL paquExec('SELECT f.fofId AS ffofId,f.snapnum AS fsnapnum,f.level AS flevel,f.NInFile AS fNInFile,f.x AS fx,f.y AS fy,f.z AS fz,f.vx AS fvx,f.vy AS fvy,f.vz AS fvz,f.np AS fnp,f.mass AS fmass,f.size AS fsize,f.disp AS fdisp,f.disp_v AS f__disp_v,f.delta AS fdelta,f.spin AS fspin,f.angMom_x AS fangMom_x,f.angMom_y AS fangMom_y,f.angMom_z AS f__angMom_z,f.angMom AS fangMom,f.axis1 AS faxis1,f.axis2 AS faxis2,f.axis3 AS faxis3,f.axis1_x AS faxis1_x,f.axis1_y AS faxis1_y,f.axis1_z AS faxis1_z,f.axis2_x AS faxis2_x,f.axis2_y AS faxis2_y,f.axis2_z AS faxis2_z,f.axis3_x AS faxis3_x,f.axis3_y AS faxis3_y,f.axis3_z AS f__axis3_z,f.ix AS fix,f.iy AS fiy,f.iz AS fiz,f.phkey AS fphkey,r.r. AS ``,r.rsnapnum AS rsnapnum,r.raexp AS raexp,r.rzred AS rzred FROM MDR1.FOF AS f JOIN ( SELECT DISTINCT rsnapnum,raexp,rzred FROM aggregation_tmp_73042841) AS r WHERE (r.rsnapnum=f.snapnum) ', 'aggregation_tmp_31621782')
-- CALL paquExec('SELECT mt.fofTreeId AS mtfofTreeId,mt.fofId AS mtfofId,mt.treeSnapnum AS mttreeSnapnum,mt.descendantId AS mtdescendantId,mt.lastProgId AS mtlastProgId,mt.mainLeafId AS mtmainLeafId,mt.treeRootId AS mttreeRootId,mt.x AS mtx,mt.y AS mty,mt.z AS mtz,mt.vx AS mtvx,mt.vy AS mtvy,mt.vz AS mtvz,mt.np AS mtnp,mt.mass AS mtmass,mt.size AS mtsize,mt.spin AS mtspin,mt.ix AS mtix,mt.iy AS mtiy,mt.iz AS mtiz,mt.phkey AS mtphkey,f.ffofId AS ffofId,f.fsnapnum AS fsnapnum,f.flevel AS flevel,f.fNInFile AS fNInFile,f.fx AS fx,f.fy AS fy,f.fz AS fz,f.fvx AS fvx,f.fvy AS fvy,f.fvz AS fvz,f.fnp AS fnp,f.fmass AS fmass,f.fsize AS fsize,f.fdisp AS fdisp,f.fdisp_v AS f__disp_v,f.fdelta AS fdelta,f.fspin AS fspin,f.fangMom_x AS fangMom_x,f.f__angMom_y AS fangMom_y,f.fangMom_z AS f__angMom_z,f.fangMom AS fangMom,f.faxis1 AS faxis1,f.faxis2 AS faxis2,f.faxis3 AS faxis3,f.faxis1_x AS faxis1_x,f.f__axis1_y AS faxis1_y,f.faxis1_z AS f__axis1_z,f.faxis2_x AS faxis2_x,f.f__axis2_y AS faxis2_y,f.faxis2_z AS f__axis2_z,f.faxis3_x AS faxis3_x,f.f__axis3_y AS faxis3_y,f.faxis3_z AS f__axis3_z,f.fix AS fix,f.fiy AS fiy,f.fiz AS fiz,f.fphkey AS fphkey,f.rsnapnum AS rsnapnum,f.raexp AS raexp,f.rzred AS rzred FROM MDR1.FOFMtree AS mt JOIN ( SELECT ffofId,fsnapnum,flevel,fNInFile,fx,fy,fz,fvx,fvy,fvz,fnp,fmass,fsize,fdisp,fdisp_v,fdelta,fspin,fangMom_x,f__angMom_y,fangMom_z,fangMom,faxis1,faxis2,faxis3,faxis1_x,f__axis1_y,faxis1_z,faxis2_x,f__axis2_y,faxis2_z,faxis3_x,f__axis3_y,faxis3_z,fix,fiy,fiz,fphkey,``,rsnapnum,raexp,rzred FROM aggregation_tmp_31621782) AS f WHERE (mt.fofId=f.ffofId) AND (f.fmass=mt.mass) ', 'aggregation_tmp_51077786')
-- CALL paquDropTmp('aggregation_tmp_73042841')
-- CALL paquDropTmp('aggregation_tmp_31621782')
-- USE spider_tmp_shard
-- SET @i=0
-- CREATE TABLE cosmosim_user_surfcast23.Join1 ENGINE=MyISAM SELECT @i:=@i+1 AS row_id,mtfofTreeId,mtfofId,mttreeSnapnum,mtdescendantId,mtlastProgId,mtmainLeafId,mttreeRootId,mtx,mty,mtz,mtvx,mtvy,mtvz,mtnp,mtmass,mtsize,mtspin,mtix,mtiy,mtiz,mtphkey,ffofId,fsnapnum,flevel,fNInFile,fx,fy,fz,fvx,fvy,fvz,fnp,fmass,fsize,fdisp,fdisp_v,fdelta,fspin,fangMom_x,fangMom_y,f__angMom_z,fangMom,faxis1,faxis2,faxis3,faxis1_x,faxis1_y,f__axis1_z,faxis2_x,faxis2_y,f__axis2_z,faxis3_x,faxis3_y,f__axis3_z,fix,fiy,fiz,fphkey,rsnapnum,raexp,rzred FROM aggregation_tmp_51077786
-- CALL paquDropTmp('aggregation_tmp_51077786')

kristinriebe commented 9 years ago

Without DISTINCT, the query plan looks perfectly fine.

Here's a shorter version of the problematic query that won't time out even if corrected:

SELECT DISTINCT f.size, mt.mass, r.zred FROM MDR1.FOFMtree mt, MDR1.FOF f, MDR1.Redshifts r WHERE mt.fofTreeId between 100000000 and 100000010 AND mt.fofId = f.fofId AND r.snapnum = f.snapnum

with this query plan (the problem lies in the select list of the second paquExec call):

-- CALL paquExec('SELECT DISTINCT `mt`.`mass` AS `mt.mass`,`mt`.`fofTreeId` AS `mt.fofTreeId`,`mt`.`fofId` AS `mt.fofId` FROM MDR1.FOFMtree AS `mt` WHERE ( `mt`.`fofTreeId` between 100000000 and 100000010 ) ', 'aggregation_tmp_16214685')
-- CALL paquExec('SELECT `f`.`size` AS `f.size`,`f`.`fofId` AS `f.fofId`,`f`.`snapnum` AS `f.snapnum`,`mt`.`mt.` AS ``,`mt`.`mt.mass` AS `mt.mass` FROM MDR1.FOF AS `f` JOIN ( SELECT DISTINCT `mt.mass`,`mt.fofTreeId`,`mt.fofId` FROM `aggregation_tmp_16214685` ) AS `mt` WHERE ( `mt`.`mt.fofId` = `f`.`fofId` ) ', 'aggregation_tmp_47789533')
-- CALL paquExec('SELECT `r`.`zred` AS `r.zred`,`f`.`f.size` AS `f.size`,`f`.`mt.mass` AS `mt.mass` FROM MDR1.Redshifts AS `r` JOIN ( SELECT `f.size`,`f.fofId`,`f.snapnum`,``,`mt.mass` FROM `aggregation_tmp_47789533` ) AS `f` WHERE ( `r`.`snapnum` = `f`.`f.snapnum` ) ', 'aggregation_tmp_43368652')
-- CALL paquDropTmp('aggregation_tmp_16214685')
-- CALL paquDropTmp('aggregation_tmp_47789533')
-- USE spider_tmp_shard
-- SET @i=0
-- CREATE TABLE cosmosim_user_kristin.`2014-11-06-09-37-01-6478` ENGINE=MyISAM SELECT @i:=@i+1 AS `row_id`, `r.zred`,`f.size`,`mt.mass` FROM `aggregation_tmp_43368652` 
-- CALL paquDropTmp('aggregation_tmp_43368652')
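
For anyone who wants to check a generated plan for this symptom automatically, here is a minimal sketch in Python. It is not part of PaQu; the helper name `find_malformed_items` and the regular expression are made up for illustration, and it assumes the select list uses the backtick-quoted `table`.`column` AS `alias` form shown in the plan above. It flags entries whose alias is empty or whose column name ends in a dot.

```python
import re

# Matches select-list entries of the form `table`.`column` AS `alias`.
# (Hypothetical helper pattern, not taken from the PaQu code base.)
SELECT_ITEM = re.compile(r"`([^`]*)`\.`([^`]*)`\s+AS\s+`([^`]*)`")


def find_malformed_items(statement):
    """Return (table, column, alias) tuples with an empty alias
    or a column name that ends in a dot."""
    bad = []
    for table, column, alias in SELECT_ITEM.findall(statement):
        if alias == "" or column.endswith("."):
            bad.append((table, column, alias))
    return bad


# Select list of the second paquExec call from the plan above
# (the JOIN and WHERE parts are left out for brevity).
second_call = (
    "SELECT `f`.`size` AS `f.size`,`f`.`fofId` AS `f.fofId`,"
    "`f`.`snapnum` AS `f.snapnum`,`mt`.`mt.` AS ``,`mt`.`mt.mass` AS `mt.mass` "
    "FROM MDR1.FOF AS `f`"
)

malformed = find_malformed_items(second_call)
print(malformed)  # [('mt', 'mt.', '')]
```

The single flagged entry is the column that MySQL rejects with "Unknown column 'mt.mt.' in 'field list'".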
surfcast23 commented 9 years ago

Hi Kristin,

I still get this error when running the above query:

Unknown column 'mt.mt.' in 'field list'

I think it happens where I have marked it in bold below.

Thank you

Khary

On Thu, Nov 6, 2014 at 3:47 AM, Kristin Riebe notifications@github.com wrote:

Without DISTINCT, the query plan looks perfectly fine.

Here's a shorter version of the problematic query that won't time out even if corrected:

SELECT DISTINCT f.size, mt.mass, r.zred FROM MDR1.FOFMtree mt, MDR1.FOF f, MDR1.Redshifts r WHERE mt.fofTreeId between 100000000 and 100000010 AND mt.fofId = f.fofId AND r.snapnum = f.snapnum

with the query plan (Problem lies in select-list of 2. paquExec-call):

-- CALL paquExec('SELECT DISTINCT `mt`.`mass` AS `mt.mass`,`mt`.`fofTreeId` AS `mt.fofTreeId`,`mt`.`fofId` AS `mt.fofId` FROM MDR1.FOFMtree AS `mt` WHERE ( `mt`.`fofTreeId` between 100000000 and 100000010 ) ', 'aggregation_tmp_16214685')
-- CALL paquExec('SELECT `f`.`size` AS `f.size`,`f`.`fofId` AS `f.fofId`,`f`.`snapnum` AS `f.snapnum`,`mt`.`mt.` AS ``,`mt`.`mt.mass` AS `mt.mass` FROM MDR1.FOF AS `f` JOIN ( SELECT DISTINCT `mt.mass`,`mt.fofTreeId`,`mt.fofId` FROM `aggregation_tmp_16214685` ) AS `mt` WHERE ( `mt`.`mt.fofId` = `f`.`fofId` ) ', 'aggregation_tmp_47789533')
-- CALL paquExec('SELECT `r`.`zred` AS `r.zred`,`f`.`f.size` AS `f.size`,`f`.`mt.mass` AS `mt.mass` FROM MDR1.Redshifts AS `r` JOIN ( SELECT `f.size`,`f.fofId`,`f.snapnum`,``,`mt.mass` FROM `aggregation_tmp_47789533` ) AS `f` WHERE ( `r`.`snapnum` = `f`.`f.snapnum` ) ', 'aggregation_tmp_43368652')
-- CALL paquDropTmp('aggregation_tmp_16214685')
-- CALL paquDropTmp('aggregation_tmp_47789533')
-- USE spider_tmp_shard
-- SET @i=0
-- CREATE TABLE cosmosim_user_kristin.`2014-11-06-09-37-01-6478` ENGINE=MyISAM SELECT @i:=@i+1 AS `row_id`, `r.zred`,`f.size`,`mt.mass` FROM `aggregation_tmp_43368652`
-- CALL paquDropTmp('aggregation_tmp_43368652')


kristinriebe commented 9 years ago

Hi Khary, well, yes. My example causes the same problem as yours. It's just a shorter version, thus better suited for testing. If my query is fixed, yours will be fixed as well.

By the way, for anyone stumbling across this issue: the original query was intended to match snapshot numbers (snapnum) and redshifts for the progenitors of a halo. For a single example halo, this can be done more easily (and without causing headaches for PaQu) like this:

SELECT mt.*,z.zred FROM MDR1.FOFMtree AS mt, 
(SELECT DISTINCT treeSnapnum, zred FROM MDR1.TreeSnapnums) AS z 
WHERE 
    mt.fofTreeId BETWEEN 100000000 AND 
        (SELECT lastProgId FROM MDR1.FOFMtree WHERE fofTreeId = 100000000) 
    AND mt.treeSnapnum = z.treeSnapnum AND mt.np>1000 
ORDER BY mt.fofTreeId
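
To see what this query returns without access to the MDR1 database, the following sketch runs it against an in-memory SQLite database from Python. Everything about the data is an assumption made for illustration: SQLite stands in for MySQL/PaQu, the tables carry only the four or two columns the query touches (the real tables have many more), and the rows are invented toy values. It also relies on what the query itself implies, namely that the progenitors of a halo occupy the `fofTreeId` range from the halo's own id up to its `lastProgId`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database under the name MDR1 so that the
# schema-qualified table names of the query can be used unchanged.
conn.execute("ATTACH DATABASE ':memory:' AS MDR1")

# Toy schema and rows: invented values, NOT real MDR1 data.
conn.executescript("""
CREATE TABLE MDR1.FOFMtree (
    fofTreeId INTEGER, lastProgId INTEGER, treeSnapnum INTEGER, np INTEGER);
CREATE TABLE MDR1.TreeSnapnums (treeSnapnum INTEGER, zred REAL);

INSERT INTO MDR1.FOFMtree VALUES
    (100000000, 100000002, 39, 5000),  -- the example halo itself
    (100000001, 100000002, 38, 3000),  -- progenitor, passes np > 1000
    (100000002, 100000002, 37,  500),  -- progenitor, filtered out by np
    (100000003, 100000003, 39, 9000);  -- different tree, outside the id range

-- The duplicate (39, 0.0) row shows why the subquery uses DISTINCT.
INSERT INTO MDR1.TreeSnapnums VALUES (39, 0.0), (38, 0.1), (37, 0.2), (39, 0.0);
""")

query = """
SELECT mt.*, z.zred FROM MDR1.FOFMtree AS mt,
(SELECT DISTINCT treeSnapnum, zred FROM MDR1.TreeSnapnums) AS z
WHERE
    mt.fofTreeId BETWEEN 100000000 AND
        (SELECT lastProgId FROM MDR1.FOFMtree WHERE fofTreeId = 100000000)
    AND mt.treeSnapnum = z.treeSnapnum AND mt.np > 1000
ORDER BY mt.fofTreeId
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With these toy rows, only the halo and its first progenitor come back, each paired with the redshift of its snapshot.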