forcedotcom / phoenix

BSD 3-Clause "New" or "Revised" License
558 stars 227 forks source link

Support derived tables #5

Open jtaylor-sfdc opened 11 years ago

jtaylor-sfdc commented 11 years ago

Add support for derived queries of the form: SELECT * FROM ( SELECT company, revenue FROM Company ORDER BY revenue) LIMIT 10

Adding support for this requires a compile time change as well as a runtime execution change. The first version of the compile-time change could limit aggregation to only be allowed in the inner or the outer query, but not both. In this case, the inner and outer queries can be combined into a single query with the outer select becoming just a remapping of a subset of the projection from the inner select. The second version of the compile-time change could handle aggregation in the inner and outer select by performing client side (this is likely a less common scenario).

For the runtime execution, change the UngroupedAggregateRegionObserver would be modified to look for a new "TopNLimit" attribute with an int value in the Scan. This would control the maximum number of values for the coprocessor to hold on to as the scan is performed. Then the GroupedAggregatingResultIterator would be modified to handle keeping the topN values received back from all the child iterators.

jhanit commented 10 years ago

Hi James,

As and when usage of Phoenix increase, the demand for Nested queries will shoot up. Nested queries is an important part of any real time project(if they want to use SQL feature and hence phoenix).

Can you please consider this in your subsequent releases? For my product , i need to use nested queries heavily and hence i turned toward phoenix as i have to write several lines of code if i use Hbase directly. But unfortunately this is not supported yet in Phoenix.

Regards, Nitin

jtaylor-sfdc commented 10 years ago

Thanks for the feedback, @jhanit. Would you be able to provide some sample queries for your nested query use cases, as this would ensure it'll meet your needs? Often times, with the lack of derived tables, a "poor man's" workaround is to perform the inner query using UPSERT SELECT into a temp table, and then referencing the temp table in the outer query. Would this work for you in the interim?

jhanit commented 10 years ago

Hi James,

Please find the sample nested queries use case along with SQL queries which i have tested with sql. My Table(In Phoenix, similar version is in SQL ):- CREATE TABLE IF NOT EXISTS EVENT_MANAGEMENT( eventId BIGINT NOT NULL, eventData.eventTime DATE, eventData.eventType VARCHAR, eventData.eventDescription VARCHAR, eventData.entity VARCHAR , eventData.facility VARCHAR CONSTRAINT pk PRIMARY KEY (eventId));

col1,col2,col3,col4 are dynamic(for SQl i just added while creating table).

Requirement:- Pull all the unique set of (entity,facility) with recent values of all the attributes stored in col1,col2,col3,col4...

My SQL querry:-

1st Approach

select Ent.entity,Ent.Facility, Att.Col1,Att.Col2,Att.Col3,Att.Col4

from

(select distinct entity,facility from event_mgmt

where entity is not null and facility is not null) Ent

join

(select * from

(select entity, col1, col2 from (

select entity, col1, col2, ROW_NUMBER() OVER ( partition by entity order by eventTime desc) rn

from event_mgmt

where entity is not null) A

where rn = 1) A,

(select facility, col3, col4 from (

select facility, col3, col4, ROW_NUMBER() OVER ( partition by facility order by eventTime desc) rn

from event_mgmt

where facility is not null) A

where rn = 1) B) Att

on Ent.Entity = Att.Entity and Ent.Facility = Att.Facility

2nd Approach

Trying to use temp table by converting the above query into more update(which will translate into upsert select). But i was unable to get rid off join some how.

--get prelim data in temp table

select distinct entity,facility, cast (null as int) as col1, cast (null as varchar) as col2, cast (null as int) as col3, cast (null as varchar) as col4 into ##TempTable

from event_mgmt

where entity is not null and facility is not null

--update the Entity columns col1

update A

set Col1 = B.Col1

from ##TempTable A

join (select entity, col1 from (

select entity, col1, ROW_NUMBER() OVER ( partition by entity order by eventTime desc) rn

from event_mgmt

where entity is not null and col1 is not null) A

where rn = 1) B

on A.Entity = B.Entity

--update the Entity columns col2

update A

set Col2 = B.Col2

from ##TempTable A

join (select entity, col2 from (

select entity, col2, ROW_NUMBER() OVER ( partition by entity order by eventTime desc) rn

from event_mgmt

where entity is not null and col2 is not null) A

where rn = 1) B

on A.Entity = B.Entity

--update the facility columns col3

update A

set Col3 = B.Col3

from ##TempTable A

join (select facility, col3 from (

select facility, col3, ROW_NUMBER() OVER ( partition by facility order by eventTime desc) rn

from event_mgmt

where facility is not null and col3 is not null) A

where rn = 1) B

on A.facility = B.facility

--update the facility columns col4

update A

set Col4 = B.Col4

from ##TempTable A

join (select facility, col4 from (

select facility, col4, ROW_NUMBER() OVER ( partition by facility order by eventTime desc) rn

from event_mgmt

where facility is not null and col4 is not null) A

where rn = 1) B

on A.facility = B.facility

--select results from the temp table

select * from ##TempTable

--drop the temp table

drop table ##TempTable

This is use case 1. And i have several like this which i will start working on.

Is it possible for a workaround for now(I tried but somehow unable to get rid off Join), as this will decide my phoenix usage extensively?

Any help or timelines to get these features in phoenix will be helpful for me.

jtaylor-sfdc commented 10 years ago

Thanks, @jhanit for the detail - very helpful. We have support for joins in our master branch. Can you let us know if this together with upsert/select will meet your need in the short term?

jhanit commented 10 years ago

I am trying with join. i was just trying to run the below mentioned query(just to test): SELECT item.item_id, item.name, supp.supplier_id, supp.name FROM JOIN_ITEM_TABLE item INNER JOIN JOIN_SUPPLIER_TABLE supp ON item.supplier_id = supp.supplier_id;

Error:-Syntax error. Missing "EOF" at line 1, column 87.

1) Can we run join query directly from command prompt or squirrel? Or it is only meant for running these queries from inside the code base?

jtaylor-sfdc commented 10 years ago

Are you using the master branch?

On Nov 18, 2013, at 6:41 AM, Nitin Kumar notifications@github.com wrote:

I am trying with join. i was just trying to run the below mentioned query(just to test): SELECT item.item_id, item.name, supp.supplier_id, supp.name FROM JOIN_ITEM_TABLE item INNER JOIN JOIN_SUPPLIER_TABLE supp ON item.supplier_id = supp.supplier_id;

Error:-Syntax error. Missing "EOF" at line 1, column 87.

1) Can we run join query directly from command prompt or squirrel? Or it is only meant for running these queries from inside the code base?

— Reply to this email directly or view it on GitHubhttps://github.com/forcedotcom/phoenix/issues/5#issuecomment-28702566 .

jhanit commented 10 years ago

Hi,

I hit a road block with my approach. As per our earlier discussion (how to proceed with join and nested query), I was trying to use Upsert select and Join (with master stream) and some how started to achieve my use case posted above(Initially).

Issue:- Upsert select didn't work with join . :-( UPSERT INTO ATTRIBUTE(ENTITY,ATTRIBUTE) select EM.entity,EM.EVCOL_NO_OF_PARTS from EVENT_MANAGEMENT EM(EVCOL_NO_OF_PARTS INTEGER,EVCOL_STANDARD_COST FLOAT,EVCOL_INVENTORY INTEGER,EVCOL_TYPE VARCHAR) INNER JOIN TEMPENTITY MA ON EM.EVENTTIME = MA.EVENTTIME AND EM.ENTITY = MA.ENTITY;

Error: Joins not supported (state=,code=0)

Where Join worked with select and UPSERT select worked in other cases.To my disappointment when i combine both it didn't work. Is there any workaround for it? or am I doing anything wrong?

I will be really happy if 'YES' is an answer for any of the above question.Eagerly waiting for reply :-)

Thanks, Nitin

JamesRTaylor commented 10 years ago

FYI, for those interested in this feature, it will be in our 3.1/4.1 release and is available in our Apache Phoenix 3.0/4.0/master branches now. See https://issues.apache.org/jira/browse/PHOENIX-136 and https://issues.apache.org/jira/browse/PHOENIX-927