confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
124 stars 1.04k forks source link

stream definitions should allow column referencing from the select list #5746

Open markteehan opened 4 years ago

markteehan commented 4 years ago

CREATE STREAM GDE_A021_STR WITH (kafka_topic='GDE_A021_STR', value_format='avro',partitions=1,replicas=1) AS SELECT TS , EVENTID , EVENT_DATE , MONTHYEAR , YEAR , FRACTIONDATE , ARRAY [ STRUCT (CODE:=ACTOR1CODE, NAME:=ACTOR1NAME, COUNTRYCODE:=ACTOR1COUNTRYCODE, KNOWNGROUPCODE:=ACTOR1KNOWNGROUPCODE, ETHNICCODE:=ACTOR1ETHNICCODE, RELIGION1CODE:=ACTOR1RELIGION1CODE, RELIGION2CODE:=ACTOR1RELIGION2CODE, TYPE1CODE:=ACTOR1TYPE1CODE, TYPE2CODE:=ACTOR1TYPE2CODE, TYPE3CODE:=ACTOR1TYPE3CODE) , STRUCT (CODE:=ACTOR2CODE, NAME:=ACTOR2NAME, COUNTRYCODE:=ACTOR2COUNTRYCODE, KNOWNGROUPCODE:=ACTOR2KNOWNGROUPCODE, ETHNICCODE:=ACTOR2ETHNICCODE, RELIGION1CODE:=ACTOR2RELIGION1CODE, RELIGION2CODE:=ACTOR2RELIGION2CODE, TYPE1CODE:=ACTOR2TYPE1CODE, TYPE2CODE:=ACTOR2TYPE2CODE, TYPE3CODE:=ACTOR2TYPE3CODE) ] AS ACTOR_STRUCT , ISROOTEVENT , EVENTCODE , EVENTBASECODE , EVENTROOTCODE , QUADCLASS , GOLDSTEINSCALE , NUMMENTIONS , NUMSOURCES , NUMARTICLES , AVGTONE , ARRAY [ STRUCT (TYPE:=ACTOR1GEO_TYPE, FULLNAME:=ACTOR1GEO_FULLNAME, COUNTRYCODE:=ACTOR1GEO_COUNTRYCODE, ADM1CODE:=ACTOR1GEO_ADM1CODE, ADM2CODE:=ACTOR1GEO_ADM2CODE, LAT:=ACTOR1GEO_LAT, LONG:=ACTOR1GEO_LONG, FEATUREID:=ACTOR1GEO_FEATUREID) , STRUCT (TYPE:=ACTOR2GEO_TYPE, FULLNAME:=ACTOR2GEO_FULLNAME, COUNTRYCODE:=ACTOR2GEO_COUNTRYCODE, ADM1CODE:=ACTOR2GEO_ADM1CODE, ADM2CODE:=ACTOR2GEO_ADM2CODE, LAT:=ACTOR2GEO_LAT, LONG:=ACTOR2GEO_LONG, FEATUREID:=ACTOR2GEO_FEATUREID) , STRUCT (TYPE:=ACTIONGEO_TYPE, FULLNAME:=ACTIONGEO_FULLNAME, COUNTRYCODE:=ACTIONGEO_COUNTRYCODE, ADM1CODE:=ACTIONGEO_ADM1CODE, ADM2CODE:=ACTIONGEO_ADM2CODE, LAT:=ACTIONGEO_LAT, LONG:=ACTIONGEO_LONG, FEATUREID:=ACTIONGEO_FEATUREID) ] AS GEOTYPE_STRUCT , DATEADDED , SOURCEURL , URL_EXTRACT_HOST(SOURCEURL) as SITE , URL_EXTRACT_PATH(SOURCEURL) as URLPATH FROM GDE_A020_STR PARTITION BY SITE;

Line: 32, Col: 14: PARTITION BY column 'SITE' cannot be resolved.

SITE is a derived column from "URL_EXTRACT_HOST(SOURCEURL) as SITE" It should be possible to forward-reference this column in the PARTITION BY clause. This type of forward column-referencing functionality works in some/most/all databases.

agavra commented 4 years ago

Hello @markteehan - PARTITION BY (analogously to GROUP BY) works on the source data, not the sink data; that's why the forward referencing doesn't apply. See #2701 and #5487 though I'm beginning to question whether or not we should revert this behavior as it seems to be unintuitive (cc @MichaelDrogalis ).