ISG-ICS / cloudberry

Big Data Visualization
http://cloudberry.ics.uci.edu

AQLQueryGenerator ignores the unnested variable when there is no group by and select #255

Closed luochen01 closed 7 years ago

luochen01 commented 7 years ago

For a cloudberry query that has an unnest statement but no group by and select statements, the AQL query should output a flattened record that includes the unnested variable. However, the current AQL generator simply ignores the unnested variable.

For example, consider the following query.

{
  "dataset": "twitter.ds_tweet",
  "unnest": [{ "hashtags": "tag" }],
  "select": {
    "order": ["-count"],
    "limit": 10,
    "offset": 0
  }
}

The execution of this query should produce some records like:

{  
   "create_at":"2015-11-23T16:14:04.000Z",
   "id":668945642149036032,
   "text":"#ledyardffa members at the IMAGE conference at Aqua Turf Club. @ Aqua Turf Club https://t.co/qTfur8kdEG",
   "in_reply_to_status":-1,
   "in_reply_to_user":-1,
   "favorite_count":0,
   "coordinate":[  
      -72.87427231,
      41.57279355
   ],
   "retweet_count":0,
   "lang":"en",
   "is_retweet":false,
   "hashtags":[  
      "ledyardffa"
   ],
   "user":{  
      "id":801273728,
      "name":"Bob Williams",
      "screen_name":"rwilliamslhs",
      "lang":"en",
      "location":"Connecticut, USA",
      "create_at":"2012-09-03",
      "description":"Ledyard HS Ag-Sci Teacher\r\nFormerly Habitat for Humanity International partner in the Philippines and long-time resident of General Santos City",
      "followers_count":11,
      "friends_count":6,
      "statues_count":6
   },
   "place":{  
      "country":"United States",
      "country_code":"United States",
      "full_name":"Southington, CT",
      "id":"000086513b2042b6",
      "name":"Southington",
      "place_type":"city",
      "bounding_box":[  
         [  
            -72.944738,
            41.544766
         ],
         [  
            -72.818407,
            41.653245
         ]
      ]
   },
   "geo_tag":{  
      "stateID":9,
      "stateName":"Connecticut",
      "countyID":9009,
      "countyName":"New Haven"
   },
   "tag": "ledyardffa"
}
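
To make the expected semantics concrete, here is a minimal Python sketch of what unnesting is meant to do: one output record per element of the unnested array, with the element attached under the alias. This only illustrates the intended behavior and is not cloudberry's implementation; the `unnest` helper and the second sample tweet (the one without hashtags) are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator


def unnest(records: Iterable[Dict[str, Any]], field: str, alias: str) -> Iterator[Dict[str, Any]]:
    """Yield one flattened copy of each record per element of record[field]."""
    for record in records:
        values = record.get(field)
        if values is None:
            # Mirrors the not(is-null(...)) filter: skip records without the field.
            continue
        for value in values:
            flat = dict(record)   # shallow copy keeps all original fields
            flat[alias] = value   # attach the unnested element under the alias
            yield flat


# Trimmed-down sample data; the second record is a hypothetical tweet without hashtags.
tweets = [
    {"id": 668945642149036032, "hashtags": ["ledyardffa"]},
    {"id": 1, "hashtags": None},
]

flattened = list(unnest(tweets, "hashtags", "tag"))
print(flattened)
```

The key point is that the alias ("tag") ends up in every returned record alongside the original fields, which is what the record above shows.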

However, the current AQL generator produces the following query:

for $t in dataset twitter.ds_tweet
where not(is-null($t.'hashtags'))
for $unnest0 in $t.'hashtags'
limit 10
offset 0
return
$t

This query returns $t only, so it produces records like the following, without the "tag" field:

{  
   "create_at":"2015-11-23T16:14:04.000Z",
   "id":668945642149036032,
   "text":"#ledyardffa members at the IMAGE conference at Aqua Turf Club. @ Aqua Turf Club https://t.co/qTfur8kdEG",
   "in_reply_to_status":-1,
   "in_reply_to_user":-1,
   "favorite_count":0,
   "coordinate":[  
      -72.87427231,
      41.57279355
   ],
   "retweet_count":0,
   "lang":"en",
   "is_retweet":false,
   "hashtags":[  
      "ledyardffa"
   ],
   "user":{  
      "id":801273728,
      "name":"Bob Williams",
      "screen_name":"rwilliamslhs",
      "lang":"en",
      "location":"Connecticut, USA",
      "create_at":"2012-09-03",
      "description":"Ledyard HS Ag-Sci Teacher\r\nFormerly Habitat for Humanity International partner in the Philippines and long-time resident of General Santos City",
      "followers_count":11,
      "friends_count":6,
      "statues_count":6
   },
   "place":{  
      "country":"United States",
      "country_code":"United States",
      "full_name":"Southington, CT",
      "id":"000086513b2042b6",
      "name":"Southington",
      "place_type":"city",
      "bounding_box":[  
         [  
            -72.944738,
            41.544766
         ],
         [  
            -72.818407,
            41.653245
         ]
      ]
   },
   "geo_tag":{  
      "stateID":9,
      "stateName":"Connecticut",
      "countyID":9009,
      "countyName":"New Haven"
   }
}

chenlica commented 7 years ago

@luochen01 Can you format the output nicely using GitHub syntax so that we can see the results easily?

luochen01 commented 7 years ago

Sure. The only difference is that the "tag" field is missing.

chenlica commented 7 years ago

Cool. I wonder if there is an easy way to highlight the differences. If not, not a big deal.
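
One possible way to highlight the differences is to compare the top-level fields of the expected and actual records. The sketch below assumes both records are available as Python dictionaries (for example, parsed with `json.loads`); `compare_records` is a hypothetical helper, and the sample records are trimmed to a few fields for brevity.

```python
from typing import Any, Dict, List


def compare_records(expected: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report top-level fields that are missing, extra, or different in `actual`."""
    missing = sorted(k for k in expected if k not in actual)
    extra = sorted(k for k in actual if k not in expected)
    changed = sorted(k for k in expected if k in actual and expected[k] != actual[k])
    return {"missing": missing, "extra": extra, "changed": changed}


# Trimmed versions of the expected and actual records from this issue.
expected = {"id": 668945642149036032, "hashtags": ["ledyardffa"], "tag": "ledyardffa"}
actual = {"id": 668945642149036032, "hashtags": ["ledyardffa"]}

report = compare_records(expected, actual)
print(report)
```

For the two records in this issue, the only field this should report is "tag" under "missing".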