Closed gangliao closed 3 years ago
CrimeDate | CrimeTime | CrimeCode | Location | Description | Inside/Outside | Weapon | Post | District | Neighborhood | Longitude | Latitude | Location 1 | Premise | vri_name1 | Total Incidents
We can run a continuous query (CQ) every hour.
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset, which contains actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
CQ Example:
A customer uses SQL to compute a 1-minute, sliding-window sum of items sold in online shopping transactions captured in the stream.
group by
select stream productId,
floor(rowtime to hour) as rowtime,
sum(units) as u,
count(*) as c
from Orders
group by productId,
floor(rowtime to hour)
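To make the grouping semantics concrete, here is a minimal Python sketch of the same hourly aggregation. It is only an illustration over a finite list, not how a streaming engine would run the query: the tuple layout `(rowtime, product_id, units)`, the function name `hourly_totals`, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(orders):
    """Group (rowtime, product_id, units) tuples by product and hour.

    Returns {(product_id, hour): (sum_of_units, row_count)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for rowtime, product_id, units in orders:
        # Equivalent of floor(rowtime to hour): drop minutes and below.
        hour = rowtime.replace(minute=0, second=0, microsecond=0)
        bucket = totals[(product_id, hour)]
        bucket[0] += units  # sum(units)
        bucket[1] += 1      # count(*)
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical sample rows, invented for illustration.
orders = [
    (datetime(2011, 1, 1, 10, 5), 30, 4),
    (datetime(2011, 1, 1, 10, 45), 30, 6),
    (datetime(2011, 1, 1, 11, 2), 30, 1),
    (datetime(2011, 1, 1, 10, 30), 10, 2),
]
result = hourly_totals(orders)
```

A streaming engine would presumably emit each hour's rows once that hour has closed, rather than waiting for the whole input as this batch sketch does.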
The "pie chart" problem:
select productId, count(*)
from Orders
where rowtime > current_timestamp - interval '1' hour
group by productId
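As a rough illustration of what this query returns at a single instant, the sketch below counts orders per product over the hour preceding a given `now`. The function name `counts_last_hour`, the tuple layout, and the sample data are assumptions; `now` stands in for `current_timestamp`.

```python
from collections import Counter
from datetime import datetime, timedelta


def counts_last_hour(orders, now):
    """Count orders per product with rowtime strictly after now - 1 hour."""
    cutoff = now - timedelta(hours=1)
    return Counter(
        product_id for rowtime, product_id, _units in orders if rowtime > cutoff
    )


# Hypothetical sample rows, invented for illustration.
now = datetime(2011, 1, 1, 12, 0)
orders = [
    (datetime(2011, 1, 1, 11, 30), 30, 4),
    (datetime(2011, 1, 1, 11, 45), 30, 1),
    (datetime(2011, 1, 1, 10, 59), 30, 2),  # older than one hour
    (datetime(2011, 1, 1, 11, 0), 10, 5),   # exactly at the cutoff, excluded
    (datetime(2011, 1, 1, 11, 15), 10, 3),
]
counts = counts_last_hour(orders, now)
```

Re-running the function with a later `now` gives a different answer over the same data, which is why this is a point-in-time query rather than a stream.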
select stream *
from Orders
where units > 1000
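A continuous filter like this one can be pictured as a generator that passes through matching rows as they arrive. The following is a minimal sketch; the dictionary row shape and the name `large_orders` are assumptions made for illustration.

```python
def large_orders(orders, threshold=1000):
    """Yield each order as it arrives if its units exceed the threshold."""
    for order in orders:
        if order["units"] > threshold:
            yield order


# Hypothetical rows; an iterator stands in for an unbounded stream.
stream = iter([
    {"productId": 30, "units": 1500},
    {"productId": 10, "units": 20},
    {"productId": 30, "units": 1000},  # not strictly greater, dropped
    {"productId": 20, "units": 2500},
])
kept = list(large_orders(stream))
```

Because the generator is lazy, it never needs the whole input at once, which matches the row-at-a-time nature of a stream filter.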
We can join streams if the join condition forces them into "lock step" within a window (in this case, 1 hour).
select stream *
from Orders as o
join Shipments as s
on o.productId = s.productId
and s.rowtime
between o.rowtime
and o.rowtime + interval '1' hour
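The join condition can be sketched in Python as follows. This is a naive nested-loop version over finite lists, meant only to show which pairs match; the tuple layout `(rowtime, product_id)`, the name `join_within_window`, and the sample rows are assumptions.

```python
from datetime import datetime, timedelta


def join_within_window(orders, shipments, window=timedelta(hours=1)):
    """Pair each order with shipments of the same product whose rowtime
    lies between the order's rowtime and rowtime + window (inclusive)."""
    matches = []
    for o_time, o_product in orders:
        for s_time, s_product in shipments:
            if o_product == s_product and o_time <= s_time <= o_time + window:
                matches.append(((o_time, o_product), (s_time, s_product)))
    return matches


# Hypothetical sample rows, invented for illustration.
orders = [
    (datetime(2011, 1, 1, 10, 0), 30),
    (datetime(2011, 1, 1, 10, 30), 10),
]
shipments = [
    (datetime(2011, 1, 1, 10, 20), 30),  # within the hour
    (datetime(2011, 1, 1, 11, 0), 30),   # exactly at the upper bound
    (datetime(2011, 1, 1, 11, 15), 30),  # too late
    (datetime(2011, 1, 1, 12, 0), 10),   # too late for the second order
]
matches = join_within_window(orders, shipments)
```

The one-hour bound is what lets a streaming engine discard an order once no later shipment could still match it; this sketch does not model that state cleanup.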
CQ Example: Streaming ETL
A customer uses a CQ to continuously transform log data and deliver it to object storage. The log data is transformed by several operators: applying a schema to the different log events, partitioning data by event type, sorting data by timestamp, and buffering data for one hour prior to delivery. The application has many transformation steps, but none is computationally intensive.
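The transformation steps described above could be sketched roughly as below. Everything specific here is an assumption for illustration: the log line format (`ISO-timestamp event_type message`), the function names, and the sample lines. Delivery to object storage is left out entirely.

```python
from collections import defaultdict
from datetime import datetime


def parse_event(line):
    """Apply a hypothetical schema: 'ISO-timestamp event_type message'."""
    ts, event_type, message = line.split(" ", 2)
    return {"ts": datetime.fromisoformat(ts), "type": event_type, "message": message}


def build_batches(lines):
    """Buffer events per (hour, event_type), then sort each buffer by timestamp."""
    batches = defaultdict(list)
    for line in lines:
        event = parse_event(line)
        # One buffer per hour and per event type (the partitioning step).
        hour = event["ts"].replace(minute=0, second=0, microsecond=0)
        batches[(hour, event["type"])].append(event)
    for events in batches.values():
        events.sort(key=lambda e: e["ts"])
    return dict(batches)


# Hypothetical log lines, invented for illustration.
lines = [
    "2021-03-01T10:40:00 click page=/cart",
    "2021-03-01T10:05:00 click page=/home",
    "2021-03-01T10:20:00 error code=500",
    "2021-03-01T11:01:00 click page=/pay",
]
batches = build_batches(lines)
```

Each resulting batch corresponds to one hour of one event type, which would be the unit written out once the hour's buffer is complete.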
US Accidents (3.5 million records) [Link]
The customer is applying a continuous filter to only retain records of interest.