cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
17 stars 2 forks source link

Develop Socrata Query for Analyzing the Gridsmart Data #5520

Closed atdservicebot closed 3 years ago

atdservicebot commented 3 years ago

What application are you using?

Other / Not Sure

Describe the problem.

Currently we are using Power BI (with Surbhi's help) to analyze the Gridsmart data in Socrata via an OData connection. The Gridsmart data is binned in 15 min categories and despite our best efforts in PowerQuery to limit our data size, it is large enough that automated refreshes fail.

Can you help us develop a Socrata Query (suggested by John Clary) to help us limit the data that Power BI is having to import to just the data we require? Below is how we'd like to transform the data.

-can we change it to being binned by each hour instead of 15 mins? -remove all data prior to Jan 1, 2020 -Filter to only see the ATD Device IDs shown in the attachment?

Reach out to myself or Allyson Richey with questions

How soon do you need this?

Flexible — An extended timeline is OK

Is there anything else we should know?

Discuss feasibility in the next week or so and then talk through a plan. We are getting by now, but hopefully this shouldn't be a heavy lift.

Requested By Lance B.

Attachment (51.5kb)

Request ID: DTS21-101745

johnclary commented 3 years ago

Here's the GRIDSMART open dataset.

It's not possible to use SQL queries against the OData endpoint, but we should be able to use the native Socrata API $query to write a query which aggregates this data into 1-hour bins.

@SurbhiBakshi did some preliminary research to understand PowerBI's capability for consuming Socrata's native JSON endpoint vs the O-Data endpoint (what the MMC is currently using). We should be able to walk Lance through this transition.

Two tasks at hand:

johnclary commented 3 years ago

~@mateoclarke @amenity flagging this for the next sprint planning~

Nevermind—going to ask UTCTR to take a crack at it.

johnclary commented 3 years ago

I asked Ken @ CTR if he could work on this. I believe we're going to have a meeting to discuss. Lance is in the loop.

SurbhiBakshi commented 3 years ago

@johnclary - yes will continue looking into this issue, I think documenting what I find out will help future reporting needs.

johnclary commented 3 years ago

@SurbhiBakshi here's a query you can use which should be fairly close to what we settle on:

https://data.austintexas.gov/resource/sh59-i6y9.json?$query=select atd_device_id, intersection_name, direction, volume, date_trunc_ymd(read_date) as date, date_extract_hh(read_date) as hour where atd_device_id = '6382' and date_extract_y(read_date) > 2019 |> select atd_device_id, intersection_name, direction, date, sum(volume) as volume, hour group by atd_device_id, intersection_name, direction, date, hour limit 999999999999

Email to AMD and CTR:

I was able to dig up an old wavetronix query and re-purpose it to bin the GRIDSMART data.

It needs a bit more work, but we should be able to knock this out. Lance/Allyson, I've attached a CSV of the output. 

This is only for one device, but you get the idea. Would you review CSV and let us know what changes or additional columns you'll need? Once we have the query pinned down we can walk you through how to implement it in PowerBI.

The query, for reference:

https://data.austintexas.gov/resource/sh59-i6y9.json?$query=select atd_device_id, intersection_name, direction, volume, date_trunc_ymd(read_date) as date, date_extract_hh(read_date) as hour where atd_device_id = '6382' and date_extract_y(read_date) > 2019 |> select atd_device_id, intersection_name, direction, date, sum(volume) as volume, hour group by atd_device_id, intersection_name, direction, date, hour limit 999999999999
SurbhiBakshi commented 3 years ago

@johnclary - the query worked and did not need any tweaking in Power BI.

SurbhiBakshi commented 3 years ago

Here is the Power BI report. I was able to use the query without any adjustments, and set it to refresh a few times over the day. The data is getting refreshed.

johnclary commented 3 years ago

Thanks @SurbhiBakshi. Email sent to MMC below.

I've updated the query to include movement, day of week, and include only the device IDs of interest. This works out to ~1m rows of data.

Surbhi has done some testing in PowerBI to figure out how to ingest the data, because you must connect to it with a generic web connection instead of the OData connection. Surbhi—can you walk Lance and Allyson through this process? We can setup a meeting if needed, but it sounded fairly straightforward.

Hopefully this works smoothy. If it doesn't, we can explore Ken's suggestion to create a Socrata dataset "view" of the aggregated values.

Here's the updated query:

https://data.austintexas.gov/resource/sh59-i6y9.json?$query=select atd_device_id, intersection_name, direction, movement, volume, date_trunc_ymd(read_date) as date, date_extract_hh(read_date) as hour, date_extract_dow(read_date) as dow where atd_device_id in ('6170', '6177', '6211', '6351', '6382', '6736', '6881', '6882', '7014', '7015', '7038') and date_extract_y(read_date) > 2019 |> select atd_device_id, intersection_name, direction, movement, date,dow, sum(volume) as volume, hour group by atd_device_id, intersection_name, direction, movement, date, hour, dow limit 999999999999999

SurbhiBakshi commented 3 years ago

I documented the process to connect to an Open Data Portal dataset using the Web Connector in GitBook and emailed the link to the MMC.

Email to the MMC - Here is some documentation on connecting to the dataset using the Web Connector. Let me know if you have any questions.

johnclary commented 3 years ago

@SurbhiBakshi thanks for your help with this. Closing.