Metropolitan-Council / tc.sensors

Package with functions to pull sensor data, sensor IDs, and sensor configuration for MnDOT metro district
https://metropolitan-council.github.io/tc.sensors

Overall package vision #9

Open eroten opened 4 years ago

eroten commented 4 years ago

Access and clean raw data

The fundamental purpose of this package is to access loop detector data from MnDOT's JSON feed. The data can be somewhat "dirty", so the package will include functions for finding nulls and interpolating values, flagging impossible values, and formatting column names and classes.
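As a rough illustration of the cleaning steps described above (flagging impossible values, then interpolating across nulls), here is a minimal Python sketch. It is not the package's implementation: the function names, the plain-list input, and the `MAX_VOLUME_PER_30S` threshold are all assumptions made for the example.

```python
from typing import List, Optional

# Hypothetical upper bound for a single 30-second count; an assumption for
# illustration only, not a value taken from MnDOT or the package.
MAX_VOLUME_PER_30S = 40


def flag_impossible(values: List[Optional[float]],
                    max_value: float = MAX_VOLUME_PER_30S) -> List[Optional[float]]:
    """Replace values outside [0, max_value] with None so they are treated as nulls."""
    return [v if v is not None and 0 <= v <= max_value else None for v in values]


def interpolate_nulls(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior runs of None.

    Leading and trailing gaps have no neighbor on one side and are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            out[i] = out[left] + (out[right] - out[left]) * (i - left) / gap
    return out


raw = [4, None, 8, -1, 6, 999, None]
flagged = flag_impossible(raw)
cleaned = interpolate_nulls(flagged)
print(flagged)
print(cleaned)
```

Flagging before interpolating means an impossible reading is never used as an endpoint for filling a neighboring gap.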

Data storage

There are pros and cons to putting the cleaned sensor data into a database.

Aggregate

The raw data is provided in 30-second intervals. Common temporal aggregations include 10, 15, and 30 minutes, 1 hour, morning and evening peak periods, and 24 hours.
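To make the temporal aggregation concrete, here is a minimal Python sketch that rolls 30-second observations up into fixed-length bins. It assumes the values are counts (such as vehicle volume) that can be summed; a rate-like measure would need a mean instead. The function name `aggregate_sum` and the list-based input are hypothetical.

```python
from typing import List, Optional


def aggregate_sum(values: List[Optional[float]],
                  interval_minutes: int) -> List[Optional[float]]:
    """Sum consecutive 30-second observations into bins of interval_minutes.

    Nulls (None) are skipped; a bin with no valid observations is None.
    """
    per_bin = interval_minutes * 2  # two 30-second observations per minute
    bins: List[Optional[float]] = []
    for start in range(0, len(values), per_bin):
        chunk = [v for v in values[start:start + per_bin] if v is not None]
        bins.append(sum(chunk) if chunk else None)
    return bins


# One full day is 2 * 60 * 24 = 2880 observations.
day = [1] * 2880
fifteen_minute = aggregate_sum(day, 15)
hourly = aggregate_sum(day, 60)
print(len(fifteen_minute), len(hourly))
```

Peak-period aggregations could be built the same way by slicing the day to the relevant window before summing; the window boundaries are not specified here.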

The raw data is accessed for an individual sensor. Sensors can be aggregated up to nodes/stations, corridors, lanes (?). We need functionality for aggregating nodes, stations, and corridors up to polylines.
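A spatial roll-up from sensors to stations might look like the sketch below. It assumes a lookup from sensor ID to station ID is available from the sensor configuration, that all series cover the same time steps, and that nulls have already been handled. The IDs and the `aggregate_to_stations` name are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_to_stations(sensor_series: Dict[str, List[float]],
                          sensor_to_station: Dict[str, str]) -> Dict[str, List[float]]:
    """Sum sensor-level series into one series per station, time step by time step."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for sensor_id, series in sensor_series.items():
        grouped[sensor_to_station[sensor_id]].append(series)
    # zip(*series_list) yields one tuple per time step across the station's sensors.
    return {station: [sum(step) for step in zip(*series_list)]
            for station, series_list in grouped.items()}


# Hypothetical sensor and station IDs.
series = {"s1": [1, 2, 3], "s2": [4, 5, 6], "s3": [7, 8, 9]}
lookup = {"s1": "A", "s2": "A", "s3": "B"}
print(aggregate_to_stations(series, lookup))
```

The same grouping pattern would extend to corridors by swapping in a station-to-corridor lookup.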

Calculate

Aggregated data can be used to calculate various measures.

General practices

ashleyasmus commented 3 years ago

I just got off the phone with Tim Johnson (MNIT), a software developer working with the MnDOT loop detector data. It was a really nice call: he's super kind and easy to talk to, and totally on board with what we want to do with the data. I called him because we had been discussing the server issues, and he had said we should chat if I had thoughts about things they could do on the server side (aggregations and transformations of the data) to make our work easier.

He said several things that were really promising. One was that the work we are doing to download, aggregate, transform, and load the traffic data into our own database is also being duplicated by other groups (academia, gov't), and that they see this work as better performed closer to the server side to keep things standardized. I mentioned issues with how we identify data/sensors as trustworthy, and with tracking changes to the "field_length" (vehicle length) attribute over time. He completely agreed and said that they were having similar discussions in his own group.

Another thing is that his team is all about open-source software development. Currently the traffic data server is written in a language called Rust. I'm not familiar with it at all, but he said not to worry: we could submit issues or ideas for things we might want built on the server side, and they could build them.

He also talked about internal discussions on whether they should start storing the data in a formal database, especially if we were going to create derived fields. I said that ideally they would have a database I could just query for the data I needed, at the spatial and temporal aggregation that made sense for me. He agreed that this work needed to be done and said he'd like to involve us more in brainstorming what exactly that would look like.