cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Network Discovery #1

Closed jkarpen closed 9 months ago

jkarpen commented 10 months ago

@ian-r-rose will drive discovery around the Caltrans network architecture. Creating this issue to track that work.

10/13/23 - Next step: capturing some items from the notes doc here.

jkarpen commented 9 months ago

Meeting scheduled with Zhenyu Zhu for 10/2 @ 3:30 to start the process of getting access to the data.

ian-r-rose commented 9 months ago

We had our discovery call on the Caltrans network on 10/2:

I shared ODI's understanding of the network (which was more of a conceptual diagram):

[image: ODI's conceptual diagram of the Caltrans network]

This is broadly correct, but Caltrans was able to put together a more detailed network diagram, which includes some proposed pieces which do not exist yet:

[image: Caltrans's more detailed network diagram, including proposed pieces]

The current ETL jobs within Caltrans rely on a not-very-well-documented "data intake layer", with "lots and lots of perl scripts". Caltrans staff are currently trying to get a better understanding of how data flows through the intake layer; today it is a bit opaque.

The Caltrans proposal is to forward the "Raw" 30-second data from the "data intake layer" through a "Data relay server", and then on to an ODI S3 bucket. This would happen in parallel to the existing data pipelines. All servers would be owned and managed by Caltrans.
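To make the proposed flow a bit more concrete, here is a rough Python sketch of the fan-out described above. It is illustrative only and not something Caltrans has specified: `RawSample`, `object_key`, `relay`, the key layout, and the `upload` callback are all hypothetical names and assumptions of mine. The only ideas taken from the call are that each raw 30-second record keeps flowing to the existing pipelines and that a copy is forwarded toward an ODI S3 bucket.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass(frozen=True)
class RawSample:
    """One raw 30-second record (hypothetical shape, not the real PeMS format)."""
    station_id: str
    timestamp: datetime  # assumed to be timezone-aware
    payload: bytes


def object_key(sample: RawSample, prefix: str = "raw") -> str:
    """Build a date-partitioned object key (assumed layout, not an agreed one)."""
    ts = sample.timestamp.astimezone(timezone.utc)
    return f"{prefix}/{ts:%Y/%m/%d}/{sample.station_id}/{ts:%H%M%S}.bin"


def relay(
    samples: Iterable[RawSample],
    existing_pipeline: Callable[[RawSample], None],
    upload: Callable[[str, bytes], None],
) -> int:
    """Pass each sample to the existing pipeline and forward a copy via `upload`.

    Upload failures are swallowed so the existing pipeline is never blocked.
    Returns the number of samples forwarded successfully.
    """
    forwarded = 0
    for sample in samples:
        existing_pipeline(sample)  # existing data flow is unchanged
        try:
            upload(object_key(sample), sample.payload)
            forwarded += 1
        except Exception:
            # A real relay would log and retry; this sketch just moves on.
            continue
    return forwarded
```

In a real relay, `upload` would presumably wrap an S3 client call (for example boto3's `put_object`), but the bucket name, credentials, and delivery mechanism are still open questions here.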

A basic sequence of events would be:

ian-r-rose commented 9 months ago

Closing as network discovery is complete (at least for the purposes of this meeting). More specific issues will be opened for follow-up work.