Open ian-r-rose opened 8 months ago
@kengodleskidot and @ZhenyuZhu-Caltrans would you mind updating this issue with what you learned about the CHP incident pipeline within Caltrans and how it will be loaded to the Snowflake warehouse?
@pingpingxiu-DOT-ca-gov has developed a pipeline to bring the CHP data tables from the Caltrans TIM database. It would be helpful to discuss how to load them into Snowflake at our next sprint session, or we can set up a separate meeting.
Agreed, let's discuss tomorrow morning.
Depending on the volume of these tables (which I anticipate will be much smaller than VDS), we may prefer to load them directly to Snowflake using Caltrans' Airflow server.
@ian-r-rose There are 4 tables in PeMS that have CHP-related data. I have attached a document with additional details about those database tables. Based on the PeMS Real Time Data collection table, samples seem to be received every 5-6 minutes. Let me know if you need anything else. PeMS-CHPTableSizeInfo_04192024.xlsx
Pingping will take a look at this issue once #195 and #115 have been completed.
CHP incidents are one input for determining whether PeMS VDS data are reliable or operational. There are existing pipelines that load data from the incidents system, though it's not clear whether they rely on web scraping or on some internal access to CHP systems.
We should figure out: