Looks like dropping the slice from 100 to 50 does the trick:
09:35:03 PM - INFO - {"rows":[],"time":18.764,"fields":{},"total_rows":50}
09:35:03 PM - INFO - Insert chunk of up to 50 crash records
09:35:21 PM - INFO - {"rows":[],"time":18.328,"fields":{},"total_rows":50}
09:35:21 PM - INFO - Insert chunk of up to 50 crash records
09:35:40 PM - INFO - {"rows":[],"time":18.547,"fields":{},"total_rows":50}
09:35:40 PM - INFO - Insert chunk of up to 50 crash records
09:35:59 PM - INFO - {"rows":[],"time":18.263,"fields":{},"total_rows":50}
09:35:59 PM - INFO - Insert chunk of up to 50 crash records
09:36:18 PM - INFO - {"rows":[],"time":17.977,"fields":{},"total_rows":50}
09:36:18 PM - INFO - Insert chunk of up to 50 crash records
09:36:36 PM - INFO - {"rows":[],"time":18.19,"fields":{},"total_rows":50}
09:36:36 PM - INFO - Insert chunk of up to 50 crash records
09:36:55 PM - INFO - {"rows":[],"time":18.242,"fields":{},"total_rows":50}
09:36:55 PM - INFO - Insert chunk of up to 50 crash records
09:37:13 PM - INFO - {"rows":[],"time":18.012,"fields":{},"total_rows":50}
09:37:13 PM - INFO - Insert chunk of up to 50 crash records
09:37:32 PM - INFO - {"rows":[],"time":18.417,"fields":{},"total_rows":50}
09:37:32 PM - INFO - Insert chunk of up to 50 crash records
09:37:51 PM - INFO - {"rows":[],"time":18.282,"fields":{},"total_rows":50}
09:37:51 PM - INFO - Insert chunk of up to 50 crash records
09:38:09 PM - INFO - {"rows":[],"time":18.201,"fields":{},"total_rows":50}
09:38:09 PM - INFO - Insert chunk of up to 50 crash records
09:38:28 PM - INFO - {"rows":[],"time":18.287,"fields":{},"total_rows":50}
At ~18 seconds per chunk, the backlog should be cleared in roughly 102 minutes: ((17000 / 50) * 18) / 60 = 102.
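For reference, a minimal sketch of the chunked-insert pattern this fixes, assuming a list of pre-escaped SQL value tuples and a `run_sql` helper that POSTs to CARTO's synchronous SQL API (the helper and table names are illustrative, not the ETL script's actual code):

```python
import requests

CHUNK_SIZE = 50  # was 100; synchronous SQL API inserts started timing out at that size


def run_sql(query, carto_user, api_key):
    """POST a query to the CARTO SQL API and return the parsed JSON response."""
    resp = requests.post(
        "https://{}.carto.com/api/v2/sql".format(carto_user),
        data={"q": query, "api_key": api_key},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


def insert_in_chunks(records, table, carto_user, api_key):
    """Insert crash records as multi-row INSERTs, CHUNK_SIZE rows at a time.

    Each entry in `records` is assumed to be a pre-escaped "(...)" value tuple.
    """
    for i in range(0, len(records), CHUNK_SIZE):
        chunk = records[i:i + CHUNK_SIZE]
        print("Insert chunk of up to {} crash records".format(CHUNK_SIZE))
        run_sql(
            "INSERT INTO {} VALUES {}".format(table, ",".join(chunk)),
            carto_user,
            api_key,
        )
```

Smaller chunks mean more round trips, but each statement stays safely under the SQL API's per-request timeout, which is what the ~18-second chunks in the log above reflect.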
Note that back in February we had to cut the limit down to 100 to get this to work (https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/25), and now it's 50. Why?
11:17:21 PM - INFO - {"rows":[],"time":18.377,"fields":{},"total_rows":50}
11:17:21 PM - INFO - Insert chunk of up to 50 crash records
11:17:25 PM - INFO - {"rows":[],"time":3.073,"fields":{},"total_rows":4}
11:17:27 PM - INFO - {"rows":[],"time":2.439,"fields":{},"total_rows":1}
11:17:27 PM - INFO - Find SODA records updated/modified since 2020-06-03
11:17:31 PM - INFO - Got 0 SODA entries updated since 2020-06-03
11:17:31 PM - INFO - Done updating records
11:17:31 PM - INFO - update_intersections() series launching
11:17:31 PM - INFO - Intersections crashcount reset
11:17:31 PM - INFO - Intersections crashcount dated 2018-09-02T00:00:00Z
11:17:31 PM - INFO - CARTO Batch Job ID: 0afa4f61-c426-47fa-b388-3269131093ff
11:17:31 PM - INFO - update_places() series launching
11:17:31 PM - INFO - Cleanup update_borough()
11:17:31 PM - INFO - Cleanup update_city_council()
11:17:31 PM - INFO - Cleanup update_nypd_precinct()
11:17:31 PM - INFO - Cleanup update_community_board()
11:17:31 PM - INFO - Cleanup update_neighborhood()
11:17:31 PM - INFO - Cleanup update_assembly()
11:17:31 PM - INFO - Cleanup update_senate()
11:17:32 PM - INFO - CARTO Batch Job ID: f4fecd9e-0d77-4eeb-b7e2-c6ba2d859a40
11:17:32 PM - INFO - update_hasvehicle() series launching
11:17:32 PM - INFO - CARTO Batch Job ID: ba1de757-f4c9-44bb-824f-4d280c595388
11:17:32 PM - INFO - CARTO Batch Job ID: fe6c31da-c8da-4313-9645-3daafcbeeb7e
11:17:32 PM - INFO - update_analyzeindex()
11:17:33 PM - INFO - CARTO Batch Job ID: cec6ff36-6503-45ba-8c0e-aae69640cb81
11:17:33 PM - INFO - ALL DONE
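The `CARTO Batch Job ID` lines above come from CARTO's Batch SQL API, which queues long-running statements (like the intersection crash-count rebuild) instead of running them against the synchronous endpoint that was timing out. A minimal sketch of submitting such a job; the endpoint is CARTO's documented Batch SQL API, while the example query and helper name are assumptions for illustration:

```python
import requests


def submit_batch_job(query, carto_user, api_key):
    """Queue a long-running statement via CARTO's Batch SQL API.

    Returns the job description dict; its 'job_id' is what the
    'CARTO Batch Job ID' log lines above are printing.
    """
    resp = requests.post(
        "https://{}.carto.com/api/v2/sql/job?api_key={}".format(carto_user, api_key),
        json={"query": query},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


# Example: the kind of statement that is too slow for the synchronous API.
# Table and column names here are illustrative, not the script's actual SQL.
job = submit_batch_job(
    "UPDATE intersections SET crashcount = 0", "example_user", "example_key"
)
print("CARTO Batch Job ID: {}".format(job["job_id"]))
```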
We have no data for August. Looking at the heroku-logs, it appears we're having timeout issues with SQL queries again. And then looking in CARTO, it appears we have no crashes since Jul 7.
Over in Socrata, there are 16,014 crashes since then: https://data.cityofnewyork.us/Public-Safety/Cashes-Since-7-8-2020/3mj4-ptnv/edit
So something happened on Jul 7, and now we have too much backlog to easily manage...
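For comparison, a quick way to reproduce that Socrata count. This queries the citywide Motor Vehicle Collisions - Crashes dataset (h9gi-nx95) and its crash_date field directly, which is an assumption on my part; the link above is a filtered view of the same data:

```python
import requests

# NYC Open Data "Motor Vehicle Collisions - Crashes" SODA endpoint (assumed).
SODA_URL = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"

resp = requests.get(
    SODA_URL,
    params={
        "$select": "count(*)",
        "$where": "crash_date >= '2020-07-08'",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. [{"count": "16014"}] at the time this issue was filed
```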