GreenInfo-Network / nyc-crash-mapper-etl-script

Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.
3 stars 0 forks source link

Data import failing, again #28

Closed danrademacher closed 4 years ago

danrademacher commented 4 years ago

We have no data for August: image

Looking at heroku-logs, it appears we're having timeout issues with SQL queries again:

2020-09-01T17:23:00.358575+00:00 app[scheduler.3180]: 05:23:00 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:23:00.358814+00:00 app[scheduler.3180]: 05:23:00 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:23:25.713549+00:00 app[scheduler.3180]: 05:23:25 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:23:25.713681+00:00 app[scheduler.3180]: 05:23:25 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:23:51.016831+00:00 app[scheduler.3180]: 05:23:51 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:23:51.017009+00:00 app[scheduler.3180]: 05:23:51 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:24:17.179047+00:00 app[scheduler.3180]: 05:24:17 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:24:17.179135+00:00 app[scheduler.3180]: 05:24:17 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:24:42.564719+00:00 app[scheduler.3180]: 05:24:42 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:24:42.564977+00:00 app[scheduler.3180]: 05:24:42 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:25:07.880869+00:00 app[scheduler.3180]: 05:25:07 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:25:07.881015+00:00 app[scheduler.3180]: 05:25:07 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:25:33.238895+00:00 app[scheduler.3180]: 05:25:33 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:25:33.239067+00:00 app[scheduler.3180]: 05:25:33 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:25:58.593667+00:00 app[scheduler.3180]: 05:25:58 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:25:58.595859+00:00 app[scheduler.3180]: 05:25:58 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:26:23.896296+00:00 app[scheduler.3180]: 05:26:23 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:26:23.896386+00:00 app[scheduler.3180]: 05:26:23 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:26:49.183942+00:00 app[scheduler.3180]: 05:26:49 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:26:49.184233+00:00 app[scheduler.3180]: 05:26:49 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:27:14.548640+00:00 app[scheduler.3180]: 05:27:14 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:27:14.548850+00:00 app[scheduler.3180]: 05:27:14 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:27:39.859301+00:00 app[scheduler.3180]: 05:27:39 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:27:39.859457+00:00 app[scheduler.3180]: 05:27:39 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:28:05.165582+00:00 app[scheduler.3180]: 05:28:05 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:28:05.165734+00:00 app[scheduler.3180]: 05:28:05 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:28:30.475579+00:00 app[scheduler.3180]: 05:28:30 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:28:30.475705+00:00 app[scheduler.3180]: 05:28:30 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:28:55.758266+00:00 app[scheduler.3180]: 05:28:55 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:28:55.759476+00:00 app[scheduler.3180]: 05:28:55 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:29:21.050903+00:00 app[scheduler.3180]: 05:29:21 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:29:21.051109+00:00 app[scheduler.3180]: 05:29:21 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:29:46.356416+00:00 app[scheduler.3180]: 05:29:46 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:29:46.357274+00:00 app[scheduler.3180]: 05:29:46 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:30:11.644257+00:00 app[scheduler.3180]: 05:30:11 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:30:11.644674+00:00 app[scheduler.3180]: 05:30:11 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:30:37.002447+00:00 app[scheduler.3180]: 05:30:37 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:30:37.002607+00:00 app[scheduler.3180]: 05:30:37 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:31:02.417405+00:00 app[scheduler.3180]: 05:31:02 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:31:02.417685+00:00 app[scheduler.3180]: 05:31:02 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:31:27.728228+00:00 app[scheduler.3180]: 05:31:27 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:31:27.731142+00:00 app[scheduler.3180]: 05:31:27 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:31:53.028471+00:00 app[scheduler.3180]: 05:31:53 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:31:53.029373+00:00 app[scheduler.3180]: 05:31:53 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:32:18.363413+00:00 app[scheduler.3180]: 05:32:18 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:32:18.367833+00:00 app[scheduler.3180]: 05:32:18 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:32:43.668687+00:00 app[scheduler.3180]: 05:32:43 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:32:43.669571+00:00 app[scheduler.3180]: 05:32:43 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:33:08.991976+00:00 app[scheduler.3180]: 05:33:08 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:33:08.992107+00:00 app[scheduler.3180]: 05:33:08 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:33:34.363651+00:00 app[scheduler.3180]: 05:33:34 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:33:34.363844+00:00 app[scheduler.3180]: 05:33:34 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:33:59.704032+00:00 app[scheduler.3180]: 05:33:59 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:33:59.704155+00:00 app[scheduler.3180]: 05:33:59 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:34:25.004493+00:00 app[scheduler.3180]: 05:34:25 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:34:25.004651+00:00 app[scheduler.3180]: 05:34:25 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:34:50.363677+00:00 app[scheduler.3180]: 05:34:50 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:34:50.363887+00:00 app[scheduler.3180]: 05:34:50 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:35:15.739153+00:00 app[scheduler.3180]: 05:35:15 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:35:15.739481+00:00 app[scheduler.3180]: 05:35:15 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:35:41.047129+00:00 app[scheduler.3180]: 05:35:41 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:35:41.047189+00:00 app[scheduler.3180]: 05:35:41 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:36:06.352593+00:00 app[scheduler.3180]: 05:36:06 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:36:06.352746+00:00 app[scheduler.3180]: 05:36:06 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:36:31.647629+00:00 app[scheduler.3180]: 05:36:31 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:36:31.647786+00:00 app[scheduler.3180]: 05:36:31 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:36:57.008922+00:00 app[scheduler.3180]: 05:36:57 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:36:57.009082+00:00 app[scheduler.3180]: 05:36:57 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:37:22.310811+00:00 app[scheduler.3180]: 05:37:22 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:37:22.310958+00:00 app[scheduler.3180]: 05:37:22 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:37:47.632298+00:00 app[scheduler.3180]: 05:37:47 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}
2020-09-01T17:37:47.634564+00:00 app[scheduler.3180]: 05:37:47 PM - INFO - Insert chunk of up to 100 crash records
2020-09-01T17:38:12.950562+00:00 app[scheduler.3180]: 05:38:12 PM - INFO - {"error":["You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."],"context":"limit","detail":"datasource"}

And then looking in CARTO, it appears we have no crashes since Jul 7: image

Over in Socrata, there are 16,014 crashes since then: https://data.cityofnewyork.us/Public-Safety/Cashes-Since-7-8-2020/3mj4-ptnv/edit

image

So something happened on Jul 7, and now we have an issue where we have too much backlog to easily manage...

danrademacher commented 4 years ago

Looks like dropping the slice from 100 to 50 does the trick:

09:35:03 PM - INFO - {"rows":[],"time":18.764,"fields":{},"total_rows":50}
 09:35:03 PM - INFO - Insert chunk of up to 50 crash records
 09:35:21 PM - INFO - {"rows":[],"time":18.328,"fields":{},"total_rows":50}
 09:35:21 PM - INFO - Insert chunk of up to 50 crash records
 09:35:40 PM - INFO - {"rows":[],"time":18.547,"fields":{},"total_rows":50}
 09:35:40 PM - INFO - Insert chunk of up to 50 crash records
 09:35:59 PM - INFO - {"rows":[],"time":18.263,"fields":{},"total_rows":50}
 09:35:59 PM - INFO - Insert chunk of up to 50 crash records
 09:36:18 PM - INFO - {"rows":[],"time":17.977,"fields":{},"total_rows":50}
 09:36:18 PM - INFO - Insert chunk of up to 50 crash records
 09:36:36 PM - INFO - {"rows":[],"time":18.19,"fields":{},"total_rows":50}
 09:36:36 PM - INFO - Insert chunk of up to 50 crash records
 09:36:55 PM - INFO - {"rows":[],"time":18.242,"fields":{},"total_rows":50}
 09:36:55 PM - INFO - Insert chunk of up to 50 crash records
 09:37:13 PM - INFO - {"rows":[],"time":18.012,"fields":{},"total_rows":50}
 09:37:13 PM - INFO - Insert chunk of up to 50 crash records
 09:37:32 PM - INFO - {"rows":[],"time":18.417,"fields":{},"total_rows":50}
 09:37:32 PM - INFO - Insert chunk of up to 50 crash records
 09:37:51 PM - INFO - {"rows":[],"time":18.282,"fields":{},"total_rows":50}
 09:37:51 PM - INFO - Insert chunk of up to 50 crash records
 09:38:09 PM - INFO - {"rows":[],"time":18.201,"fields":{},"total_rows":50}
 09:38:09 PM - INFO - Insert chunk of up to 50 crash records
 09:38:28 PM - INFO - {"rows":[],"time":18.287,"fields":{},"total_rows":50}

At 18 seconds per, the backlog should be cleared in ~102 minutes ((17000/50)*18)/60

danrademacher commented 4 years ago

Note that in Feb we had to cut the limit down to 100 to get this to work, https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script/issues/25, and now it's 50. Why?

danrademacher commented 4 years ago
11:17:21 PM - INFO - {"rows":[],"time":18.377,"fields":{},"total_rows":50}
 11:17:21 PM - INFO - Insert chunk of up to 50 crash records
 11:17:25 PM - INFO - {"rows":[],"time":3.073,"fields":{},"total_rows":4}
 11:17:27 PM - INFO - {"rows":[],"time":2.439,"fields":{},"total_rows":1}
 11:17:27 PM - INFO - Find SODA records updated/modified since 2020-06-03
 11:17:31 PM - INFO - Got 0 SODA entries updated since 2020-06-03
 11:17:31 PM - INFO - Done updating records
 11:17:31 PM - INFO - update_intersections() series launching
 11:17:31 PM - INFO - Intersections crashcount reset
 11:17:31 PM - INFO - Intersections crashcount dated 2018-09-02T00:00:00Z
 11:17:31 PM - INFO - CARTO Batch Job ID: 0afa4f61-c426-47fa-b388-3269131093ff
 11:17:31 PM - INFO - update_places() series launching
 11:17:31 PM - INFO - Cleanup update_borough()
 11:17:31 PM - INFO - Cleanup update_city_council()
 11:17:31 PM - INFO - Cleanup update_nypd_precinct()
 11:17:31 PM - INFO - Cleanup update_community_board()
 11:17:31 PM - INFO - Cleanup update_neighborhood()
 11:17:31 PM - INFO - Cleanup update_assembly()
 11:17:31 PM - INFO - Cleanup update_senate()
 11:17:32 PM - INFO - CARTO Batch Job ID: f4fecd9e-0d77-4eeb-b7e2-c6ba2d859a40
 11:17:32 PM - INFO - update_hasvehicle() series launching
 11:17:32 PM - INFO - CARTO Batch Job ID: ba1de757-f4c9-44bb-824f-4d280c595388
 11:17:32 PM - INFO - CARTO Batch Job ID: fe6c31da-c8da-4313-9645-3daafcbeeb7e
 11:17:32 PM - INFO - update_analyzeindex()
 11:17:33 PM - INFO - CARTO Batch Job ID: cec6ff36-6503-45ba-8c0e-aae69640cb81
 11:17:33 PM - INFO - ALL DONE

image

image