GSA / catalog.data.gov

Development environment for catalog.data.gov
https://catalog.data.gov

Automated CKAN Job Error Condition - Stuck Jobs #876

Closed: nickumia-reisys closed this issue 1 year ago

nickumia-reisys commented 1 year ago

Workflow with Issue: 4 - Automated CKAN Jobs
Job Failed: ckan-auto-command
CKAN Command (in question): ckan geodatagov check-stuck-jobs
CKAN Command Schedule: 30 6 *
Cloud.gov Environment: prod

Last Commit: 081e30756c70c043d8aef08981db9e79b31f87b1
Number of times run: 1
Last run by: btylerburton
GitHub Action Run: https://github.com/GSA/catalog.data.gov/actions/runs/4465397027

nickumia-reisys commented 1 year ago

This is not an important issue, but as a forewarning, @btylerburton, be mindful of edits to this type of issue. More than one automated job (of different types) may fail, and each failure will "update" the same ticket. If these are triaged immediately, it's not a problem. But if an issue stays open long enough, make sure all of the edits refer to the same command. If an older edit needs attention, create a "new" issue with a different name to track it.

For this one in particular, I think it'll stop erroring 72 hours after it first happened, because "stuck" jobs are "force finished" after 72 hours. So you can,

nickumia-reisys commented 1 year ago

Split into a new issue:

For the stuck jobs, there was originally one stuck job in staging on 3/17 (a no-op that was able to resolve itself):

source_id: 58f92550-7a01-4f00-b1b2-8dc953bd598f | created_time: 2023-03-14 19:54:18.552932 | current_time: 2023-03-17 06:39:45.810617+00:00 | gather_started: None | gather_finished: None | running_length: 2 days, 10:45:27.257685 | source_title: NASA Data.json | organization: National Aeronautics and Space Administration
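For reference, each line of the check-stuck-jobs output is a `key: value | key: value` record, and the reported `running_length` matches `current_time - created_time` (for the NASA job above, 2023-03-17 06:39:45.810617 minus 2023-03-14 19:54:18.552932 is exactly 2 days, 10:45:27.257685). Below is a minimal Python sketch for parsing these lines and re-deriving the running length. It assumes `created_time` is in UTC like `current_time`, and the helper names are hypothetical, not part of ckanext-geodatagov.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Matches str(timedelta) output such as "2 days, 10:45:27.257685" or "1:12:30".
_DURATION = re.compile(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})(?:\.(\d{1,6}))?")


def parse_stuck_job_line(line: str) -> dict[str, str]:
    """Split one 'key: value | key: value' report line into a dict (hypothetical helper)."""
    fields = {}
    for part in line.split(" | "):
        # Split on the first ": " only, so timestamps like 06:39:45 stay intact.
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def parse_running_length(text: str) -> timedelta:
    """Turn a reported running_length string back into a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, seconds, frac = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        # Pad fractional seconds to 6 digits so "5" means 500000 microseconds.
        microseconds=int((frac or "0").ljust(6, "0")),
    )


def parse_timestamp(text: str) -> datetime:
    """Parse a report timestamp; naive values (created_time) are ASSUMED to be UTC."""
    ts = datetime.fromisoformat(text)
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)


def running_length_of(fields: dict[str, str]) -> timedelta:
    """Recompute running_length as current_time - created_time."""
    return parse_timestamp(fields["current_time"]) - parse_timestamp(fields["created_time"])
```

Applied to the NASA line above, `running_length_of(fields)` equals `parse_running_length(fields["running_length"])`, which makes a cheap sanity check when comparing reports from different days.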

There were 31 'stuck' jobs on 3/18:

source_id: c084a438-6f6b-470d-93e0-16aeddb9f513 | created_time: 2023-03-17 05:28:03.991776 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 05:40:07.913481 | gather_finished: 2023-03-17 05:43:45.995993 | running_length: 1 day, 1:12:30.631842 | source_title: NOAA/NESDIS/ncei/accessions | organization: National Oceanic and Atmospheric Administration, Department of Commerce
source_id: 4ce10abe-91d2-46e3-82b9-2365185e5c46 | created_time: 2023-03-17 05:28:18.687371 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:07.582267 | gather_finished: 2023-03-17 06:07:17.080197 | running_length: 1 day, 1:12:15.936247 | source_title: FEMA-R03 | organization: Federal Emergency Management Agency, Department of Homeland Security
source_id: dc5d3825-e646-4f6c-b4c8-77fec312cf16 | created_time: 2023-03-17 05:28:18.485802 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:06:22.507953 | gather_finished: 2023-03-17 06:07:06.237649 | running_length: 1 day, 1:12:16.137816 | source_title: FEMA-R04 | organization: Federal Emergency Management Agency, Department of Homeland Security
source_id: 06aebe17-1552-4d85-aec4-b9c4b717ffa0 | created_time: 2023-03-17 05:28:19.188090 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:23.551825 | gather_finished: 2023-03-17 06:07:23.972992 | running_length: 1 day, 1:12:15.435528 | source_title: fema-mip-nfhl | organization: Federal Emergency Management Agency, Department of Homeland Security
source_id: cc78f881-4217-466f-b74a-b18e0644e766 | created_time: 2023-03-17 05:28:19.022864 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:20.465109 | gather_finished: 2023-03-17 06:07:23.410808 | running_length: 1 day, 1:12:15.600754 | source_title: FEMA-R01 | organization: Federal Emergency Management Agency, Department of Homeland Security
source_id: a49a5edc-d60e-48eb-a26f-3b29d5886786 | created_time: 2023-03-17 05:28:19.699843 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:47.306007 | gather_finished: 2023-03-17 06:07:48.950437 | running_length: 1 day, 1:12:14.923775 | source_title: Hartford Data.json | organization: City of Hartford
source_id: bac1f9bf-daa1-4a30-b5ac-b04d3095c278 | created_time: 2023-03-17 05:28:19.531835 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:44.929227 | gather_finished: 2023-03-17 06:07:47.255875 | running_length: 1 day, 1:12:15.091783 | source_title: EnergyStar | organization: U.S. Environmental Protection Agency
source_id: 527363ff-de32-4b43-a73f-ed38fccdb66d | created_time: 2023-03-17 05:28:20.271532 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:08:26.846950 | gather_finished: 2023-03-17 06:08:33.781322 | running_length: 1 day, 1:12:14.352086 | source_title: City of Baton Rouge Data.json | organization: City of Baton Rouge
source_id: 36c82f29-4f54-495e-a878-2c07320bf10c | created_time: 2023-03-17 05:28:19.868184 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:48.980359 | gather_finished: 2023-03-17 06:08:26.266824 | running_length: 1 day, 1:12:14.755434 | source_title: Connecticut Data.json | organization: State of Connecticut
source_id: 9b3cd81e-5515-4bb7-ad3c-5ae44de9b4bd | created_time: 2023-03-17 05:28:21.334871 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:09:08.795774 | gather_finished: 2023-03-17 06:09:16.224390 | running_length: 1 day, 1:12:13.288747 | source_title: Environmental Dataset Gateway ISO Geospatial Metadata | organization: U.S. Environmental Protection Agency
source_id: d747cd8f-d1d6-49a3-ab0b-ea17684f1121 | created_time: 2023-03-17 05:28:21.909564 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:09:46.302828 | gather_finished: 2023-03-17 06:09:56.727244 | running_length: 1 day, 1:12:12.714054 | source_title: oregon json | organization: State of Oregon
source_id: ec97977e-585c-416a-b4b3-7d656d36891a | created_time: 2023-03-17 05:28:21.081778 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:08:48.388874 | gather_finished: 2023-03-17 06:09:08.120117 | running_length: 1 day, 1:12:13.541840 | source_title: Environmental Dataset Gateway FGDC CSDGM | organization: U.S. Environmental Protection Agency
source_id: 1c33d1cb-2f69-4f1b-835e-453790f38dc7 | created_time: 2023-03-17 05:28:21.621775 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:09:16.473355 | gather_finished: 2023-03-17 06:09:46.011046 | running_length: 1 day, 1:12:13.001843 | source_title: WA JSON | organization: State of Washington
source_id: ba08fbb6-207c-4622-87d1-a82b7bd693ce | created_time: 2023-03-17 05:28:22.129459 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:09:56.809956 | gather_finished: 2023-03-17 06:10:02.096493 | running_length: 1 day, 1:12:12.494159 | source_title: OK JSON | organization: State of Oklahoma
source_id: 74e5aca6-5900-4fd4-9645-4c9648709c14 | created_time: 2023-03-17 05:28:22.323568 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:02.157762 | gather_finished: 2023-03-17 06:10:06.692476 | running_length: 1 day, 1:12:12.300050 | source_title: MO JSON | organization: State of Missouri
source_id: d52781fd-51ef-4061-9cd2-535a0afb663b | created_time: 2023-03-17 05:28:22.720999 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:29.583908 | gather_finished: 2023-03-17 06:10:39.554008 | running_length: 1 day, 1:12:11.902619 | source_title: cookcountyil json | organization: Cook County of Illinois
source_id: 8d269e22-ff3d-45d8-b878-47ef2aba7851 | created_time: 2023-03-17 05:28:22.933169 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:39.626729 | gather_finished: 2023-03-17 06:10:47.182089 | running_length: 1 day, 1:12:11.690449 | source_title: montgomerycountymd json | organization: Montgomery County of Maryland
source_id: ded7e0b2-febc-49bb-af4c-ee572aa34770 | created_time: 2023-03-17 05:28:23.324192 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:52.883397 | gather_finished: 2023-03-17 06:10:54.314911 | running_length: 1 day, 1:12:11.299426 | source_title: somervillema json | organization: City of Somerville
source_id: a91026a8-af79-4f8c-bcfe-e14f3d6aa4fb | created_time: 2023-03-17 05:28:23.146162 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:47.265432 | gather_finished: 2023-03-17 06:10:52.839171 | running_length: 1 day, 1:12:11.477456 | source_title: kingcounty json | organization: King County, Washington
source_id: f61de7a2-69cf-40a9-a1bc-dd9edb8f3fe5 | created_time: 2023-03-17 05:28:23.542601 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:54.344231 | gather_finished: 2023-03-17 06:11:20.579909 | running_length: 1 day, 1:12:11.081017 | source_title: Seattle JSON | organization: City of Seattle
source_id: 1696593e-c691-4f61-a696-5dcb9e4c9b4c | created_time: 2023-03-17 05:28:23.951826 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:11:26.363729 | gather_finished: 2023-03-17 06:12:46.091571 | running_length: 1 day, 1:12:10.671792 | source_title: NYC JSON | organization: City of New York
source_id: c5ee7104-80bc-4f22-8895-6c2c3755af40 | created_time: 2023-03-17 05:28:23.742823 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:11:20.754322 | gather_finished: 2023-03-17 06:11:26.221366 | running_length: 1 day, 1:12:10.880795 | source_title: honolulu json | organization: City of Honolulu
source_id: 8507fa43-f429-4095-b732-2177330ce485 | created_time: 2023-03-17 05:28:24.749945 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:13:03.037636 | gather_finished: 2023-03-17 06:13:17.551345 | running_length: 1 day, 1:12:09.873673 | source_title: SFO JSON | organization: City of San Francisco
source_id: 8f77b6d5-f630-4995-bdf3-0aee7158a7f3 | created_time: 2023-03-17 05:28:25.687905 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:13:19.649091 | gather_finished: 2023-03-17 06:13:32.009532 | running_length: 1 day, 1:12:08.935713 | source_title: Alaska Division of Geological and Geophysical Surveys | organization: State of Alaska
source_id: f35df04a-a619-4f92-bf5c-b9915b083bb1 | created_time: 2023-03-17 05:28:25.871030 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:13:32.427729 | gather_finished: 2023-03-17 06:13:37.329604 | running_length: 1 day, 1:12:08.752588 | source_title: Alaska Department of Natural Resources, IRM | organization: State of Alaska
source_id: bee011ae-6f4a-4b9b-bef9-6c8b6453c562 | created_time: 2023-03-17 05:27:52.546695 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 05:28:09.689448 | gather_finished: 2023-03-17 05:34:35.266052 | running_length: 1 day, 1:12:42.076923 | source_title: DOI DCAT-US harvest source | organization: Department of the Interior
source_id: f7600d52-3f59-46ce-bb72-d1da3f45739a | created_time: 2023-03-17 05:28:18.859830 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:17.416576 | gather_finished: 2023-03-17 06:07:20.357233 | running_length: 1 day, 1:12:15.763788 | source_title: FEMA-R02 | organization: Federal Emergency Management Agency, Department of Homeland Security
source_id: 4678ef15-545c-4c3b-be19-75d55f823605 | created_time: 2023-03-17 05:28:20.461978 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:08:33.823274 | gather_finished: 2023-03-17 06:08:42.975733 | running_length: 1 day, 1:12:14.161640 | source_title: Los Angeles data.json | organization: City of Los Angeles
source_id: 2757370c-3126-4db9-9f4e-f9f874ae3392 | created_time: 2023-03-17 05:28:19.345110 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:07:24.000959 | gather_finished: 2023-03-17 06:07:44.281195 | running_length: 1 day, 1:12:15.278508 | source_title: FEMA-R05 | organization: Federal Emergency Management Agency, Department of Homeland Security
source_id: ee165ef5-7b35-41a7-b3a5-5479c71b0e58 | created_time: 2023-03-17 05:28:22.511677 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:10:06.759415 | gather_finished: 2023-03-17 06:10:29.434359 | running_length: 1 day, 1:12:12.111941 | source_title: md json | organization: State of Maryland
source_id: 7590e386-229e-453a-8e53-6f18e200e421 | created_time: 2023-03-17 05:28:24.321171 | current_time: 2023-03-18 06:40:34.623618+00:00 | gather_started: 2023-03-17 06:12:47.111258 | gather_finished: 2023-03-17 06:13:02.534955 | running_length: 1 day, 1:12:10.302447 | source_title: Chicago JSON | organization: City of Chicago

Since then, there have been only 2 'stuck' jobs on 3/19 and 3/20 (the 3/19 output is below):

source_id: c084a438-6f6b-470d-93e0-16aeddb9f513 | created_time: 2023-03-17 05:28:03.991776 | current_time: 2023-03-19 06:40:47.085772+00:00 | gather_started: 2023-03-17 05:40:07.913481 | gather_finished: 2023-03-17 05:43:45.995993 | running_length: 2 days, 1:12:43.093996 | source_title: NOAA/NESDIS/ncei/accessions | organization: National Oceanic and Atmospheric Administration, Department of Commerce
source_id: bee011ae-6f4a-4b9b-bef9-6c8b6453c562 | created_time: 2023-03-17 05:27:52.546695 | current_time: 2023-03-19 06:40:47.085772+00:00 | gather_started: 2023-03-17 05:28:09.689448 | gather_finished: 2023-03-17 05:34:35.266052 | running_length: 2 days, 1:12:54.539077 | source_title: DOI DCAT-US harvest source | organization: Department of the Interior
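To check the "force finished in 72 hours" expectation against these two jobs, here is a minimal sketch that computes when each would hit a 72-hour cutoff. It assumes the cutoff is measured from `created_time` and that `created_time` is in UTC. `FORCE_FINISH_AFTER`, `force_finish_at`, and `time_left` are hypothetical names, not the geodatagov implementation.

```python
from datetime import datetime, timedelta, timezone

# The 72-hour window comes from this thread; measuring it from created_time is an assumption.
FORCE_FINISH_AFTER = timedelta(hours=72)


def force_finish_at(created_time: str) -> datetime:
    """When a job created at created_time would be force finished (created_time assumed UTC)."""
    created = datetime.fromisoformat(created_time).replace(tzinfo=timezone.utc)
    return created + FORCE_FINISH_AFTER


def time_left(created_time: str, current_time: str) -> timedelta:
    """Time remaining before the cutoff, as seen by the report run at current_time."""
    return force_finish_at(created_time) - datetime.fromisoformat(current_time)


if __name__ == "__main__":
    # created_time values copied from the 3/19 report above.
    report = {
        "NOAA/NESDIS/ncei/accessions": "2023-03-17 05:28:03.991776",
        "DOI DCAT-US harvest source": "2023-03-17 05:27:52.546695",
    }
    now = "2023-03-19 06:40:47.085772+00:00"
    for title, created in report.items():
        print(f"{title}: cutoff {force_finish_at(created)}, {time_left(created, now)} left")
```

With the 3/19 timestamps, both jobs come out at about 22 hours 47 minutes remaining, which puts the cutoff at around 05:28 UTC on 3/20 under these assumptions.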

I think all of this still needs an explanation...

nickumia-reisys commented 1 year ago

Yesterday, @FuhuXia explained that the 31 jobs were delayed because the NOAA harvest source is so large that it hogged a lot of resources on Saturday. Once NOAA got through, the other jobs were able to run and complete.

NOAA and DOI are problematic because they are so large and have so many trivial updates, which makes harvesting them non-trivial. Both have since completed (or been forcibly completed), so this issue can be resolved. Still, it's good O&M to know that these harvest sources cause recurring monthly issues.