Open jeremy-page opened 2 years ago
The exceptions (about 10) occurred on 8/1/2022. The ACTION table constraint violation appears to have happened for five action_ids, 17846480 through 17846484, all at about the same time, 07:11:49 (the created_at values below show 14:11:49 to 14:11:50 UTC).
I checked at the time, and there were no TASKs left in the Process table, so those failures evidently got picked up on retry and reprocessed.
Here are the five actions already in the ACTION table:
select action_id, action_name, action_params, created_at from action where action_id in (17846480, 17846481, 17846482, 17846483, 17846484);
action_id | action_name | action_params | created_at
-----------+-------------+----------------------------------------------------------------------------+-------------------------------
17846480 | process | process&PROCESS&80ea3554-17e8-411a-8898-b3e93f35ccd0&None&&hhsprotect.elr& | 2022-08-01 14:11:49.503402+00
17846481 | process | process&PROCESS&80ea3554-17e8-411a-8898-b3e93f35ccd0&None&&hhsprotect.elr& | 2022-08-01 14:11:49.815911+00
17846482 | process | process&PROCESS&80ea3554-17e8-411a-8898-b3e93f35ccd0&None&&hhsprotect.elr& | 2022-08-01 14:11:50.144032+00
17846483 | process | process&PROCESS&80ea3554-17e8-411a-8898-b3e93f35ccd0&None&&hhsprotect.elr& | 2022-08-01 14:11:50.440919+00
17846484 | process | process&PROCESS&80ea3554-17e8-411a-8898-b3e93f35ccd0&None&&hhsprotect.elr& | 2022-08-01 14:11:50.768997+00
(5 rows)
I believe it is a bug that a single report is listed as being processed five times in a row!
Here's what's in the TASK and REPORT_FILE tables for the above report_id:
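To check whether other reports were hit the same way, a grouping query over the ACTION table can flag any `action_params` value that appears on more than one `process` action. The sketch below is only an illustration: it loads the five rows shown above into an in-memory SQLite table (a stand-in for the real database, with only three of the ACTION columns), and the helper names `build_sample_db` and `find_duplicate_process_actions` are hypothetical. The query uses plain `group by` / `having`, so it should also run as-is in psql.

```python
import sqlite3

# The action_params value shared by the five rows in the ACTION output above.
PARAMS = "process&PROCESS&80ea3554-17e8-411a-8898-b3e93f35ccd0&None&&hhsprotect.elr&"
ROWS = [(action_id, "process", PARAMS) for action_id in range(17846480, 17846485)]

# Group process actions by their parameters and keep only the repeats.
DUPLICATE_QUERY = """
    select action_params, count(*) as times_run,
           min(action_id) as first_id, max(action_id) as last_id
    from action
    where action_name = 'process'
    group by action_params
    having count(*) > 1
"""


def build_sample_db():
    """Create an in-memory stand-in for the ACTION table (assumed subset of columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table action ("
        "action_id integer primary key, action_name text, action_params text)"
    )
    conn.executemany("insert into action values (?, ?, ?)", ROWS)
    return conn


def find_duplicate_process_actions(conn):
    """Return one row per action_params value that was processed more than once."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


if __name__ == "__main__":
    for row in find_duplicate_process_actions(build_sample_db()):
        print(row)
```

Against the production database, adding a `created_at` range to the `where` clause would keep the scan small.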
> select * from task where report_id = '80ea3554-17e8-411a-8898-b3e93f35ccd0';
report_id | next_action | next_action_at | schema_name | receiver_name | item_count | body_format | body_url | created_at | translated_at | batched_at | sent_at | wiped_at | errored_at | retry_token | processed_at
--------------------------------------+-------------+----------------+------------------------+---------------+------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+---------------+------------+---------+----------+------------+-------------+-------------------------------
80ea3554-17e8-411a-8898-b3e93f35ccd0 | none | | direct/abbott-covid-19 | | 1 | INTERNAL | https://pdhprodstorageaccount.blob.core.windows.net/reports/process%2Fabbott.default%2Fabbott-covid-19-80ea3554-17e8-411a-8898-b3e93f35ccd0-20220801141049.internal.csv | 2022-08-01 14:10:49.836161+00 | | | | | | | 2022-08-01 14:10:52.694834+00
(1 row)
prime_data_hub=> select * from report_file where report_id = '80ea3554-17e8-411a-8898-b3e93f35ccd0';
report_id | action_id | next_action | next_action_at | sending_org | sending_org_client | receiving_org | receiving_org_svc | transport_params | transport_result | schema_name | schema_topic | body_url | external_name | body_format | blob_digest | item_count | wiped_at | created_at | downloaded_by | item_count_before_qual_filter
--------------------------------------+-----------+-------------+----------------+-------------+--------------------+---------------+-------------------+------------------+------------------+------------------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+-------------+--------------------------------------------------------------------+------------+----------+-------------------------------+---------------+-------------------------------
80ea3554-17e8-411a-8898-b3e93f35ccd0 | 17845822 | process | | | | | | | | direct/abbott-covid-19 | covid-19 | https://pdhprodstorageaccount.blob.core.windows.net/reports/process%2Fabbott.default%2Fabbott-covid-19-80ea3554-17e8-411a-8898-b3e93f35ccd0-20220801141049.internal.csv | | INTERNAL | \x2a31fe07099f84ec0b309bede568a8fe453b12cdcd1a6d5c3e0e35d5b0f36e03 | 1 | | 2022-08-01 14:10:49.877883+00 | |
(1 row)
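And it does appear that this one abbott record was sent to hhsprotect five extra times. The query below returns six child reports: one created at 14:10:52, and five whose created_at values exactly match the five duplicate actions above: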
> select report_id, created_at, next_action, receiving_org, receiving_org_svc from report_file where report_id in (select child_report_id from report_lineage where parent_report_id = '80ea3554-17e8-411a-8898-b3e93f35ccd0');
report_id | created_at | next_action | receiving_org | receiving_org_svc
--------------------------------------+-------------------------------+-------------+---------------+-------------------
0dc66d3c-6043-4f82-bb42-33143424df90 | 2022-08-01 14:11:50.440919+00 | batch | hhsprotect | elr
74e4135b-51fd-46d2-874f-78904720319c | 2022-08-01 14:11:49.503402+00 | batch | hhsprotect | elr
5b683598-69eb-4752-91e2-1651cbf67732 | 2022-08-01 14:11:50.144032+00 | batch | hhsprotect | elr
780a92e5-61fa-40b6-84c1-2fdcb6b07ed1 | 2022-08-01 14:10:52.659177+00 | batch | hhsprotect | elr
0b7e48da-d734-4dbc-9698-b8aac39bf359 | 2022-08-01 14:11:49.815911+00 | batch | hhsprotect | elr
568bf9e0-0b2e-4b66-951d-41e2402e3cf4 | 2022-08-01 14:11:50.768997+00 | batch | hhsprotect | elr
(6 rows)
This is the first time I've seen this kind of error, so perhaps it's a rare glitch in the system, not a pattern. Still, it breaks our run-once semantics.
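For discussion, one common way to get run-once behavior is to have each worker claim the task with a conditional update and proceed only if exactly one row changed. The sketch below is an assumption about how such a guard could look, not a description of how the pipeline currently works. It uses an in-memory SQLite table with just the `report_id` and `processed_at` columns from the TASK output above, and `try_claim` and `build_sample_db` are hypothetical names.

```python
import sqlite3

REPORT_ID = "80ea3554-17e8-411a-8898-b3e93f35ccd0"


def build_sample_db():
    """Create an in-memory stand-in for the TASK table with one unprocessed row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table task (report_id text primary key, processed_at text)")
    conn.execute("insert into task (report_id) values (?)", (REPORT_ID,))
    conn.commit()
    return conn


def try_claim(conn, report_id, now):
    """Return True only for the first caller that claims this report.

    The update matches a row only while processed_at is still null, so a
    second attempt for the same report changes zero rows and is rejected.
    """
    cur = conn.execute(
        "update task set processed_at = ? "
        "where report_id = ? and processed_at is null",
        (now, report_id),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = build_sample_db()
    # Simulate the original attempt plus five retries for the same report.
    for attempt in range(6):
        print(attempt, try_claim(conn, REPORT_ID, "2022-08-01 14:10:52"))
```

With a guard like this, the five repeat attempts would each see zero rows updated and could skip creating a new ACTION and child report.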
Describe the bug: Exception occurring in production
Impact: Unknown
To Reproduce: Unknown
Expected behavior: Not having exceptions :)
Logs: