Brief Description
If a PostgreSQL connection goes away, nothing inside the processing loops tries to recreate it.
This can lead to a situation where a stage loops through all documents and every single one fails with an error like:
Traceback (most recent call last):
  File "/code/library/validate.py", line 66, in process_hash_list
    db.updateValidationRequestDate(conn, file_id)
  File "/code/library/db.py", line 470, in updateValidationRequestDate
    cur.execute(sql, data)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
Severity
Low (this seems to happen very rarely)
Problem
Say the validate stage starts and there are 1000 documents to validate.
validate.process_hash_list gets the connection once, at the start.
While it is processing the first document, the connection somehow drops.
But nowhere inside the "for file_data in document_datasets:" loop does it try to re-establish the connection!
So for each of the remaining 999 documents it calls updateValidationRequestDate, crashes, and marks the document as an error.
This probably applies to stages other than validate too.
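
One possible shape for a fix, as a minimal sketch rather than the project's actual code: wrap the per-document work so that a connection error discards the dead connection, opens a new one, and retries that document before moving on. The names process_all, connect, handle and conn_errors are hypothetical and do not exist in the codebase; in the real stage, handle would wrap calls such as db.updateValidationRequestDate and conn_errors would presumably be psycopg2.OperationalError. The sketch assumes the per-document work is safe to run a second time.

```python
def _close_quietly(conn):
    """Close a connection that may already be dead, ignoring any error."""
    try:
        conn.close()
    except Exception:
        pass


def process_all(items, connect, handle, conn_errors, max_retries=1):
    """Run handle(conn, item) for each item, reconnecting on connection errors.

    All names here are hypothetical:
    connect     -- zero-argument factory that returns a new connection
    handle      -- per-item work, assumed safe to retry
    conn_errors -- exception class(es) meaning the connection is gone,
                   e.g. psycopg2.OperationalError
    Returns the list of items that still failed after all retries.
    """
    conn = connect()
    failed = []
    for item in items:
        for attempt in range(max_retries + 1):
            try:
                handle(conn, item)
                break  # success, move on to the next item
            except conn_errors:
                # The connection is presumed dead: replace it so that
                # the retry (or the next item) starts with a fresh one.
                _close_quietly(conn)
                conn = connect()
                if attempt == max_retries:
                    failed.append(item)  # give up on this item only
    return failed
```

With this structure, one dropped connection costs at most one retry of the current document instead of marking every remaining document as an error. If connect itself raises, the exception propagates and stops the stage, which is probably preferable to looping over documents with no database available.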