JustFixNYC / nycdb-k8s-loader

Loading and updating of NYC-DB data via containerized batch processing.

Race conditions can occur when initializing DB hash table #11

Open toolness opened 5 years ago

toolness commented 5 years ago

This was introduced by #10.

I thought perhaps this might not be possible if CREATE TABLE IF NOT EXISTS were atomic, but maybe it's not? In any case, all but one of the scheduled tasks last night failed with the following traceback:

Alas, an error occurred when loading the dataset `oath_hearings`.
Traceback (most recent call last):
  File "load_dataset.py", line 254, in <module>
    main()
  File "load_dataset.py", line 250, in main
    raise e
  File "load_dataset.py", line 247, in main
    load_dataset(dataset)
  File "load_dataset.py", line 206, in load_dataset
    dbhash = SqlDbHash(conn, 'nycdb_k8s_loader.dbhash')
  File "/app/dbhash.py", line 62, in __init__
    self._init_db()
  File "/app/dbhash.py", line 71, in _init_db
    """
  File "/app/dbhash.py", line 77, in _exec_sql
    cur.execute(sql, params)
psycopg2.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL: Key (typname, typnamespace)=(dbhash, 4093702) already exists.

Oof.
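
For context, the traceback is consistent with how PostgreSQL behaves: concurrent `CREATE TABLE IF NOT EXISTS` statements are not serialized against each other, so two sessions can both pass the existence check and then collide on the catalog's unique index. One common workaround is to take a transaction-level advisory lock before running the DDL. The following is only a minimal sketch under stated assumptions, not the fix used in this repository: the function name `init_table_serialized`, the lock key, and the example DDL are all hypothetical, and `conn` is assumed to be a psycopg2 connection.

```python
# Hypothetical lock key: any 64-bit integer that every worker agrees on.
DBHASH_INIT_LOCK_KEY = 815001


def init_table_serialized(conn, create_sql, lock_key=DBHASH_INIT_LOCK_KEY):
    """Run idempotent DDL while holding a transaction-level advisory lock.

    `conn` is assumed to be a psycopg2 connection. Workers that call this
    concurrently with the same `lock_key` run the DDL one at a time.
    """
    with conn:  # psycopg2: commit on success, roll back on exception
        with conn.cursor() as cur:
            # Blocks until no other session holds this key. The lock is
            # released automatically when the transaction ends.
            cur.execute("SELECT pg_advisory_xact_lock(%s)", (lock_key,))
            # By the time a waiting worker gets here, the first worker has
            # committed, so IF NOT EXISTS sees the table and does nothing.
            cur.execute(create_sql)
```

A call might look like `init_table_serialized(conn, "CREATE TABLE IF NOT EXISTS example_table (key TEXT PRIMARY KEY, value TEXT)")`, where the table definition is a placeholder rather than the real `dbhash` schema. An alternative design is to catch the `IntegrityError` and treat it as "someone else created the table", but the lock avoids relying on the exact error raised.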

aepyornis commented 5 years ago

I haven't been following the code, so I don't know for sure if this is related, but here are two things about psycopg2 cursors that have tripped me up in the past:

docs: http://initd.org/psycopg/docs/cursor.html

toolness commented 5 years ago

Ah, thanks! Yes, I am finding the whole cursor/connection relationship quite confusing. Will read up on this!
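
On the cursor/connection relationship in general (this is not a reconstruction of the two points mentioned above, which are missing from this page): in psycopg2, transactions belong to the connection, not the cursor. Every cursor created from a connection shares that connection's transaction, `commit()` and `rollback()` live on the connection, and `with conn:` ends the transaction without closing the connection, while `with conn.cursor() as cur:` closes only the cursor. A minimal sketch, assuming `conn` is a psycopg2 connection and using the hypothetical helper name `run_statement`:

```python
def run_statement(conn, sql, params=None):
    """Execute one statement in its own transaction.

    `conn` is assumed to be a psycopg2 connection. The outer `with` commits
    if the body succeeds and rolls back if it raises; it does NOT close the
    connection. The inner `with` closes the cursor when the block exits.
    """
    with conn:
        with conn.cursor() as cur:
            cur.execute(sql, params)
```

Because the commit happens on the connection, work done through one cursor is not visible to other sessions until the enclosing `with conn:` block exits (or `conn.commit()` is called explicitly).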