Open antidipyramid opened 1 month ago
Deleting the offending bills and re-scraping fixed the issue.
This has come up again with another restricted bill, identifier 2024-1033:
raise DataImportError(
pupa.exceptions.DataImportError: duplicate key value violates unique constraint "councilmatic_core_bill_slug_ecb9ca6b_uniq"
DETAIL: Key (slug)=(2024-1033) already exists.
while importing {'identifier': '2024-1033', 'title': 'Restricted View', 'classification': ['bill'], 'subject': [], 'extras': {'restrict_view': True, 'plain_text': '', 'rtf_text': ''}, 'legislative_session_id': UUID('d5353c5e-efed-43b7-9c08-54751ed323a8'), 'from_organization_id': 'ocd-organization/f659e65f-0e12-46f2-9610-c3f1456540a2'} as <class 'opencivicdata.legislative.models.bill.Bill'>; 2002774)
There's already a bill with the same identifier in the database. One difference between it and the scraped bill seems to be the legislative_session_id: the one in the database has a legislative_session_id of UUID('997eda68-3c01-4378-adeb-2a009842a7b4').
The from_organization_id values are identical. Pupa uses these two attributes, along with the bill's identifier, to decide whether to create a new object or update an existing one.
Since Pupa thinks it's scraping a new bill, it tries to create it, but the identifier/slug clashes with the existing bill's, raising the import error.
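To make the failure mode concrete, here is a toy, in-memory model of the import step. It assumes, per the description above, that the lookup spec is identifier + legislative_session_id + from_organization_id, and that the slug equals the identifier and must be unique. ToyBillStore, StoredBill, and the local DataImportError class are hypothetical stand-ins, not Pupa's or Councilmatic's real code.

```python
from __future__ import annotations

from dataclasses import dataclass


class DataImportError(Exception):
    """Hypothetical stand-in for pupa.exceptions.DataImportError."""


@dataclass
class StoredBill:
    identifier: str  # doubles as the unique slug in this toy model
    legislative_session_id: str
    from_organization_id: str


class ToyBillStore:
    """Toy bill table: lookup by a spec; slug (== identifier) must be unique."""

    def __init__(self) -> None:
        self.bills: list[StoredBill] = []

    def find(self, spec: dict) -> StoredBill | None:
        # Return the first stored bill whose attributes match every key in spec.
        for bill in self.bills:
            if all(getattr(bill, key) == value for key, value in spec.items()):
                return bill
        return None

    def import_bill(self, data: dict) -> str:
        # Assumed lookup spec: identifier + session + organization.
        spec = {
            "identifier": data["identifier"],
            "legislative_session_id": data["legislative_session_id"],
            "from_organization_id": data["from_organization_id"],
        }
        if self.find(spec) is not None:
            return "update"
        # No match, so the importer tries to create a new row, which must
        # still satisfy the unique-slug constraint.
        if any(bill.identifier == data["identifier"] for bill in self.bills):
            raise DataImportError(f"Key (slug)=({data['identifier']}) already exists.")
        self.bills.append(StoredBill(**spec))
        return "create"


org = "ocd-organization/f659e65f-0e12-46f2-9610-c3f1456540a2"
stored = {
    "identifier": "2024-1033",
    "legislative_session_id": "997eda68-3c01-4378-adeb-2a009842a7b4",
    "from_organization_id": org,
}
store = ToyBillStore()
print(store.import_bill(stored))  # create
print(store.import_bill(stored))  # update: same identifier, session and org

# Same bill, re-scraped with a different session id: no spec match, slug clash.
rescraped = dict(stored, legislative_session_id="d5353c5e-efed-43b7-9c08-54751ed323a8")
try:
    store.import_bill(rescraped)
except DataImportError as exc:
    print("DataImportError:", exc)  # DataImportError: Key (slug)=(2024-1033) already exists.
```

Changing only the session is enough to turn an update into a create that collides with the existing row.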
To your knowledge, has this come up in the past, @hancush? It seems like the common thread is that all of these bills were at one time restricted.
@antidipyramid The conflict in legislative session is definitely to blame here. Does the way we determine the legislative session differ between private and public bills?
It looks like we pass the matter's intro date to self.session; can that date change? https://github.com/Metro-Records/scrapers-lametro/blob/b44bebba1ee10303769493fbd19dfb543f2cbbc4/lametro/bills.py#L210
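Why a changing intro date would matter: if self.session maps a date to whichever session contains it, any shift of that date across a session boundary yields a different legislative_session_id. The sketch below assumes that date-range behavior. session_for_date and the boundary dates are hypothetical, not taken from the scraper linked above.

```python
import bisect
from datetime import date

# Hypothetical session boundaries, for illustration only: these are NOT
# LA Metro's real legislative sessions.
SESSION_STARTS = [
    (date(2023, 7, 1), "session-A"),
    (date(2024, 7, 1), "session-B"),
]


def session_for_date(intro_date: date) -> str:
    """Return the latest session that starts on or before intro_date.

    Hypothetical helper; the scraper's real self.session may work differently.
    """
    starts = [start for start, _ in SESSION_STARTS]
    # bisect_right counts starts <= intro_date; the last of them is the session.
    index = bisect.bisect_right(starts, intro_date) - 1
    if index < 0:
        raise ValueError(f"no session covers {intro_date}")
    return SESSION_STARTS[index][1]


# If the intro date seen for a matter moves across a boundary between scrapes,
# the session (and therefore the importer's lookup spec) changes with it.
print(session_for_date(date(2024, 6, 20)))  # session-A
print(session_for_date(date(2024, 7, 2)))   # session-B
```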
@hancush One way of dealing with this is simply to remove the legislative session from the object spec during import. Bill slugs (i.e., identifiers) must already be unique, so this would allow the importer to match the existing object and update its session.
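A toy sketch of the proposed change, assuming the lookup spec is reduced to identifier + from_organization_id. Because slugs are already unique, at most one stored row can match, and a re-scraped bill with a new session updates that row instead of colliding with it. SessionAgnosticStore and StoredBill are hypothetical; how this maps onto Pupa's actual importer code is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredBill:
    identifier: str  # doubles as the unique slug in this toy model
    legislative_session_id: str
    from_organization_id: str


class SessionAgnosticStore:
    """Toy bill table whose lookup spec leaves out legislative_session_id."""

    def __init__(self) -> None:
        self.bills: list[StoredBill] = []

    def import_bill(self, data: dict) -> str:
        # Proposed spec: identifier + organization only.
        spec = {
            "identifier": data["identifier"],
            "from_organization_id": data["from_organization_id"],
        }
        for bill in self.bills:
            if all(getattr(bill, key) == value for key, value in spec.items()):
                # Matched the existing row: update its session in place.
                bill.legislative_session_id = data["legislative_session_id"]
                return "update"
        # Create only when nothing matched; the slug must still be unique.
        if any(bill.identifier == data["identifier"] for bill in self.bills):
            raise ValueError(f"Key (slug)=({data['identifier']}) already exists.")
        self.bills.append(
            StoredBill(
                identifier=data["identifier"],
                legislative_session_id=data["legislative_session_id"],
                from_organization_id=data["from_organization_id"],
            )
        )
        return "create"


org = "ocd-organization/f659e65f-0e12-46f2-9610-c3f1456540a2"
stored = {
    "identifier": "2024-1033",
    "legislative_session_id": "997eda68-3c01-4378-adeb-2a009842a7b4",
    "from_organization_id": org,
}
store = SessionAgnosticStore()
print(store.import_bill(stored))  # create
rescraped = dict(stored, legislative_session_id="d5353c5e-efed-43b7-9c08-54751ed323a8")
print(store.import_bill(rescraped))  # update
print(store.bills[0].legislative_session_id)  # d5353c5e-efed-43b7-9c08-54751ed323a8
```

The trade-off is that two bills sharing an identifier across sessions would be treated as one, but the unique slug constraint already rules that case out in this database.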
Could work, @antidipyramid! I do want to understand why this is happening now, though.
@xmedr, if you have a chance in the next two weeks, it might be a good idea to look at the most recent scraper updates to see whether they have anything to do with this behavior re: restricted bills. I don't see an obvious link, but it'd be nice to get a second opinion.
I searched for similar errors in past issues across multiple repos and didn't see anything that looked like this.
Board reports 2024-0556 and 2024-0549 were both restricted bills that raised DataImportErrors during scraping (both while restricted and while unrestricted).