freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

attachment_number must be null for a main PACER document. #4706

Open sentry-io[bot] opened 2 days ago

sentry-io[bot] commented 2 days ago

PR #4703 introduced stricter validation in the RECAPDocument model to ensure data integrity by preventing main PACER documents from having attachment numbers. After merging it, we started getting validation errors, because existing RECAPDocument records in the database violate these rules due to historical data inconsistencies. Specifically, we have main PACER documents (document_type=PACER_DOCUMENT) with attachment_number > 0, and main PACER documents with attachment_number = 0.

This means that if some other task tries to update one of these instances without changing either the attachment number or the document type, the save will raise a validation error.
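To illustrate the failure mode, here is a minimal sketch in plain Python rather than the real Django model. The `Doc` class, the `try_save` helper, the constant values, and the exact shape of the check are assumptions made for illustration; only the error message is taken from this issue's title. The point is that validation runs on every save, so a legacy row fails even when the task touches an unrelated field.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the RECAPDocument type constants; the real
# values live on the model and are not shown in this issue.
PACER_DOCUMENT = 1
ATTACHMENT = 2


class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


@dataclass
class Doc:
    """Toy record holding only the fields relevant to the rule."""

    document_type: int
    attachment_number: Optional[int]
    description: str = ""

    def clean(self) -> None:
        # Assumed shape of the rule added in PR #4703.
        if (
            self.document_type == PACER_DOCUMENT
            and self.attachment_number is not None
        ):
            raise ValidationError(
                "attachment_number must be null for a main PACER document."
            )

    def save(self) -> None:
        # Validation runs on every save, whichever fields changed.
        self.clean()


def try_save(doc: Doc) -> str:
    """Return "saved" or the validation error as a string."""
    try:
        doc.save()
    except ValidationError as exc:
        return f"error: {exc}"
    return "saved"


# A legacy row: main PACER document with attachment_number=0.
legacy = Doc(document_type=PACER_DOCUMENT, attachment_number=0)
legacy.description = "updated by an unrelated task"
legacy_result = try_save(legacy)

# A consistent row saves without complaint.
consistent = Doc(document_type=PACER_DOCUMENT, attachment_number=None)
consistent_result = try_save(consistent)
```

Here `legacy_result` carries the validation error even though only `description` changed, while `consistent_result` is `"saved"`.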

Before debugging any further, we should probably clean up the database by fixing the inconsistent instances described above. To do this, we can set document_type=ATTACHMENT on the main PACER documents that have attachment numbers > 0:

RECAPDocument.objects.filter(attachment_number__gt=0, document_type=RECAPDocument.PACER_DOCUMENT).update(document_type=RECAPDocument.ATTACHMENT)

And we should set attachment_number=None on the main PACER docs whose attachment number is 0:

RECAPDocument.objects.filter(attachment_number=0, document_type=RECAPDocument.PACER_DOCUMENT).update(attachment_number=None)
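Before running these against production, the two updates can be rehearsed on throwaway data. The sketch below mirrors them as SQL against an in-memory SQLite table using only the standard library. The table name `recap_document`, the constant values, and the helper functions are hypothetical and do not reflect the real schema. Both statements run in one transaction so that either both apply or neither does.

```python
import sqlite3

# Hypothetical stand-ins for RECAPDocument.PACER_DOCUMENT / ATTACHMENT.
PACER_DOCUMENT = 1
ATTACHMENT = 2


def count_invalid(conn: sqlite3.Connection) -> int:
    """Count main PACER documents that still carry an attachment number."""
    row = conn.execute(
        "SELECT COUNT(*) FROM recap_document "
        "WHERE document_type = ? AND attachment_number IS NOT NULL",
        (PACER_DOCUMENT,),
    ).fetchone()
    return row[0]


def cleanup(conn: sqlite3.Connection) -> tuple:
    """Apply both fixes in one transaction; return the affected row counts."""
    with conn:  # commits on success, rolls back if an exception is raised
        retyped = conn.execute(
            "UPDATE recap_document SET document_type = ? "
            "WHERE attachment_number > 0 AND document_type = ?",
            (ATTACHMENT, PACER_DOCUMENT),
        ).rowcount
        nulled = conn.execute(
            "UPDATE recap_document SET attachment_number = NULL "
            "WHERE attachment_number = 0 AND document_type = ?",
            (PACER_DOCUMENT,),
        ).rowcount
    return retyped, nulled


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recap_document ("
    "id INTEGER PRIMARY KEY, "
    "document_type INTEGER NOT NULL, "
    "attachment_number INTEGER)"
)
conn.executemany(
    "INSERT INTO recap_document (document_type, attachment_number) "
    "VALUES (?, ?)",
    [
        (PACER_DOCUMENT, None),  # already valid
        (PACER_DOCUMENT, 0),     # attachment_number should become NULL
        (PACER_DOCUMENT, 3),     # should be retyped as ATTACHMENT
        (ATTACHMENT, 1),         # already valid
    ],
)
conn.commit()

before = count_invalid(conn)
retyped, nulled = cleanup(conn)
after = count_invalid(conn)
```

Comparing `before` and `after` gives a quick check that no main PACER document is left with an attachment number. In Django, the equivalent would presumably be to wrap the two `update()` calls in `transaction.atomic()` and inspect the row counts they return.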

--

Sentry Issue: COURTLISTENER-8MY

Filed by @elisa-a-v