PR #4703 introduced stricter validation in the `RECAPDocument` model to ensure data integrity by preventing main PACER documents from having attachment numbers. After merging it, we started seeing validation errors: some existing `RECAPDocument` records in the database appear to violate these rules because of historical data inconsistencies. Specifically, we have:
- Almost 3,000 `RECAPDocument` instances where `attachment_number` is set (greater than 0) but `document_type` is marked as a main PACER document (`PACER_DOCUMENT`). These are probably attachments whose `document_type` was never updated.
- Some `RECAPDocument` instances where `attachment_number` is 0 and `document_type` is also `PACER_DOCUMENT`. These may be main documents that API users wrongly assigned `attachment_number=0`.
This means that if another task tries to update one of these instances without changing either the attachment number or the document type, validation will fail and raise an error.
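The rule described above can be sketched without Django to show why these legacy rows break unrelated updates. This is not the code from PR #4703: the constant values (`PACER_DOCUMENT = 1`, `ATTACHMENT = 2`), the `DocumentRow` stand-in, its `description` field, and the `validate_document_fields` / `save` helpers are all hypothetical. It only encodes the rule as stated in this issue (a main PACER document must not carry any attachment number, including 0).

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical values; the real constants are defined on the RECAPDocument model.
PACER_DOCUMENT = 1
ATTACHMENT = 2


@dataclass(frozen=True)
class DocumentRow:
    """Hypothetical stand-in for the RECAPDocument fields relevant to the rule."""
    document_type: int
    attachment_number: Optional[int]
    description: str = ""


class ValidationError(Exception):
    pass


def validate_document_fields(row: DocumentRow) -> None:
    """Reject main PACER documents that carry any attachment number (0 included)."""
    if row.document_type == PACER_DOCUMENT and row.attachment_number is not None:
        raise ValidationError(
            f"Main PACER documents cannot have attachment_number={row.attachment_number}"
        )


def save(row: DocumentRow) -> DocumentRow:
    """Simulate a save that validates the whole row, not only the changed field."""
    validate_document_fields(row)
    return row


# A legacy row: an attachment whose document_type was never updated.
legacy = DocumentRow(document_type=PACER_DOCUMENT, attachment_number=3)

# An unrelated task only changes the description, yet validation still fails.
try:
    save(replace(legacy, description="Exhibit A"))
except ValidationError as exc:
    print(f"Rejected: {exc}")
```

Because the check looks at the full row, any task that touches an inconsistent record is rejected until the record itself is fixed.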
Before debugging any further, we should probably clean up the database by fixing the inconsistent instances described above. To do this, we can set `document_type=ATTACHMENT` on the ones with attachment numbers > 0:
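The snippet for this step is not preserved here. As a hedged, self-contained stand-in, the sketch below runs the equivalent `UPDATE` against an in-memory SQLite table. The `recap_document` table name, its columns, and the integer codes for `PACER_DOCUMENT` and `ATTACHMENT` are assumptions, not the real CourtListener schema.

```python
import sqlite3

# Hypothetical integer codes; the real values are defined on the RECAPDocument model.
PACER_DOCUMENT = 1
ATTACHMENT = 2


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table mimicking the relevant RECAPDocument columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recap_document ("
        " id INTEGER PRIMARY KEY,"
        " document_type INTEGER NOT NULL,"
        " attachment_number INTEGER)"
    )
    conn.executemany(
        "INSERT INTO recap_document (document_type, attachment_number) VALUES (?, ?)",
        [
            (PACER_DOCUMENT, None),  # valid main document
            (ATTACHMENT, 1),         # valid attachment
            (PACER_DOCUMENT, 2),     # inconsistent: attachment typed as a main doc
            (PACER_DOCUMENT, 5),     # inconsistent: attachment typed as a main doc
        ],
    )
    return conn


def retype_misclassified_attachments(conn: sqlite3.Connection) -> int:
    """Mark main-document rows with attachment_number > 0 as attachments.

    Returns the number of rows changed; running it again changes nothing.
    """
    cur = conn.execute(
        "UPDATE recap_document SET document_type = ?"
        " WHERE document_type = ? AND attachment_number > 0",
        (ATTACHMENT, PACER_DOCUMENT),
    )
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
print(retype_misclassified_attachments(conn))
```

In the Django project, the same filter would presumably be written as `RECAPDocument.objects.filter(document_type=RECAPDocument.PACER_DOCUMENT, attachment_number__gt=0).update(document_type=RECAPDocument.ATTACHMENT)`. `QuerySet.update()` issues a single SQL `UPDATE` instead of loading and saving each of the ~3,000 instances.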
And we should set `attachment_number=None` on the main PACER documents:
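The snippet for this step is not preserved here either. The sketch below is a hedged, self-contained stand-in that models the relevant columns in an in-memory SQLite table; the `recap_document` table name and the integer code for `PACER_DOCUMENT` are assumptions. It filters on `attachment_number = 0` rather than `IS NOT NULL`, so it cannot wipe real attachment numbers from attachments that are still mis-typed as main documents.

```python
import sqlite3

# Hypothetical integer code; the real value is defined on the RECAPDocument model.
PACER_DOCUMENT = 1


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table mimicking the relevant RECAPDocument columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recap_document ("
        " id INTEGER PRIMARY KEY,"
        " document_type INTEGER NOT NULL,"
        " attachment_number INTEGER)"
    )
    conn.executemany(
        "INSERT INTO recap_document (document_type, attachment_number) VALUES (?, ?)",
        [
            (PACER_DOCUMENT, None),  # valid main document
            (PACER_DOCUMENT, 0),     # inconsistent: main doc with attachment_number=0
            (PACER_DOCUMENT, 4),     # attachment mis-typed as a main doc; not touched here
        ],
    )
    return conn


def clear_zero_attachment_numbers(conn: sqlite3.Connection) -> int:
    """Set attachment_number to NULL on main documents where it is 0."""
    cur = conn.execute(
        "UPDATE recap_document SET attachment_number = NULL"
        " WHERE document_type = ? AND attachment_number = 0",
        (PACER_DOCUMENT,),
    )
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
print(clear_zero_attachment_numbers(conn))
```

The Django equivalent would presumably be `RECAPDocument.objects.filter(document_type=RECAPDocument.PACER_DOCUMENT, attachment_number=0).update(attachment_number=None)`.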
Sentry Issue: COURTLISTENER-8MY
Filed by @elisa-a-v