freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

DatabaseError in extract_recap_pdf: Save with update_fields did not affect any rows #4055

Open sentry-io[bot] opened 5 months ago

sentry-io[bot] commented 5 months ago

Sentry Issue: COURTLISTENER-79X

DatabaseError: Save with update_fields did not affect any rows.
(7 additional frame(s) were not displayed)
...
  File "cl/scrapers/tasks.py", line 253, in extract_recap_pdf
    return async_to_sync(extract_recap_pdf_base)(
  File "cl/scrapers/tasks.py", line 319, in extract_recap_pdf_base
    await rd.asave(
  File "cl/search/models.py", line 1553, in asave
    return await sync_to_async(self.save)(
  File "cl/search/models.py", line 1529, in save
    super().save(update_fields=update_fields, *args, **kwargs)

Filed by: @albertisfu

mlissner commented 5 months ago

I've never seen this error before. Any idea if it matters or what causes it? Looks like we've had four instances of this error so far. :thinking:

albertisfu commented 5 months ago

This is a pretty weird one. After reviewing the events in detail, I found that the related RDs were deleted from the DB:

399406274
399406273
399406272
399406225

So that's why the DatabaseError was triggered: the save with update_fields ran against rows that no longer existed. A race condition probably occurred between the document being extracted and the duplicate documents being cleaned up, maybe within the custom RD save method.
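To make the suspected mechanism concrete, here is a minimal sketch of the race using only the standard library's sqlite3 module. It is not the CourtListener code: the table name `recap_document`, the `plain_text` column, the helper `save_with_update_fields`, and the local `DatabaseError` class are all stand-ins made up for illustration. The only behavior taken from Django is that a save with `update_fields` issues an UPDATE and raises when that UPDATE matches no rows.

```python
import sqlite3


class DatabaseError(Exception):
    """Stand-in for django.db.DatabaseError in this sketch."""


def save_with_update_fields(conn, pk, plain_text):
    """Update one column and fail if no row matched.

    Hypothetical helper: it imitates the zero-rows check that Django
    applies when save() is called with update_fields.
    """
    cur = conn.execute(
        "UPDATE recap_document SET plain_text = ? WHERE id = ?",
        (plain_text, pk),
    )
    if cur.rowcount == 0:
        raise DatabaseError("Save with update_fields did not affect any rows.")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recap_document (id INTEGER PRIMARY KEY, plain_text TEXT)"
)
conn.execute("INSERT INTO recap_document (id, plain_text) VALUES (399406274, '')")

# The extraction task has already loaded the row and is busy extracting text.
# Meanwhile, an assumed cleanup step deletes the same row.
conn.execute("DELETE FROM recap_document WHERE id = 399406274")

# The extraction task now tries to persist its result for a row that is gone.
try:
    save_with_update_fields(conn, 399406274, "extracted text")
    outcome = "saved"
except DatabaseError as exc:
    outcome = str(exc)

print(outcome)
```

If the delete is moved after the save, the update matches one row and no error is raised, which fits the idea that the failure depends on timing rather than on the extracted content.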

I was able to trace the document by the docket number and the extracted content shown in Sentry, and all the removed documents belong to the following entry:

https://www.courtlistener.com/docket/32210448/united-states-v-al-arian/?page=5#entry-1543

However, it is strange that the entry has only one attachment, meaning that the removed documents were not duplicates. Something went wrong during the process.