Open sentry-io[bot] opened 10 months ago
The document in question was parsed successfully in another processing queue
From the Sentry issue, the failing document is from court ca4 and has pacer_doc_id '00409653890'
On the CourtListener API, we get 3 matches on RecapProcessingQueue for this pacer_doc_id and court, where 2 have processed successfully and one has failed. The failing queue's id matches the one reported on Sentry <ProcessingQueue: 12151650>
We also have their corresponding Email Processing Queues, all with the same message_id (mj7ca7f39vp36ldrtnlv6qjp1nm5rg5emj1l0rg1) and all created less than 3 minutes apart
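A minimal sketch of the check described above: group queue records by message_id and flag any message that was queued more than once within a short window. The records, ids, and timestamps here are made up for illustration; only the field names message_id and date_created come from this thread, and the helper name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical records: ids and timestamps are invented, field names mirror the thread.
queues = [
    {"id": 1, "message_id": "msg-a", "date_created": "2024-01-01T10:00:00+00:00"},
    {"id": 2, "message_id": "msg-a", "date_created": "2024-01-01T10:01:30+00:00"},
    {"id": 3, "message_id": "msg-a", "date_created": "2024-01-01T10:02:45+00:00"},
    {"id": 4, "message_id": "msg-b", "date_created": "2024-01-01T11:00:00+00:00"},
]


def find_duplicate_messages(records, window=timedelta(minutes=3)):
    """Return {message_id: [queue ids]} for messages queued 2+ times within `window`."""
    by_message = defaultdict(list)
    for rec in records:
        by_message[rec["message_id"]].append(rec)

    duplicates = {}
    for message_id, recs in by_message.items():
        if len(recs) < 2:
            continue  # queued only once, nothing to report
        times = [datetime.fromisoformat(r["date_created"]) for r in recs]
        # Only flag groups whose first and last creation times fall inside the window.
        if max(times) - min(times) <= window:
            duplicates[message_id] = sorted(r["id"] for r in recs)
    return duplicates


duplicates = find_duplicate_messages(queues)
print(duplicates)
```

Running the same grouping over the real Email Processing Queue records would show whether this message is an isolated case or whether other messages are being queued multiple times too.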
From all these queues, one document has been processed successfully (CL, API), which matches the document linked in the original email
It has the document_number field that the Sentry issue reports we failed to get in the get_document_number_from_confirmation_page call
I think the real question is why we got 3 queues for the same original message_id
Is it possible that we got this three times because three accounts are subscribed? I.e., it's the same message but sent to three different @recap.email addresses?
I think that's not the case, since all 3 queues have the same message_id
The Email Processing Queues all have the same "destination_emails" which is a single @recap.email address.
In fact, the only significant difference is the date_created field. I cannot see the uploader field since the API doesn't show it and I do not have access to the production DB
On the email file itself, which is the same for all 3 queues, there is only one @recap.email address (grep recap mj7ca7f39vp36ldrtnlv6qjp1nm5rg5emj1l0rg1), which is the same as the one on destination_emails above. I have pasted the email contents as text on the Sentry issue if you want to take a look
It's possible the lambda that hits the API did retries, but I can't imagine why it would.
But the underlying problem seems to be that the magic link is used when we call get_document_number_from_confirmation_page. Is that right?
I think the magic number is used earlier, in download_pacer_pdf_and_save_to_pq.
The confirmation page download happens in juriscraper and there is no mention of magic numbers
Anyway, it shouldn't get to the get_document_number_from_confirmation_page call, since it first tries to get the document number from doctor's /utils/document-number/pdf/ endpoint, and that actually works for this case
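A rough sketch of the lookup order described above, not the actual CourtListener code: the PDF-based lookup runs first and the confirmation page is only consulted when that returns nothing. The function resolve_document_number and both callables are hypothetical stand-ins (from_pdf for the doctor endpoint, from_confirmation_page for the juriscraper download), and the document number "42" is made up.

```python
def resolve_document_number(pdf_bytes, from_pdf, from_confirmation_page):
    """Try the PDF-based lookup first; fall back to the confirmation page only if it yields nothing.

    Returns a (document_number, source) tuple so the caller can see which path was taken.
    """
    number = from_pdf(pdf_bytes)
    if number:
        return number, "pdf"
    return from_confirmation_page(), "confirmation_page"


# Hypothetical stand-ins that record whether they were called.
calls = []


def fake_from_pdf(pdf_bytes):
    calls.append("pdf")
    return "42" if pdf_bytes else ""


def fake_from_confirmation_page():
    calls.append("confirmation_page")
    return "42"


# When the PDF lookup succeeds, the confirmation page is never requested.
result = resolve_document_number(b"%PDF-", fake_from_pdf, fake_from_confirmation_page)
print(result, calls)
```

If the real code follows this order, reaching the confirmation-page call implies the PDF lookup returned nothing at the time the failing queue ran, even though it works now.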
Hm, could it be that Celery did things out of order?
@grossir, @flooie: We'll want to prioritize this one and get it analyzed as quickly as possible.
Sentry Issue: COURTLISTENER-635