Closed by frankhereford 3 years ago
Sounds like a solid plan to me @frankhereford!
Thanks @mateoclarke, I'll apply the second query to staging now. Pending success and some time for anyone else to weigh in, I'll apply it to production first thing in the morning. I appreciate you giving it a once-over!
Query executed on staging successfully with an execution time of ~10 seconds. It didn't update any rows, but this is expected based on the lag of crash data in that database.
Query executed in production with an execution time of 3.1 seconds. The speedup is due to running the subselect stand-alone first, which let the DB cache the records in question in memory. Thank you all for your help getting these crashes reprocessed.
@patrickm02L @mateoclarke @sergiogcx @xavierapostol
Xavier asked me to look into a crash that was missing its OCR'd narrative and its extracted diagram. I dug into what was going on with it, since it has a normal, run-of-the-mill CR3 PDF. I found that there are 851 crashes which meet these criteria:
The OCR attempt dates line up neatly with the time interval during which CRIS was misbehaving in relation to Mateo's account. I peeked at the "CR3" files being downloaded during that time period, and the "text/html" MIME type refers to the "you're not logged in" page that CRIS was returning instead of the PDF binary file.
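For anyone who wants to guard against this in the future, here is a minimal sketch of how a download could be checked before it is stored as a CR3. It relies only on the fact that PDF files begin with the bytes `%PDF-`, while an HTML login page does not. The function names `looks_like_pdf` and `classify_download` are hypothetical and are not part of the existing download code.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header bytes."""
    # A PDF file begins with the header "%PDF-" followed by a version number.
    return data.startswith(b"%PDF-")


def classify_download(data: bytes) -> str:
    """Roughly classify a downloaded payload (hypothetical helper)."""
    if looks_like_pdf(data):
        return "application/pdf"
    # An HTML page (such as a "not logged in" response) typically opens
    # with a doctype or an <html> tag, possibly after some whitespace.
    head = data[:1024].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "unknown"
```

A download routine could call `classify_download` on the response body and skip the OCR step (or raise an alert) whenever the result is not `"application/pdf"`.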
This is the query that I have put together to investigate and isolate these crash records:
If you all agree with my assessment, I propose the following query to reset the cr3_ocr_extraction_date field to null, which will allow the OCR routine to give these crashes another pass now that there is a PDF in place:
Thanks!