Cannot download 30+ page documents from PACER

freelawproject / recap

This repository is for filing issues on any RECAP-related effort.

https://free.law/recap/


Cannot download 30+ page documents from PACER #366

Status: Open. scooter1km opened this issue 5 months ago.

scooter1km commented 5 months ago

I am trying to download documents from this case (using RECAP version 2.5.0): https://www.courtlistener.com/docket/68140633/doe-v-delaware-valley-regional-high-school-board-of-education/

...specifically document 30: https://ecf.njd.uscourts.gov/doc1/119020886677?caseid=538861

When I click "Download Documents" I see the spinning cursor wheel but no actual files are downloaded.

My browser console does show an error:

My Chrome is at 'Version 121.0.6167.160 (Official Build) (arm64)'

Not sure if this is an issue with RECAP or something on my end, but I have been able to use RECAP for other documents in this case. Please let me know if there are any other details I can help provide, thanks!

mlissner commented 4 months ago

Curious, thanks for filing this. We'll put it on our backlog for the next time we do a round of robustness work on the extension. If you can identify a pattern in when this happens, that'd be very helpful too!

scooter1km commented 4 months ago

So I suspect this issue occurs only for some documents that are over 30 pages (see the attached page source: show_multidocs__html.txt).

This code in content_delegate.js appears to hardcode the position of the row it parses for the document number (used for the filename): it always takes the 4th row from the bottom of the first table:

      const firstTable = document.getElementsByTagName('table')[0];
      const firstTableRows = firstTable.querySelectorAll('tr');
      // 4th from bottom
      const matchedRow = firstTableRows[firstTableRows.length - 4];
      const cells = matchedRow.querySelectorAll('td');
      const document_number = cells[0].innerText.match(/\d+(?=\-)/)[0];

But for longer documents, there is an additional row at the bottom ("The document you requested is … pages. You will only be billed for 30 pages.") that is not there for documents of 30 pages or fewer. My guess is that this extra row shifts the offset by one, so the code ends up parsing the "Billable Pages" row instead of the "Description" row.
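A minimal sketch of one possible fix, assuming the diagnosis above is right and the billing notice is the only extra row on longer documents: drop that notice row first, then apply the same 4th-from-bottom offset. The `getDocumentNumber` helper and the `BILLING_NOTICE` pattern are hypothetical (not existing RECAP code), and the sketch takes rows as plain arrays of cell text so it can run outside the browser; the comment shows how those arrays could be built from the DOM.

```javascript
// Hypothetical pattern for the extra row PACER shows on documents over 30 pages,
// e.g. "... You will only be billed for 30 pages."
const BILLING_NOTICE = /You will only be billed for \d+ pages/i;

// rows: array of rows, each an array of cell texts. In the extension these could
// be built with (assumption, mirroring the original snippet):
//   const rows = Array.from(firstTable.querySelectorAll('tr'), (tr) =>
//     Array.from(tr.querySelectorAll('td'), (td) => td.innerText));
function getDocumentNumber(rows) {
  // Remove the billing-notice row so it no longer shifts the offset.
  const dataRows = rows.filter((cells) => !BILLING_NOTICE.test(cells.join(' ')));
  // Same "4th from bottom" offset as the original code, computed after filtering.
  const matchedRow = dataRows[dataRows.length - 4];
  if (!matchedRow || matchedRow.length === 0) return null;
  // Digits immediately followed by a hyphen, as in the original regex.
  const match = matchedRow[0].match(/\d+(?=-)/);
  // Return null instead of throwing when the expected text is missing.
  return match ? match[0] : null;
}
```

Returning `null` rather than indexing into a failed `match()` also means a future layout change would surface as a missing document number instead of an uncaught exception that leaves the download spinner running.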

mlissner commented 4 months ago

Great find! @ERosendo, I'm going to move this to your backlog to get a quick fix for this.

@scooter1km, we probably won't do a release just to fix this issue, but if you want to install from source, it's pretty easy to do. (Happy to have a PR too, if you're a developer, as you appear to be!)