freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

Unable to upload big document to RECAP Archive #235

Closed mlissner closed 4 years ago

mlissner commented 6 years ago

If you look at this link, you'll see that document 143 is not yet uploaded to the RECAP Archive:

https://www.courtlistener.com/docket/4304407/united-states-v-hill/?filed_after=&filed_before=&entry_gte=143&entry_lte=143&order_by=asc

Turns out that something about this document seems to make it impossible to upload. It's about 25MB in size, which I suspect is the heart of the problem.

I spent something like 30 minutes digging into this in the Firefox and Chrome debuggers (my internet is pretty slow), but I couldn't get breakpoints to work. Ones that should have been triggered just...weren't, and the upload didn't proceed. The document did eventually download though, and I didn't get any errors.

A couple other observations:

Hm. It's a tough one so far. Debugging is expensive and slow due to the document being so big (it's 86 scanned pages).

mlissner commented 6 years ago

Another, 36MB: gov.uscourts.ncmd.64541.150.0.pdf

mlissner commented 6 years ago

Here's another one a user is complaining about: https://ecf.ned.uscourts.gov/doc1/11303699099?caseid=75152

Bummer. Would be great to get a fix here. I wonder if @pascal666 would be interested/able.

jraller commented 5 years ago

FileSaver.js recommends upgrading to https://github.com/jimmywarting/StreamSaver.js to address this issue.

mlissner commented 5 years ago

Got a link for that recommendation?

jraller commented 5 years ago

See the first paragraph of the README.md file: https://github.com/eligrey/FileSaver.js/

mlissner commented 5 years ago

Interesting! Looks promising.

jraller commented 5 years ago

StreamSaver.js proved to be problematic for several reasons. I tested the polyfill they suggested, but the getWriter() function still did not work in the headless Chrome used by the CI build system. In digging through the code, I also came across their mitm (man-in-the-middle) implementation, which would have been a deal breaker for inclusion in an extension.

mlissner commented 4 years ago

For future tests, these are the biggest docs that we know about but don't yet have in the RECAP Archive. They range from about 60MB to 100MB:

https://www.courtlistener.com/docket/8338289/32/1/demaria-v-city-of-bellevue/
https://www.courtlistener.com/docket/14582054/1/4/capitol-indemnity-corporation-v-reflections-academy-inc/
https://www.courtlistener.com/docket/5963064/13/5/mckenzie-v-at-t-services-inc/
https://www.courtlistener.com/docket/5322654/26/1/migis-v-autozone-inc/
https://www.courtlistener.com/docket/6120786/38/10/stenzel-v-metropolitan-life-insurance-company/
https://www.courtlistener.com/docket/8333120/31/2/oropeza-v-bnsf-railway-company/
https://www.courtlistener.com/docket/5246456/23/16/schneider-v-gp-strategies-corporation/
https://www.courtlistener.com/docket/16381887/1/1/does-1-through-10-v-haaland/
https://www.courtlistener.com/docket/4199809/142/3/montague-v-yale-university/
https://www.courtlistener.com/docket/6440733/171/1/united-states-v-kelsey/
https://www.courtlistener.com/docket/8338324/1/4/dunn-v-fca-us-llc/
https://www.courtlistener.com/docket/8338325/1/5/cole-v-fca-us-llc/
https://www.courtlistener.com/docket/6973892/39/15/kulakowski-v-westrock-services-inc/
https://www.courtlistener.com/docket/16592470/1/8/msp-recovery-claims-series-llc-v-actavis-elizabeth-llc/
https://www.courtlistener.com/docket/8333208/18/2/bertacchi-freeman-v-hartford-life-and-accident-insurance-company/
https://www.courtlistener.com/docket/5706581/35/13/fiely-v-essex-healthcare-corporation/
https://www.courtlistener.com/docket/5158705/145/7/spears-v-liberty-life-assurance-company-of-boston/
https://www.courtlistener.com/docket/5158705/142/7/spears-v-liberty-life-assurance-company-of-boston/
https://www.courtlistener.com/docket/16592470/1/6/msp-recovery-claims-series-llc-v-actavis-elizabeth-llc/
https://www.courtlistener.com/docket/8272685/31/1/leighter-v-fedex-ground-package-system-inc/

List generated with:

rds = RECAPDocument.objects.filter(is_available=False).exclude(file_size=None).order_by('-file_size')
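
For anyone without a CourtListener shell handy, here is a rough plain-Python sketch of what that query selects. It is not the real model: `Doc`, the sample rows, and the `limit` of 20 (chosen only because the list above has 20 links) are all hypothetical, and treating `file_size` as bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a RECAPDocument row (only the fields the query uses)."""
    name: str
    is_available: bool
    file_size: Optional[int]  # assumed to be bytes; None when the size is unknown


def biggest_missing(docs, limit=20):
    """Mirror filter(is_available=False).exclude(file_size=None).order_by('-file_size')."""
    missing = [d for d in docs if not d.is_available and d.file_size is not None]
    missing.sort(key=lambda d: d.file_size, reverse=True)
    return missing[:limit]  # the slice is an assumption; the query above has none


# Made-up sample rows, for illustration only.
docs = [
    Doc("a.pdf", False, 60_000_000),
    Doc("b.pdf", True, 120_000_000),  # already available, so skipped
    Doc("c.pdf", False, None),        # size unknown, so skipped
    Doc("d.pdf", False, 100_000_000),
]

for d in biggest_missing(docs):
    print(f"{d.name}: {d.file_size / 1_000_000:.0f} MB")
```

The point is just that the list only contains documents that are known (they have a recorded size) but not yet available, largest first.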

mlissner commented 4 years ago

This has continued coming and going and being meddlesome.

mlissner commented 4 years ago

Fixed in: https://github.com/freelawproject/recap-chrome/pull/113