freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com
Other
528 stars 144 forks

LASC PDFs need extraction (pdftotext, OCR, etc.) #1041

Open flooie opened 4 years ago

mlissner commented 4 years ago

Importantly, this is a kinda-blocker for #1012, because every second we're uploading another PDF that we'll later have to spend money and time re-downloading from AWS for extraction.