freelawproject / juriscraper

An API to scrape American court websites for metadata.

https://free.law/juriscraper/

BSD 2-Clause "Simplified" License

341 stars, 98 forks

[WIP] #197 SCOTUS scraper #975

Open ralexx opened 3 months ago

ralexx commented 3 months ago

From #197.

Notable

Adds pymupdf as a dependency, as suggested.
Includes log-then-raise handling for two characteristic host responses (an 'Access Denied' page and a server name resolution error) that appear to be anti-abuse measures.
Because of the above, only single-threaded downloading was implemented here.
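
For readers unfamiliar with the log-then-raise pattern mentioned above, here is a minimal sketch. It is not the code from this PR: the exception classes, the `fetch` function, and the `opener` callable are hypothetical names chosen for illustration. It also assumes the 'Access Denied' page can be recognized by that phrase in the response body, and that the name resolution failure surfaces as `socket.gaierror`.

```python
import logging
import socket

logger = logging.getLogger(__name__)


class AccessDeniedError(Exception):
    """Hypothetical: the host answered with an 'Access Denied' page."""


class NameResolutionError(Exception):
    """Hypothetical: the host name could not be resolved."""


def fetch(url, opener):
    """Fetch url via opener(url) -> str, logging then raising on known failures.

    `opener` is a stand-in for whatever actually performs the HTTP request.
    """
    try:
        text = opener(url)
    except socket.gaierror as exc:
        # Log first so the failure is recorded even if a caller swallows it.
        logger.error("Name resolution failed for %s: %s", url, exc)
        raise NameResolutionError(url) from exc

    # Assumption: the block page contains this literal phrase.
    if "Access Denied" in text:
        logger.error("'Access Denied' page returned for %s", url)
        raise AccessDeniedError(url)
    return text
```

Raising a dedicated exception for each response lets a caller stop a single-threaded download loop as soon as the host starts refusing requests, rather than continuing to send them.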

TODO

[ ] Docket parser mapping to database structure: maintainer input needed
[ ] Python 3.8 support if necessary (I used some 3.9+ syntax)
[ ] Tests for docket parser
[ ] Tests currently omitted because they would need additional mocking
[ ] Better handling of the two host responses mentioned above
[ ] Multithreaded downloading, if desired (implemented locally but not included in this draft)
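
On the Python 3.8 item: the PR does not say which 3.9+ syntax was used, so the sketch below only illustrates three common 3.9 additions (the dict `|` operator, `str.removeprefix`, and built-in generics such as `list[str]`) alongside equivalents that also run on 3.8. The function names and the "No. " prefix are hypothetical and not taken from the PR.

```python
from typing import Dict, List


def merge_fields(base: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    # 3.9+ spelling: base | extra
    return {**base, **extra}


def strip_prefix(text: str, prefix: str) -> str:
    # 3.9+ spelling: text.removeprefix(prefix)
    return text[len(prefix):] if text.startswith(prefix) else text


def docket_numbers(lines: List[str]) -> List[str]:
    # 3.9+ spelling of the annotation: list[str]
    stripped = (line.strip() for line in lines)
    return [strip_prefix(line, "No. ") for line in stripped if line]
```

For annotations alone, adding `from __future__ import annotations` at the top of a module is another option, since it defers annotation evaluation so `list[str]` in a signature is not evaluated at definition time on 3.8.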

CLAassistant commented 3 months ago

All committers have signed the CLA.