freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

cl_scrape_opinions: Some scraped fields are not used when creating objects #4042

Open grossir opened 1 month ago

grossir commented 1 month ago

The following attributes returned by juriscraper scrapers are not used in courtlistener:

These require easy changes (1-2 lines) to be used in the models:

These require more work:

For these, I found no matching model on CL, nor any direct reference when searching the code for the attribute name:

If they are not used anywhere, we should probably delete them, since they only introduce noise.


Code on courtlistener that uses the scraped attributes to build objects: https://github.com/freelawproject/courtlistener/blob/c8f712754ff7041235df617e6351accf4b6b3754/cl/scrapers/management/commands/cl_scrape_opinions.py#L78C1-L137C6

Code on juriscraper that defines the attributes, in OpinionSite and OpinionSiteLinear: https://github.com/grossir/juriscraper/blob/92d27210adebfe7efa3b5ff2777667d3cd0de78f/juriscraper/OpinionSite.py#L18-L43
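
A rough way to repeat this kind of audit is to compare the attribute names a scraper declares against the keys the ingestion code actually reads. The sketch below is only an illustration: it assumes the consuming code reads scraped values as `item["name"]` or `item.get("name")`, and the attribute names, `sample_source`, and both helper functions are hypothetical placeholders, not the real juriscraper or courtlistener lists.

```python
import re

# Assumption: scraped values are read as item["name"] or item.get("name").
ITEM_KEY_RE = re.compile(r"""item(?:\[|\.get\()\s*["'](\w+)["']""")


def consumed_keys(source):
    """Return the set of keys read from `item` in the given source text."""
    return set(ITEM_KEY_RE.findall(source))


def unused_attributes(declared, source):
    """Return declared attribute names never read in `source`, in declared order."""
    used = consumed_keys(source)
    return [name for name in declared if name not in used]


# Placeholder data for illustration only.
sample_source = '''
first_value = item.get("attr_a", "")
second_value = item["attr_c"]
'''
declared = ["attr_a", "attr_b", "attr_c", "attr_d"]

print(unused_attributes(declared, sample_source))
```

A plain string search like this misses attributes that are accessed indirectly (for example through a loop over key names), so anything it flags still needs a manual check before deletion.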

grossir commented 1 month ago

This is a good opportunity to support some extra fields on both Courtlistener and Juriscraper. I propose adding these: