Upload miscellaneous_court_opinions from columbia corpus

elliottash commented 8 years ago

There are two folders with a total of ~100 opinions (not that many) called 'miscellaneous_court_opinions', one for New York and one for Ohio:

Failed to find a court ID for "Supreme Court, New York County.".
Failed to find a court ID for "Supreme Court, New York Special Term".
Failed to find a court ID for "City Court of New York, Bronx County.".
Failed to find a court ID for "Supreme Court, Special Term, Queens County.".
Failed to find a court ID for "State of New York, Court of Claims.".
Failed to find a court ID for "County Court, Rockland County.".
Failed to find a court ID for "New York Common Pleas &#8212; General Term".
Failed to find a court ID for "Supreme Court, Westchester County.".
Failed to find a court ID for "Supreme Court, Special Term, Queens County.".
Failed to find a court ID for "Supreme Court, Appellate Term".
Failed to find a court ID for "Supreme Court, Erie Equity Term.".

We need to add these courts to the DB and import these cases.
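As a starting point for that list, here is a minimal sketch that pulls the distinct court names out of log lines shaped like the ones above. The function name `missing_courts` and the choice to strip the trailing period are assumptions for illustration, not part of the importer.

```python
import re

# Matches log lines of the form quoted above:
#   Failed to find a court ID for "<court name>".
FAILED_LOOKUP = re.compile(r'Failed to find a court ID for "(.*)"\.$')


def missing_courts(log_lines):
    """Return the distinct unmatched court names, in first-seen order."""
    seen = {}
    for line in log_lines:
        match = FAILED_LOOKUP.search(line.strip())
        if match:
            # Assumption: drop the trailing period some names carry,
            # so "X County." and "X County" collapse into one entry.
            name = match.group(1).rstrip(".")
            seen.setdefault(name, None)
    return list(seen)
```

Run against the log above, this would report "Supreme Court, Special Term, Queens County" once rather than twice.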

mlissner commented 8 years ago

There are two competing theories on how to do this:

1. Put all of these miscellaneous courts into a single "meta court" called, for example, "nylowercourts".
2. Do the super-granular thing and make an entry in the database for each and every one of these courts.

Pros of Option 1

Simple.

The UI just gets one more check box. The database just gets one new entry. Done in a day or so.
This is the granularity people tend to care about. Very few people will only be looking for cases in "County Court, Rockland County." Much better to just look in all of NY's lower courts.

Cons of Option 1

API and data lack granularity. If we start having a court called nylowercourts, we can never get rid of it, even though it doesn't really exist. People will be counting on it.
An FLP strength is data granularity. I'd rather say we have 400 courts than say we have 388 (or whatever this comes out to).
The future could suck. Say we later want to do something granular with one of these courts. Well...that day could come if we add docketing for NY, say, or something like that. If it comes, we'll have to split up this data and do the hard thing, except it will be harder than it would have been the first time.

Pros of Option 2

We can provide a useful UI, displaying NY lower courts as a single option while also providing highly granular API and data.
If there's a chance we have to take option two down the road, it's better to do it sooner rather than later.
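To make the first point concrete, here is a rough sketch of how granular court records could still collapse into one UI option. Everything in it is hypothetical: the `Court` class, the `group` field, the court IDs, and both helper functions are made up for illustration and do not reflect the actual CourtListener schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Court:
    court_id: str                 # hypothetical granular ID exposed by the API
    name: str
    group: Optional[str] = None   # hypothetical UI-only grouping label


# Illustrative data only; these IDs are invented for the example.
COURTS = [
    Court("nycountyctrockland", "County Court, Rockland County", "New York lower courts"),
    Court("nyclaimsct", "State of New York, Court of Claims", "New York lower courts"),
    Court("ny", "New York Court of Appeals"),
]


def ui_options(courts):
    """One checkbox label per group, or per court when it has no group."""
    labels = []
    for court in courts:
        label = court.group or court.name
        if label not in labels:
            labels.append(label)
    return labels


def expand_selection(label, courts):
    """Translate a checked UI label into the granular court IDs to query."""
    return [c.court_id for c in courts if (c.group or c.name) == label]
```

With this shape, the UI shows a single "New York lower courts" box, while the database and API keep one entry per court, so nothing like "nylowercourts" ever has to exist as a court.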

Cons of Option 2

It's hard. We'll need to gather data and input all of the dinky courts, something we haven't yet done in any major way.
We'll have to do the work of making the UI a decent experience.

Conclusion

Typing this up, I lean towards option 2. It's harder now, but more future-proof. It also creates better data.

mlissner commented 7 months ago

@flooie any idea where this stands?

freelawproject / courtlistener