freelawproject / reporters-db

A database of court reporters, tests and other experiments
BSD 2-Clause "Simplified" License

Add all reporters from Harvard's collection #8

Closed: mlissner closed this 3 years ago

mlissner commented 8 years ago

@harvard-lil put together a big JSON file with all of their reporters. It's pretty great, and it contains almost all of the information that we have in our reporter database. I did a very rough study of the differences:

It looks like they have about 600 reporters:

import json
with open('/tmp/reporters.json') as f:
    lil_db = json.load(f)
print(len(lil_db))

Prints:

600

Of those, I wanted to see which we already had, so:

from reporters_db import EDITIONS, REPORTERS, VARIATIONS_ONLY
# Every abbreviation FLP knows about: edition keys, reporter keys, and variations.
all_keys = set(EDITIONS) | set(REPORTERS) | set(VARIATIONS_ONLY)
found = 0
not_found = 0
for r in lil_db:
    if r['short'] in all_keys:
        found += 1
    else:
        not_found += 1
print("Found in FLP reporter DB: %s\nNot Found in FLP reporter DB: %s" % (found, not_found))

Prints:

Found in FLP reporter DB: 315
Not Found in FLP reporter DB: 285
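To dig into the misses, it may help to print them rather than just count them. Here's a minimal sketch, assuming each LIL record is a dict with a 'short' key, as in the loop above. The helper name missing_abbreviations and the sample data are hypothetical. Because it works on unique abbreviations, duplicate 'short' values in the LIL file would show up only once.

```python
# Sketch: list the LIL abbreviations that FLP lacks, instead of only counting them.
# Assumes each LIL record is a dict with a "short" abbreviation (as in the loop above).
# missing_abbreviations is a hypothetical helper, not part of reporters-db.


def missing_abbreviations(lil_records, flp_keys):
    """Return the unique LIL abbreviations not present in flp_keys, sorted for review."""
    known = set(flp_keys)
    return sorted({r["short"] for r in lil_records} - known)


# Hypothetical sample data for illustration only.
sample_lil = [{"short": "R.I. Dec."}, {"short": "A."}, {"short": "Tenn. Ch. R."}]
sample_flp = ["A.", "A.2d", "U.S."]

for abbr in missing_abbreviations(sample_lil, sample_flp):
    print(abbr)
```

Against the real data, the call would be missing_abbreviations(lil_db, all_keys).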

The ones we're missing are a bit of a mish-mash. For a number of reporters, we lack the abbreviation that LIL is using, apparently because one or the other of us isn't using the recommended abbreviation. For these, we'll want to figure out which is correct, use that as the primary, and add the other as a variation.
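A rough sketch of the "primary plus variation" fix, assuming the layout of reporters-db's reporters.json, where each top-level key maps to a list of reporter entries, each with an "editions" dict and a "variations" dict that maps a variant abbreviation to an edition abbreviation. The add_variation helper and the Tennessee entry below are hypothetical, for illustration only; they don't claim anything about which abbreviation is actually correct.

```python
# Sketch: record a non-preferred abbreviation as a variation of the primary one.
# Assumes a reporters.json-like shape: {key: [{"editions": {...}, "variations": {...}}]}.
# add_variation is a hypothetical helper, not part of reporters-db.


def add_variation(reporters, reporter_key, edition, variant):
    """Map `variant` to `edition` in every entry under `reporter_key` that has that edition."""
    updated = False
    for entry in reporters[reporter_key]:
        if edition in entry["editions"]:
            existing = entry["variations"].get(variant)
            # Don't silently repoint a variant that already maps to another edition.
            if existing not in (None, edition):
                raise ValueError(f"{variant!r} already points to {existing!r}")
            entry["variations"][variant] = edition
            updated = True
    if not updated:
        raise KeyError(f"no edition {edition!r} under {reporter_key!r}")


# Hypothetical entry: suppose "Tenn. Ch." is chosen as primary and LIL's
# "Tenn. Ch. R." becomes a variation. Edition details (dates) are omitted here.
reporters = {
    "Tenn. Ch.": [
        {
            "name": "Tennessee Chancery Reports",
            "editions": {"Tenn. Ch.": {}},
            "variations": {},
        }
    ]
}
add_variation(reporters, "Tenn. Ch.", "Tenn. Ch.", "Tenn. Ch. R.")
print(reporters["Tenn. Ch."][0]["variations"])
```

Raising on a conflicting mapping is a design choice: if a variant already points at a different edition, that usually means two reporters collide and a human should decide.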

The other items that are missing seem to be either small or old reporters. Here's a sampling:

Rec. T. Warwick (R.I.): Records of the Court of Trials of the Town of Warwick, 1659-1674
Rec. V.A. Ct. (R.I.): Records of the Vice-Admiralty Court of Rhode Island, 1716-1752
R.I. Ct. Rec.: Rhode Island Court Records
R.I. Dec.: Rhode Island Decisions
Super. Ct. (R.I.): Rhode Island Superior Court Rescripts
Rec. Co. Ch. (S.C.): Records of the court of Chancery of South Carolina, 1671-1779
Strobh.: Reports of cases argued and determined in the court of appeals and the court of errors of South Carolina
Dess.Eq.: Reports of cases argued and determined in the court of chancery of the state of South Carolina
Rich. Eq. Cas.: Reports of cases in equity, argued and determined in the court of appeals and court of errors of South Carolina
Strobh. Eq.: Reports of cases in equity, argued and determined in the court of appeals and court of errors of South Carolina
Unrep. Tenn. Cas.: Unreported Tennessee Cases
Tenn. Cas.: Tennessee Cases with Notes and Annotations
Tenn. Ch. R.: Tennessee Chancery Reports
Va. Ch. Dec.: Decisions of the Cases in Virginia

For these, we'll probably want to simply incorporate them into our reporter database, making sure that we don't already have them under another name or abbreviation.
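One way to check whether we already have one of these under a slightly different name is a fuzzy match on reporter names. This sketch uses difflib.get_close_matches from the Python standard library; the normalize() rules, the 0.8 cutoff, and the sample names are assumptions. It catches punctuation and small spelling differences, but not a reporter that goes by an entirely different name, so it's a first pass before human review.

```python
# Sketch: before adding a LIL reporter, look for an FLP reporter with a similar name.
# difflib is standard library; the normalization rules and cutoff are assumptions.
import difflib
import re


def normalize(name):
    """Lowercase and collapse punctuation/whitespace so trivial differences don't matter."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def similar_names(lil_name, flp_names, cutoff=0.8):
    """Return up to three FLP names whose normalized form is close to lil_name's."""
    # If two FLP names normalize identically, the later one wins; fine for a first pass.
    by_norm = {normalize(n): n for n in flp_names}
    hits = difflib.get_close_matches(normalize(lil_name), list(by_norm), n=3, cutoff=cutoff)
    return [by_norm[h] for h in hits]


flp_names = ["Tennessee Chancery Reports", "Rhode Island Reports"]  # hypothetical sample
print(similar_names("Tennessee Chancery Reports.", flp_names))
print(similar_names("Records of the Vice-Admiralty Court of Rhode Island", flp_names))
```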

Doing this in reverse, to see which items we have that LIL does not, I found:

Found in LIL DB: 240
Not Found in LIL DB: 201

So in both directions, roughly half of the abbreviations match. I haven't dug into these because it's harder to go this direction, but I suspect the results will be similar to the findings above.
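The reverse direction can be sketched the same way. This assumes the FLP side is a collection of abbreviation keys (for example the keys of REPORTERS; the thread doesn't say which keys the counts above used) and that LIL records carry a 'short' field. The reverse_check helper and the sample data are hypothetical.

```python
# Sketch of the reverse comparison: which FLP abbreviations have no LIL record?
# Assumes LIL records carry a "short" key; which FLP keys to pass in is an assumption.
# reverse_check is a hypothetical helper, not part of reporters-db.


def reverse_check(flp_keys, lil_records):
    """Return (found_count, sorted list of FLP keys absent from LIL)."""
    lil_shorts = {r["short"] for r in lil_records}
    flp = set(flp_keys)
    missing = sorted(flp - lil_shorts)
    return len(flp) - len(missing), missing


# Hypothetical sample data for illustration only.
sample_flp = ["A.", "A.2d", "U.S.", "Tenn. Ch."]
sample_lil = [{"short": "A."}, {"short": "Tenn. Ch. R."}]
found, missing = reverse_check(sample_flp, sample_lil)
print(f"Found in LIL DB: {found}\nNot Found in LIL DB: {len(missing)}")
print(f"Share found: {found / (found + len(missing)):.1%}")
print(missing)
```

For the counts reported in this thread, the same share works out to 315 / 600 = 52.5% in the forward direction and 240 / 441 ≈ 54.4% in reverse.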

mlissner commented 3 years ago

@jcushman, I just found this early investigation into our two reporters DBs. I'm guessing this is pretty darned done. What do you think?

jcushman commented 3 years ago

Yeah, I dunno. You'll see in a PR shortly that I've just been doing a little hand editing to clean up our New York City reporters, for example -- there are seven reporters CAP had with made-up names like Robertsons Super. Ct. Rep. that should have been Hall, Sandf., Duer, Bosw., Rob., Sweeny, and Jones & S. as nominatives of N.Y. Super. Ct., which we didn't have as a reporter at all.

So there's probably more stuff like that lingering that you could potentially find through this comparison ... though I don't know if a general issue for the whole haystack is helpful to keep open. 🤷‍♂️

jcushman commented 3 years ago

I think this is now sufficiently covered that we don't need an issue for it ...

mlissner commented 3 years ago

Woo hoo! So cool that this finally got completed. Thank you for all your work making this come together. CLOSING!