@jcushman, I just found this early investigation into our two reporters DBs. I'm guessing this is pretty darned done. What do you think?
Yeah I dunno. You'll see in a PR shortly that I've just been doing a little hand editing to clean up our New York City reporters for example -- there's seven reporters CAP had with made-up names like Robertsons Super. Ct. Rep. that should have been Hall, Sandf., Duer, Bosw., Rob., Sweeny, Jones & S. as nominatives of N.Y. Super. Ct., which we didn't have as a reporter at all.
So there's probably more stuff like that lingering that you could potentially find through this comparison ... though I don't know if a general issue for the whole haystack is helpful to keep open. 🤷‍♂️
I think this is now sufficiently covered that we don't need an issue for it ...
Woo hoo! So cool that this finally got completed. Thank you for all your work making this come together. CLOSING!
@harvard-lil put together a big JSON file with all of their reporters. It's pretty great, and contains almost all the information that we have in our reporter database. I did a very rough study of the differences:
It looks like they have about 600 reporters:
Prints:
Of those, I wanted to see which we already had, so:
Prints:
Looking into the ones that we're missing, it's a bit of a mish-mash. There are a number of reporters where we lack the abbreviation that LIL is using. This seems to be because one or the other of us isn't using the recommended abbreviation. For these, we'll want to figure out which is correct and use that as the primary, adding the other as a variation.
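To make the "primary vs. variation" question concrete, here's a rough sketch of how the LIL abbreviations could be sorted into three buckets: ones that match one of our primaries, ones we only know as a variation, and ones we don't have at all. The function name, the shape of `ours` (primary abbreviation mapped to a list of variations), and the sample data are all assumptions for illustration, not the actual layout of either database.

```python
def classify_abbreviations(theirs, ours):
    """Sort each external abbreviation into primary / variation / missing.

    `theirs` is an iterable of abbreviation strings (hypothetical input).
    `ours` maps each primary abbreviation to a list of known variations
    (an assumed shape, not the real database schema).
    """
    # Reverse index: variation -> the primary abbreviation it belongs to.
    variation_to_primary = {
        variation: primary
        for primary, variations in ours.items()
        for variation in variations
    }
    result = {"primary": [], "variation": [], "missing": []}
    for abbr in theirs:
        if abbr in ours:
            result["primary"].append(abbr)
        elif abbr in variation_to_primary:
            # We know it, but under a different primary abbreviation.
            result["variation"].append((abbr, variation_to_primary[abbr]))
        else:
            result["missing"].append(abbr)
    return result


# Toy data for illustration only; not taken from either database.
ours = {"F.2d": ["F. 2d", "F.2d."], "U.S.": ["U. S."]}
theirs = ["F.2d", "U. S.", "Zzz. Rep."]
buckets = classify_abbreviations(theirs, ours)
print(buckets)
```

The "variation" bucket is the one that needs a human decision: for each pair, somebody has to check which abbreviation is the recommended one before swapping primaries.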
The other items that are missing seem to be either small or old reporters. Here's a sampling:
For these, we'll probably want to simply incorporate them into our reporter database, making sure that we don't already have them under another name or abbreviation.
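Before adding anything, a loose duplicate check along these lines might help catch reporters we already have under a slightly different name or abbreviation. This is only a sketch: `normalize`, `find_existing`, and the entry layout (`name` plus `variations`) are hypothetical, and the normalization is deliberately crude (it just drops case, spaces, and punctuation), so hits still need to be eyeballed.

```python
import re


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_existing(candidate_name, candidate_abbr, ours):
    """Return our primary abbreviations that may already cover a candidate.

    `ours` maps a primary abbreviation to {"name": str, "variations": [str]}
    (an assumed layout for this sketch).
    """
    wanted = {normalize(candidate_name), normalize(candidate_abbr)}
    wanted.discard("")  # an empty key would match nothing meaningful
    matches = []
    for primary, entry in ours.items():
        known = {normalize(primary), normalize(entry["name"])}
        known.update(normalize(v) for v in entry["variations"])
        if wanted & known:
            matches.append(primary)
    return matches


# Made-up entries for illustration only.
ours = {
    "Ex. Rep.": {"name": "Example Reports", "variations": ["Ex. R."]},
}
same_name = find_existing("Example Reports", "Exam. Rep.", ours)
spacing_only = find_existing("Some Other Title", "Ex.Rep.", ours)
no_match = find_existing("Other Reports", "Oth.", ours)
print(same_name, spacing_only, no_match)
```

An empty result doesn't prove the reporter is new, it only means nothing matched after normalization.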
Doing this in reverse, to see which items we have that LIL does not, I found:
So the numbers are fairly consistent in both directions: about 50% found and 50% not found. I haven't dug into these because it's harder to go this direction, but I suspect it'll be similar to the above findings.
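For anyone rerunning the comparison in both directions, the found/not-found split can be computed with plain set operations. A minimal sketch, assuming each side has already been reduced to a collection of abbreviation strings; `coverage` and the sample sets are made up for the example and don't reflect the real counts.

```python
def coverage(source, target):
    """Count how many items of `source` appear in `target`.

    Returns (found, not_found, fraction_found).
    """
    source, target = set(source), set(target)
    found = len(source & target)
    not_found = len(source - target)
    fraction = found / len(source) if source else 0.0
    return found, not_found, fraction


# Toy sets for illustration only.
lil = {"A.", "B.", "C.", "D."}
ours = {"C.", "D.", "E.", "F."}

lil_in_ours = coverage(lil, ours)   # which of theirs we already have
ours_in_lil = coverage(ours, lil)   # which of ours they have
print(lil_in_ours, ours_in_lil)
```

Exact string matching like this will undercount matches wherever the two sides only differ in spacing or punctuation, so the real overlap is probably somewhat higher than the raw numbers suggest.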