freelawproject / courts-db

A database of courts, tests and other experiments
BSD 2-Clause "Simplified" License

Import citation_string data from CourtListener #12

mattdahl closed 3 years ago

mattdahl commented 3 years ago

Pursuant to the discussion at https://github.com/freelawproject/courtlistener/issues/1521#issuecomment-753671869 et seq., this PR takes the citation_string data that is currently stored on Court objects in CourtListener's (CL's) database and adds it to this centralized courts_db repo.

The code that I used to do the merge is here, if you want to make sure it's not insane: https://gist.github.com/mattdahl/1b786674ca7ef37a02ba748675ae2d57

The diff is pretty clean, except for two things. First, there are a couple of indentation changes, but this is because the existing indentation was off for a couple of the JSON objects (my changes standardize the indentation). Second, my changes remove a handful of \u200e characters (the Unicode left-to-right mark) that were appended to some of the court URLs. This happened automatically when I ran JSON.stringify(), but I don't see why they would be there in the first place, so I see no harm in removing them.
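
For anyone who wants to reproduce the cleanup explicitly rather than as a side effect of re-serializing, here is a minimal Python sketch (not the code from the gist above). It strips U+200E from every string value in a JSON-like structure and writes the result with uniform indentation. The function names, the `court_url` key in the usage note, and the 4-space indent are assumptions for illustration, not something confirmed by this repo.

```python
import json

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def strip_lrm(value):
    """Recursively remove U+200E from every string value (dict keys are left alone)."""
    if isinstance(value, str):
        return value.replace(LRM, "")
    if isinstance(value, list):
        return [strip_lrm(item) for item in value]
    if isinstance(value, dict):
        return {key: strip_lrm(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


def dump_courts(courts):
    """Serialize cleaned court records with one consistent indent (4 is an assumption)."""
    return json.dumps(strip_lrm(courts), indent=4, ensure_ascii=False)
```

Calling `strip_lrm({"court_url": "http://example.com/\u200e"})` returns the same dict with the trailing mark removed from the URL.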

mlissner commented 3 years ago

This looks pretty good, but I worry that the data in the CL codebase isn't always totally up to date or complete. We tweak things pretty commonly in the DB without doing a data import for it. Would it be terribly hard to attempt this again using the CL API for Courts? Sorry I didn't mention this before.

mattdahl commented 3 years ago

Ah sure, good point. I will do that, but tomorrow!

mattdahl commented 3 years ago

Just pushed a new commit with some additional citation strings that were indeed available via the API but that weren't in the fixtures I was using before.

Updated (but still poorly-written) merging code, if interested: https://gist.github.com/mattdahl/885858b53640ad5c0ecff035224743ec. (Am I obtuse, or is there some way to change the pagination on the courts endpoint?)
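
The merge step itself can be sketched roughly as follows. This is an illustrative Python version, not the code in the gist; it assumes that both the courts_db records and the API records are dicts sharing an `id` key, and that the API records carry a `citation_string` key. `merge_citation_strings` is a hypothetical helper name.

```python
def merge_citation_strings(courts, api_courts):
    """Return copies of `courts` with citation_string filled in from `api_courts`.

    Records are matched on "id" (an assumption); empty citation strings are skipped.
    """
    citations = {c["id"]: c.get("citation_string", "") for c in api_courts}
    merged = []
    for court in courts:
        updated = dict(court)  # shallow copy so the input list is not mutated
        citation = citations.get(court.get("id"))
        if citation:
            updated["citation_string"] = citation
        merged.append(updated)
    return merged
```

Courts with no match in the API data, or with an empty citation string there, come back unchanged, which keeps the resulting diff limited to the records that actually gained data.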

mlissner commented 3 years ago

Thanks, looks good.

No way to tweak pagination, I'm afraid. We could do it for courts, but for the other object types, the page size is kept small for performance reasons.
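
Since the page size is fixed, a client has to walk the pages one by one. Below is a minimal Python sketch of that loop. It assumes each page is a JSON object with a `results` list and a `next` URL that is null on the last page, which should be checked against the actual API responses. `fetch_page` is a hypothetical callable that takes a URL and returns the decoded JSON, so authentication and HTTP details stay out of the loop.

```python
def iter_results(first_url, fetch_page):
    """Yield every record from a paginated endpoint by following "next" links.

    Assumes each page looks like {"results": [...], "next": <url or None>}.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (or missing) ends the loop
```

Passing a stub for `fetch_page` that reads from a dict of canned pages makes the loop easy to test without touching the network.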