Closed mattdahl closed 3 years ago
This looks pretty good, but I worry that the data in the CL codebase isn't always totally up to date or complete. We tweak things pretty commonly in the DB without doing a data import for it. Would it be terribly hard to attempt this again using the CL API for Courts? Sorry I didn't mention this before.
Ah sure, good point. I will do that, but tomorrow!
Just pushed a new commit with some additional citation strings that were indeed available via the API but that weren't in the fixtures I was using before.
Updated (but still poorly-written) merging code, if interested: https://gist.github.com/mattdahl/885858b53640ad5c0ecff035224743ec. (Am I obtuse, or is there some way to change the pagination on the courts endpoint?)
Thanks, looks good.
No way to tweak pagination, I'm afraid. We could allow it for courts, but the page size is kept small for the other object types for performance reasons.
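Since the page size can't be changed, one workaround is to walk the pages one at a time. Below is a minimal sketch, assuming the endpoint returns the usual paginated JSON shape with a `next` URL and a `results` array. The helper names `fetchAllPages` and `fetchJsonOverHttp` are hypothetical, and this is not the code in the gist.

```javascript
// Hypothetical helper: collect every page of a paginated JSON endpoint.
// Assumption: each response looks like { next: <url or null>, results: [...] }.
async function fetchAllPages(startUrl, fetchJson) {
  const all = [];
  let url = startUrl;
  while (url) {
    const page = await fetchJson(url);
    all.push(...page.results);
    url = page.next; // a null `next` on the last page ends the loop
  }
  return all;
}

// Hypothetical default fetcher built on the global fetch (Node 18+ or a browser).
async function fetchJsonOverHttp(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}: ${url}`);
  }
  return response.json();
}
```

Calling `fetchAllPages(courtsUrl, fetchJsonOverHttp)` with the courts endpoint URL would then return one flat array of court objects. Passing the fetcher in as an argument keeps the loop testable without network access; any authentication the API requires would go inside the fetcher.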
Pursuant to the discussion at https://github.com/freelawproject/courtlistener/issues/1521#issuecomment-753671869 et seq., this PR takes the `citation_string` data that is currently stored on `Court` objects in CL's database and adds it to this centralized `courts_db` repo.
The code that I used to do the merge is here, if you want to make sure it's not insane: https://gist.github.com/mattdahl/1b786674ca7ef37a02ba748675ae2d57
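For readers who don't want to open the gist, the general shape of such a merge can be sketched as follows. This is an illustrative sketch, not the gist's code: it assumes both the local entries and the CL `Court` objects share an `id` key, and the function name `mergeCitationStrings` and the sample data are hypothetical.

```javascript
// Copy citation_string from CL court records onto local entries with the same id.
// Assumption: both sides expose an `id` key; `citation_string` is the field
// named in this PR. Entries without a match are returned unchanged.
function mergeCitationStrings(localCourts, clCourts) {
  const citationById = new Map();
  for (const court of clCourts) {
    if (court.citation_string) {
      citationById.set(court.id, court.citation_string);
    }
  }
  return localCourts.map((court) =>
    citationById.has(court.id)
      ? { ...court, citation_string: citationById.get(court.id) }
      : court
  );
}

// Hypothetical sample data, for illustration only.
const localCourts = [{ id: 'court-a' }, { id: 'court-b' }];
const clCourts = [{ id: 'court-a', citation_string: 'Ct. A' }];
const merged = mergeCitationStrings(localCourts, clCourts);

// Serializing with a fixed indent width is what standardizes the indentation.
console.log(JSON.stringify(merged, null, 4));
```

Empty citation strings are skipped on purpose so that a blank value on the CL side never overwrites something already present locally.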
The diff is pretty clean, except for two things. First, there are a couple of indentation changes, but this is because the existing indentation was off for a couple of the JSON objects (my changes standardize the indentation). Second, my changes remove a handful of `\u200e` entities that were appended to some of the court URLs. This happened automatically when I ran `JSON.stringify()`, but I don't see why they would be there in the first place, so I see no harm in removing them.
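For context, U+200E is the Unicode left-to-right mark, an invisible formatting character. As far as I can tell, a `JSON.parse()` / `JSON.stringify()` round trip writes the character back out literally rather than as a `\u200e` escape, so the escape disappears from the diff while the character itself may still be in the string. If the goal is to drop it entirely, an explicit strip is one option. The sketch below is a hypothetical helper, not part of the merge script:

```javascript
// Matches the Unicode left-to-right mark (U+200E).
const LEFT_TO_RIGHT_MARK = /\u200e/g;

// Hypothetical helper: recursively remove U+200E from every string value
// in a parsed JSON structure (object keys are left untouched).
function stripLeftToRightMarks(value) {
  if (typeof value === 'string') {
    return value.replace(LEFT_TO_RIGHT_MARK, '');
  }
  if (Array.isArray(value)) {
    return value.map(stripLeftToRightMarks);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, stripLeftToRightMarks(inner)])
    );
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Running this over the parsed data before serializing would make the removal deliberate and easy to verify.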