flooie closed this issue 3 months ago
Nice one. I'll put this onto Alberto's backlog.
Great! I have a couple of questions so far:
The current docket_number parsing method and the way it is stored in CL won't change, correct?
I mean, currently, if we have 1:01-cv-00570-PCH, the parsed docket_number is 1:01-cv-00570.
So, no change here, right?
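To make the expected behavior concrete, here is a minimal sketch of that trimming for a district-style docket number. The pattern and the name core_docket_number are hypothetical and only illustrate the example in the question; this is not the code CL or Juriscraper actually uses.

```python
import re

# Hypothetical pattern: office code, two-digit year, case type, sequence number.
# Anything after the sequence number (judge initials, etc.) is left out.
CORE_DOCKET_RE = re.compile(r"^(\d+:\d{2}-[a-z]{1,4}-\d{1,5})", re.IGNORECASE)


def core_docket_number(raw: str) -> str:
    """Return the docket number without trailing judge initials.

    If the input does not look like a district docket number, it is
    returned unchanged (an assumption made for this sketch).
    """
    cleaned = raw.strip()
    match = CORE_DOCKET_RE.match(cleaned)
    return match.group(1) if match else cleaned


print(core_docket_number("1:01-cv-00570-PCH"))  # 1:01-cv-00570
```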
The only change is that in Juriscraper, we will now return the following fields:
{
'federal_dn_office_code': '3',
'federal_dn_case_type': 'cr',
'federal_dn_judge_initials_assigned': 'TMB',
'federal_dn_judge_initials_referred': 'MMS',
'federal_defendant_number': None
}
And these fields will be stored in the model.
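A rough sketch of how such a dictionary could be produced, assuming a single regex with named groups. The pattern, the function name parse_federal_dn, and the sample input 3:20-cr-00017-TMB-MMS are all hypothetical (the thread does not give the input that produced the values shown), and returning the defendant number as an int is also an assumption.

```python
import re
from typing import Any, Dict

# Hypothetical pattern for district docket numbers such as
# 3:20-cr-00017-TMB-MMS or 3:20-cr-00017-TMB-1.
DISTRICT_DN_RE = re.compile(
    r"""
    ^(?P<office>\d+):             # office code
    (?P<year>\d{2})-              # two-digit year
    (?P<case_type>[a-z]{1,4})-    # case type, e.g. cv, cr, mj
    (?P<number>\d{1,5})           # sequence number
    (?:-(?P<assigned>[a-z/]+))?   # assigned judge initials (may be N/A)
    (?:-(?P<referred>[a-z/]+))?   # referred judge initials (may be PSOT)
    (?:-(?P<defendant>\d+))?$     # defendant number, criminal cases only
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_federal_dn(raw: str) -> Dict[str, Any]:
    """Split a district docket number into the new federal_dn_* fields."""
    match = DISTRICT_DN_RE.match(raw.strip())
    if match is None:
        # Unparsed formats (e.g. some bankruptcy numbers) yield all None.
        return {
            "federal_dn_office_code": None,
            "federal_dn_case_type": None,
            "federal_dn_judge_initials_assigned": None,
            "federal_dn_judge_initials_referred": None,
            "federal_defendant_number": None,
        }
    defendant = match["defendant"]
    return {
        "federal_dn_office_code": match["office"],
        "federal_dn_case_type": match["case_type"].lower(),
        "federal_dn_judge_initials_assigned": match["assigned"],
        "federal_dn_judge_initials_referred": match["referred"],
        "federal_defendant_number": int(defendant) if defendant else None,
    }


print(parse_federal_dn("3:20-cr-00017-TMB-MMS"))
```

Because the initials groups accept only letters and a slash, a trailing numeric segment falls through to the defendant number instead of being mistaken for referred initials.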
02-00017-LMK is currently not being parsed by the regex. I can tweak the regex to support it or create a separate version for bankruptcy. Just to confirm, does LMK correspond to federal_dn_judge_initials_assigned or a different field?
Regarding the bad examples:
4:20-mj-00061-N/A <-- N/A here stands for not assigned.
In this case, should the returned federal_dn_judge_initials_assigned be N/A or None?
4:20-cv-00061-CKJ-PSOT <-- PSOT stands for Pro Se ... Tucson.
In this case, should the returned federal_dn_judge_initials_referred be PSOT or None?
I mean, currently, if we have 1:01-cv-00570-PCH, the parsed docket_number is 1:01-cv-00570. So, no change here, right?
Right.
The only change is that in Juriscraper, we will now return the following fields...
Right.
does LMK correspond to federal_dn_judge_initials_assigned
I assume it's the assigned judge initials, but @flooie will know for sure.
[Should it be] N/A or None
None (or, rather, blank, "", right?)
In this case, should the returned federal_dn_judge_initials_referred be PSOT or None?
This is a good question. I'm inclined not to special-case this. Who knows what other junk some court might put in there some day. I think it's better to capture it as the referred-to initials, and folks who work in that jurisdiction are probably used to this bit of confusion.
I'd want to avoid trying to identify all the possible weird ideas courts have ever or will ever come up with.
I think we should capture the two sets of initials and just store them. They are uncommon, and even though N/A is not a set of initials, having it will let us re-create the court's full docket number as they represent it.
Same for PSOT: they use the term, so we can just capture it. It's uncommon and shouldn't really cause us any issues.
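As an illustration of why storing the values verbatim is useful, here is a minimal sketch that rebuilds the court's full docket number from the stored pieces. The function name rebuild_docket_number and its parameters are hypothetical, and it assumes blank or missing initials are stored as "" or None.

```python
from typing import Optional


def rebuild_docket_number(
    docket_number: str,
    assigned: Optional[str] = "",
    referred: Optional[str] = "",
    defendant_number: Optional[int] = None,
) -> str:
    """Re-create the court's full docket number from stored components.

    Values such as N/A or PSOT are appended exactly as captured, so the
    result matches the way the court represents the number.
    """
    parts = [docket_number]
    # Skip initials that were never captured (blank or None).
    parts.extend(p for p in (assigned, referred) if p)
    if defendant_number is not None:
        parts.append(str(defendant_number))
    return "-".join(parts)


print(rebuild_docket_number("4:20-mj-00061", "N/A"))          # 4:20-mj-00061-N/A
print(rebuild_docket_number("4:20-cv-00061", "CKJ", "PSOT"))  # 4:20-cv-00061-CKJ-PSOT
```

If N/A or PSOT were normalized to None at parse time, this round trip would no longer reproduce the original string.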
Thanks! I'm starting to work on this. I'll let you know if more questions arise.
This is a follow-up issue from a conversation I had with @albertisfu.
We need to update juriscraper to take advantage of the new fields we are adding to the docket class. They are as follows (for now):
I've compiled an extensive list of docket number edge cases for district courts, plus a few for bankruptcy courts, to test against, and I hope I've found a good regex pattern to parse out the information.
I wrote this simple function to test it.