Open grossir opened 1 month ago
So: `docket_number_core` for state court docket numbers we don't understand very well.
I downloaded the details for 1236 events / logs from Sentry, which map to 486 unique dockets. Then I manually inspected each court group and found that some are indeed mixing up dockets, while others match the correct docket but bring updated values that may not be better than the old values
The `fladistctapp` and `az` errors are due to an error when getting the `docket_number_core`, which ignores the parts of the docket that signal differences between districts or process types
In the case of ohio, the docket number is exactly the same, so the scraper should be fixed to return a more detailed docket number
Court domain | Dockets with error logs | Reason | Example |
---|---|---|---|
1dca.flcourts.gov | 67 | Mismatch across districts | '5D2023-0888' and '2D2023-0888' |
4dca.flcourts.gov | 44 | ||
5dca.flcourts.gov | 37 | ||
2dca.flcourts.gov | 36 | ||
6dca.flcourts.gov | 25 | ||
3dca.flcourts.gov | 24 | ||
www.supremecourt.ohio.gov | 20 | Mismatch across counties | Docket number is the same '22CA15' Doc 1, Doc 2 |
www.azcourts.gov | 11 | Mixing up Criminal and Civil docket numbers | '1 CA-CR 23-0297' and '1 CA-CV 23-0297-FC' are matched |
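
The Florida rows in the table can be reproduced with a toy key function. This is a minimal sketch, not CourtListener's actual `docket_number_core` logic: `digits_only_core` and `district_aware_core` are hypothetical helpers, and the regex assumes the `<district>D<year>-<sequence>` shape seen in the Example column.

```python
import re

# Assumed shape of Florida DCA docket numbers, based only on the examples
# in the table above, e.g. "5D2023-0888".
FL_DCA_RE = re.compile(r"^(?P<district>\d)D(?P<year>\d{4})-(?P<seq>\d+)$")


def digits_only_core(docket_number: str) -> str:
    """Hypothetical lossy key: keeps year and sequence, drops the district."""
    m = FL_DCA_RE.match(docket_number.strip())
    if m is None:
        return ""
    return f"{m['year'][2:]}{int(m['seq']):05d}"


def district_aware_core(docket_number: str) -> str:
    """Hypothetical key that also keeps the district prefix."""
    m = FL_DCA_RE.match(docket_number.strip())
    if m is None:
        return ""
    return f"{m['district']}D{m['year'][2:]}{int(m['seq']):05d}"


a, b = "5D2023-0888", "2D2023-0888"
print(digits_only_core(a) == digits_only_core(b))        # True: the two dockets collide
print(district_aware_core(a) == district_aware_core(b))  # False: the district keeps them apart
```

Whether the fix belongs in the core itself or in an extra comparison at merge time is a separate design question; the sketch only shows that the district prefix is enough to tell these two dockets apart.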
There are also some one-off mix-ups. This one is due to a merger with Harvard and Lawbox
https://www.courtlistener.com/api/rest/v3/dockets/1558364/
Original: In Re Pauley
New : Travis Norwood v. Jonathan Frame, Superintendent, Mount Olive Correctional Complex and Jail
This one was caused by a typo on the scraped web page, where they put a 22 instead of a 20
https://www.courtlistener.com/api/rest/v3/dockets/67836231/
Original: 1417 Belmont Community Dev., LLC v. District of Columbia
New : Lynch v. Ghaida
After fixing docket matching, we should find a way to separate the clusters mixed by this error. Hopefully, it is limited to the courts in the table above
Assuming the matching problem is solved, we could decide whether to update the case name based on the length of the names. Newer case names are sometimes shorter and sometimes longer; I think longer case names carry more information because they include fuller party names
Examples of updates where the names are worse:
https://www.courtlistener.com/api/rest/v3/dockets/68730521/
Original: Kevin Kulak v. Itshak On
New : Kulak v. Itshak On
=========================
https://www.courtlistener.com/api/rest/v3/dockets/2615014/
Original: State of Delaware v. Hobbs.
New : State v. Amir Fatir f/k/a Sterling Hobbs
=========================
https://www.courtlistener.com/api/rest/v3/dockets/66774469/
Original: Sunil M. Malkani v. Gemma Cunningham
New : Malkani v. Cunningham
Examples of updates where the names are better:
https://www.courtlistener.com/api/rest/v3/dockets/68437417/
Original: Overwell Harvest, Limited v. Trading Technologies Internati
New : Overwell Harvest, Limited v. Trading Technologies International, Inc.
=========================
https://www.courtlistener.com/api/rest/v3/dockets/68454533/
Original: Kalispell v. Diablo Investments
New : City of Kalispell v. Diablo Investments
=========================
https://www.courtlistener.com/api/rest/v3/dockets/68941229/
Original: Matter of M.N., YINC
New : Matter of M.N. and M.N., Youths in Need of Care.
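
One narrow pattern visible in the "better" examples is that the old name is simply cut off mid-word (the Overwell Harvest docket). A minimal sketch of a check for that case follows. `is_truncation_of` is a hypothetical helper, not an existing CourtListener function, and it deliberately does not try to judge cases like `Kalispell` vs. `City of Kalispell`.

```python
def is_truncation_of(old_name: str, new_name: str) -> bool:
    """Return True when old_name looks like new_name cut off in the middle of a word.

    Hypothetical heuristic: old_name must be a strict prefix of new_name, and
    the cut must fall between two alphanumeric characters.
    """
    old = old_name.strip()
    new = new_name.strip()
    if not old or len(new) <= len(old) or not new.startswith(old):
        return False
    # Last kept character and first dropped character both belong to a word.
    return old[-1].isalnum() and new[len(old)].isalnum()


print(is_truncation_of(
    "Overwell Harvest, Limited v. Trading Technologies Internati",
    "Overwell Harvest, Limited v. Trading Technologies International, Inc.",
))  # True
print(is_truncation_of(
    "Kalispell v. Diablo Investments",
    "City of Kalispell v. Diablo Investments",
))  # False: the old name is not a prefix of the new one
```

Requiring a mid-word cut keeps the check from firing when the new name merely appends extra sentences to the old one.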
Related, we could improve the case name parsing for these courts:
https://www.courtlistener.com/api/rest/v3/dockets/68561913/
Original: Ex parte The Housing Authority of the City of Talladega. PETITION FOR WRIT OF CERTIORARI TO THE COURT OF CIVIL APPEALS (In re: Harold Wallace v. The Housing Authority of the City of Talladega) (Talladega Circuit Court: CV-18-900509 Civil Appeals: 2210486).
New : Ex parte Housing Authority of the City of Talladega. PETITION FOR WRIT OF CERTIORARI TO THE COURT OF CIVIL APPEALS (In re: Harold Wallace v. The Housing Authority of the City of Talladega) (Talladega Circuit Court: CV-18-900509 Court of Civil Appeals: 2210486).
=========================
https://www.courtlistener.com/api/rest/v3/dockets/68538816/
Original: Ex parte Morgan Stanford and Matthew Hogue. PETITION FOR WRIT OF MANDAMUS: CIVIL (In re: Morgan Stanford and Matthew Hogue v. HCP Properties, LLC)(Jefferson Circuit Court: 22-901106).
New : Ex parte Morgan Stanford and Matthew Hogue. PETITION FOR WRIT OF MANDAMUS (In re: Morgan Stanford and Matthew Hogue v. HCP Properties, LLC)(Jefferson Circuit Court: 22-901106).
=========================
https://www.courtlistener.com/api/rest/v3/dockets/68206850/
Original: State v. Yuen
New : State v. Yuen. Dissenting Opinion by Recktenwald, C.J., in Which Ginoza, J., Joins. ICA Order of Correction, filed 09/26/2023 [ada]. ICA s.d.o., filed 09/22/2023 [ada]. Application for Writ of Certiorari, filed 12/18/2023. S.Ct. Order Accepting Application for Writ of Certiorari, filed 01/30/2024 [ada].
Case names end with a code:
https://www.courtlistener.com/api/rest/v3/dockets/68979773/
Original: Riversiders Against Increased Taxes v. City of Riverside CA4/2
=========================
https://www.courtlistener.com/api/rest/v3/dockets/68975608/
Original: Holguin Family Ventures v. County of Ventura CA2/6
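
Stripping such a trailing code could look like the sketch below. The pattern is an assumption inferred only from the two names above (`CA4/2`, `CA2/6`), so other suffix shapes would need their own handling; `strip_trailing_code` is a hypothetical helper.

```python
import re

# Assumed suffix shape, inferred only from the two examples above:
# whitespace, "CA", one digit, "/", one or more digits, at the end of the name.
TRAILING_CODE_RE = re.compile(r"\s+CA\d/\d+\s*$")


def strip_trailing_code(case_name: str) -> str:
    """Remove a trailing code such as 'CA4/2'; names without one are returned unchanged."""
    return TRAILING_CODE_RE.sub("", case_name)


print(strip_trailing_code("Holguin Family Ventures v. County of Ventura CA2/6"))
# Holguin Family Ventures v. County of Ventura
```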
Super helpful analysis. I don't know the solution half as well as you do, but one thing I'll note is that the shorter case names tend to be the better ones, actually, but this is essentially the difference between `case_name` and `case_name_full`:
`case_name`: Shows the simplified case name: Lissner v. Fox
`case_name_full`: Shows the full case name: Michael Lissner v. Michael Fox
There's also `case_name_short`, of course, which is usually just the first party: `Lissner`.
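
One way to read this, sketched below, is to keep both names instead of overwriting one with the other: the shorter goes to `case_name` and the longer to `case_name_full`. This is only an illustration under the assumption that docket matching is already correct; `CaseNames` and `assign_names` are hypothetical and do not mirror any existing CourtListener code.

```python
from dataclasses import dataclass


@dataclass
class CaseNames:
    """Hypothetical container mirroring the two field names discussed above."""
    case_name: str = ""
    case_name_full: str = ""


def assign_names(old_name: str, new_name: str) -> CaseNames:
    """Keep both names: the shorter as case_name, the longer as case_name_full.

    Assumes both names refer to the same case. Identical names leave
    case_name_full empty; on a length tie the old name stays as case_name.
    """
    old, new = old_name.strip(), new_name.strip()
    if old == new:
        return CaseNames(case_name=old)
    short, long_ = (old, new) if len(old) <= len(new) else (new, old)
    return CaseNames(case_name=short, case_name_full=long_)


print(assign_names("Kevin Kulak v. Itshak On", "Kulak v. Itshak On"))
# CaseNames(case_name='Kulak v. Itshak On', case_name_full='Kevin Kulak v. Itshak On')
```

A length-only rule would still send noisy names (such as ones with appended procedural history) to `case_name_full`, so it would need to run after any court-specific cleanup.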
An example:
Docket 68295573 already has a case_name Van Camp v. Van Camp, different than new value State v. Snyder
The docket in CourtListener that would have been overwritten has docket number "1 CA-CV 23-0297-FC" and case name Van Camp v. Van Camp
The docket number for the Snyder case is "1 CA-CR 23-0297", which is a different case
So, the "docket_number_core" with value "230297" matches, but it shouldn't
This is a single example for Arizona, but on Sentry there are more records.
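
The Arizona mismatch can be made concrete by parsing the docket number into parts and comparing the case type along with the year and sequence. This is a sketch under stated assumptions: the regex is inferred only from the two docket numbers quoted above, `parse_az_docket` and `same_case` are hypothetical names, and ignoring the trailing suffix (`FC`) when comparing is a guess, not a confirmed rule.

```python
import re
from typing import NamedTuple, Optional


class AzDocket(NamedTuple):
    division: str
    case_type: str  # e.g. "CR" or "CV"
    year: str
    sequence: str
    suffix: str     # e.g. "FC"; empty when absent


# Assumed shape, inferred from "1 CA-CR 23-0297" and "1 CA-CV 23-0297-FC".
AZ_RE = re.compile(
    r"^(?P<division>\d)\s+CA-(?P<case_type>[A-Z]+)\s+"
    r"(?P<year>\d{2})-(?P<sequence>\d+)(?:-(?P<suffix>[A-Z]+))?$"
)


def parse_az_docket(docket_number: str) -> Optional[AzDocket]:
    """Split a docket number into its parts, or return None if it does not fit."""
    m = AZ_RE.match(docket_number.strip())
    if m is None:
        return None
    return AzDocket(
        m["division"], m["case_type"], m["year"], m["sequence"], m["suffix"] or ""
    )


def same_case(a: str, b: str) -> bool:
    """True only if both parse and division, case type, year and sequence agree."""
    pa, pb = parse_az_docket(a), parse_az_docket(b)
    if pa is None or pb is None:
        return False
    return pa[:4] == pb[:4]  # the suffix is deliberately left out of the comparison


print(same_case("1 CA-CR 23-0297", "1 CA-CV 23-0297-FC"))  # False: CR vs. CV
```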
There is another example where the mismatch doesn't have a straightforward solution:
The oral argument with case name "In re: NEWMAN" has docket number 21-1228
The opinion with case name "Edgar G. C. v. Garland" has the same docket number
There are some cases where it is a correct match, but the case name or another data point is slightly different: ca3
The offending logic is in this function
https://github.com/freelawproject/courtlistener/blob/723b7ec84101b18fa2f0aa0dcb7ef7788dc74361/cl/recap/mergers.py#L84-L169
Sentry Issue: COURTLISTENER-7XG