freelawproject / reporters-db

A database of court reporters, tests and other experiments
BSD 2-Clause "Simplified" License

How to support Louisiana docket numbers as citations #62

Open flooie opened 3 years ago

flooie commented 3 years ago

I'm having a bit of an existential crisis regarding parsing Louisiana citations. Louisiana, long known for being an oddball legal state, is unique with its citations.

While parsing citations from a large Louisiana dataset, I found that thousands of Louisiana citations were going unidentified, so I did a little digging.

Background

Standard Format

In general, a citation in Louisiana appellate cases looks like this:

Herff Jones, Inc. v. Girouard, 07-393, p. 2 (La. App. 3d Cir. 10/3/07), 966 So. 2d 1127, 1130, writs denied, 07-2463, 2464 (La. 2/15/08), 976 So. 2d 185.

Taking out just the neutral Louisiana citation, we get 07-393, p. 2 (La. App. 3d Cir. 10/3/07) or, without a pincite, 07-393 (La. App. 3d Cir. 10/3/07).

The format is two-digit year + hyphen + filing order (with leading zeroes dropped), followed by the court and date in parentheses.
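As a rough sketch of that format, a regex along these lines would pull out the year, filing order, optional pincite, court, and date. The pattern and the names `LA_NEUTRAL` and `find_la_neutral` are made up for illustration and are not what reporters-db currently ships.

```python
import re

# Hypothetical pattern for illustration; not the regex used in reporters-db.
LA_NEUTRAL = re.compile(
    r"(?<![\d-])(?P<year>\d{2})-(?P<order>\d+)"   # two-digit year, hyphen, filing order
    r"(?:,\s*p\.\s*(?P<pincite>\d+))?"            # optional pincite, e.g. ", p. 2"
    r"\s*\((?P<court>La\.(?: App\. \d(?:st|nd|rd|th|d)? Cir\.)?)\s+"  # court
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2})\)"          # date in M/D/YY form
)


def find_la_neutral(text):
    """Return the named groups of every Louisiana neutral cite found in text."""
    return [m.groupdict() for m in LA_NEUTRAL.finditer(text)]


example = (
    "Herff Jones, Inc. v. Girouard, 07-393, p. 2 (La. App. 3d Cir. 10/3/07), "
    "966 So. 2d 1127, 1130, writs denied, 07-2463, 2464 (La. 2/15/08), "
    "976 So. 2d 185."
)
cites = find_la_neutral(example)
print(cites)
```

On the example above this finds only the 3d Cir. cite. The writ cite, 07-2463, 2464 (La. 2/15/08), is skipped because the second docket number sits between the first one and the parenthetical.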

Odd 2 Cir.

That is, unless the court is the La. App. 2 Cir., in which case they drop the two-digit year and the hyphen and use a comma as a thousands separator.

Campbell v. Webster Parish Police Jury, 36,391, 36,392, p. 8 (La. App. 2d Cir. 9/18/02), 828 So. 2d 170, 175.

In this case two dockets were combined, but a simplified citation would look like this: 36,391 (La. App. 2d Cir. 9/18/02)
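A sketch of how the 2nd Circuit shape could be matched, including combined dockets. Everything here (`DOCKET`, `LA_2D_CIR`, `parse_second_circuit`) is a hypothetical name, and the pattern assumes the court is written "2 Cir." or "2d Cir." as in the examples above.

```python
import re

# Hypothetical pattern for illustration; not the regex used in reporters-db.
DOCKET = r"\d{1,3}(?:,\d{3})+"                    # e.g. 36,391
LA_2D_CIR = re.compile(
    rf"(?P<dockets>{DOCKET}(?:,\s+{DOCKET})*)"    # one or more combined dockets
    r"(?:,\s*p\.\s*(?P<pincite>\d+))?"            # optional pincite
    r"\s*\(La\. App\. 2d? Cir\. "                 # "2 Cir." or "2d Cir."
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2})\)"          # date in M/D/YY form
)


def parse_second_circuit(text):
    """Return docket numbers (as ints), pincite, and date, or None if no match."""
    m = LA_2D_CIR.search(text)
    if m is None:
        return None
    dockets = [int(d.replace(",", "")) for d in re.findall(DOCKET, m["dockets"])]
    return {"dockets": dockets, "pincite": m["pincite"], "date": m["date"]}


campbell = (
    "Campbell v. Webster Parish Police Jury, 36,391, 36,392, p. 8 "
    "(La. App. 2d Cir. 9/18/02), 828 So. 2d 170, 175."
)
print(parse_second_circuit(campbell))
```

Note that nothing in the match gives a year other than the decision date, which is the volume problem for this court.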

Other variations

To make matters a little more confusing, the dataset often includes the full docket number, something like 2007-CA-00393, in the citation, as in 2007-CA-00393 (La. App. 3d Cir. 10/3/07) or 07 CA 00393 (La. App. 3 Cir. 10/3/07).
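One way to fold these full docket numbers back into the short form is sketched below. `FULL_DOCKET` and `to_short_form` are hypothetical, and the two-capital-letter case type is an assumption based only on the "CA" in the examples above.

```python
import re

# Hypothetical pattern: optional century, two-digit year, a two-letter case
# type (an assumption; only "CA" appears in the examples), then the filing
# order with leading zeroes stripped.
FULL_DOCKET = re.compile(
    r"(?:\d{2})?(?P<year>\d{2})[- ](?P<case_type>[A-Z]{2})[- ]0*(?P<order>\d+)"
)


def to_short_form(docket):
    """Turn '2007-CA-00393' or '07 CA 00393' into '07-393'; None if no match."""
    m = FULL_DOCKET.fullmatch(docket.strip())
    if m is None:
        return None
    return f"{m['year']}-{m['order']}"


for raw in ("2007-CA-00393", "07 CA 00393"):
    print(raw, "->", to_short_form(raw))
```

Both inputs come out as 07-393, the same docket string used in the standard format.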

There appears to be bleed over from the 2nd circuit format into the other format, and vice versa.

So...

Currently we parse only a handful of the variations, and we assign the 2- or 4-digit docket year to volume and the docket filing order to page. This does make some sense but is a bit weird, because the page for each of these should start at 1. Also, we have no way to consistently assign a volume for the 2nd Circuit because it's not provided and the date filed isn't always in the same year.

This long background leads me to a couple of questions

Some non-reporters-db questions that are related.

jcushman commented 3 years ago

Nice writeup!

Should the reporters-db (with eyecite) parse citation strings into "proper" citation.

Hmm ... I guess if you want to link everything up properly you would have to. Hard to think of an elegant way to do it, though, because it feels like it requires python code rather than a declarative data format, right? I think you end up needing a file in eyecite of special case formatting functions per-reporter to apply inside CitationBase.corrected_citation(). Maybe as we collected special cases we could figure out some sort of declarative language for them so we could move them back to data in reporters-db.
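A minimal sketch of what such a file of per-reporter special cases might look like. None of these names (`SPECIAL_CASES`, `special_case`, `fix_louisiana`, `corrected_groups`) exist in eyecite; they are assumptions for illustration, and the real hook would be whatever CitationBase.corrected_citation() ends up calling.

```python
from typing import Callable, Dict

# Hypothetical registry: reporter string -> function that fixes parsed groups.
Fixer = Callable[[Dict[str, str]], Dict[str, str]]
SPECIAL_CASES: Dict[str, Fixer] = {}


def special_case(reporter: str) -> Callable[[Fixer], Fixer]:
    """Decorator that registers a fix-up function for one reporter."""
    def register(func: Fixer) -> Fixer:
        SPECIAL_CASES[reporter] = func
        return func
    return register


@special_case("La. App.")
def fix_louisiana(groups: Dict[str, str]) -> Dict[str, str]:
    fixed = dict(groups)
    fixed["volume"] = fixed["volume"][-2:]              # 2007 -> 07
    fixed["page"] = fixed["page"].lstrip("0") or "0"    # 00393 -> 393
    return fixed


def corrected_groups(reporter: str, groups: Dict[str, str]) -> Dict[str, str]:
    """Apply the reporter's special case if one is registered."""
    fixer = SPECIAL_CASES.get(reporter)
    return fixer(groups) if fixer else dict(groups)


print(corrected_groups("La. App.", {"volume": "2007", "page": "00393"}))
```

If enough of these functions end up looking alike (slice the year, strip zeroes), that is the point where a declarative form in reporters-db might become practical.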

Should we be mislabeling docket filing order as page number (kind of the practice for neutral citations I guess).

For what it's worth I'm pretty sure that eyecite doesn't care at this point if citation regexes include a volume or page capture group at all. If it was more natural to capture the whole thing as docket_number instead, for example, I think eyecite would work OK. (And downstream code has to be prepared for this anyway if it wants to work with statutes, which often don't have page groups.) But I also think it's not too bad to capture year as volume and docket order as page. As you say, that feels pretty natural with other neutral cites like 1999-Ohio-00001, and this is at least consistent with that.

is practice of certain judges enough to justify a variation...

Yeah, in general I think a variation just means "this is a way things were done by someone at some point." In practice we might not get around to adding all of them, but especially for older cases there's a ton of variation that is worth including.

flooie commented 3 years ago

This is very helpful. Thanks for the quick reply @jcushman.

Just to clarify one point, I was contemplating using the regex patterns themselves to do the "cleanup", which I think would still work here. For example (using a partial regex pattern for LA), instead of using (?P<volume>\d{2,4}) one could do something like (\d{2})?(?P<volume>\d{2}). Wouldn't that truncate the volume identifier while still parsing out the citation?
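To see the difference concretely, here are the two partial patterns side by side. The hyphen and page group are tacked on only so the snippet runs on its own, and `WIDE`, `TRUNCATING`, and `volume_of` are hypothetical names.

```python
import re

# The two partial volume patterns from the comment above, each completed with
# a hypothetical "-<page>" tail so they can be run standalone.
WIDE = re.compile(r"(?P<volume>\d{2,4})-(?P<page>\d+)")
TRUNCATING = re.compile(r"(\d{2})?(?P<volume>\d{2})-(?P<page>\d+)")


def volume_of(pattern, docket):
    """Return the captured volume for a docket string, or None if no match."""
    m = pattern.fullmatch(docket)
    return m["volume"] if m else None


for docket in ("07-393", "2007-393"):
    print(docket, volume_of(WIDE, docket), volume_of(TRUNCATING, docket))
```

The truncating pattern gives "07" for both inputs, while the wide one gives "07" and "2007". The trade-off is that the century is discarded, so a docket written 1907-393 would also come out as volume "07".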

I'm not saying I like it, I'm mostly spitballing.

mlissner commented 3 years ago

Yeah, tricky. Normally I'd look at this, note that it's a docket number, say it's not a citation, and urge us to move on. Alas, they're using them as citations and I think the point @flooie is making is that they're effectively neutral citations. Something we try to support. But dang they're being weird and inconsistent.

Should we be mislabeling docket filing order as page number (kind of the practice for neutral citations I guess).

This feels fine-enough to me.

It's noted that some judges just do what they want and use four-digit years in opinions - is practice of certain judges enough to justify a variation...

Agree with @jcushman.

How do we handle multiple docket numbers in the citation - like the first example above?

I kind of feel like this is another reason to say sorry to the Louisiana folks, particularly if there is typically a parallel citation being used, as in your examples. Since we have the docket number in our docket table (and we've seen that docket numbers are always a mess), I'm kind of OK with leaving these out of eyecite for the time being and not catching them.

This question tilts me in that direction too:

Should the reporters-db (with eyecite) parse citation strings into "proper" citation.

Finally:

If Lexis, West, etc. have lots of incorrect citation formats in their system, does that make them worth citing as a variation in reporters-db or should we fix their formats and then add them to our system.

¿Por qué no los dos? I think we should add variations for any that occur more than a couple times, and then correct them as we put them in our system. That's been our historical approach.

It's another question if the citation is wrong. I think we saw one of those a while back, and it was indeed an epistemological question.