freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Normalize Party Types #757

Open mlissner opened 6 years ago

mlissner commented 6 years ago

We have an open ticket, #750, about improving the sort order for party types. Before we can do that, we should see if we can normalize them. There are currently about 317 entries in the list (I think), and most are duplicates, misspellings, etc.
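A rough way to check how many of those entries really are near-duplicates is to list look-alike candidates for manual review. The sketch below is only an illustration: `find_near_duplicates` and the `cutoff` value are hypothetical, it uses Python's standard `difflib`, and it assumes the party types are available as a plain list of strings. It only surfaces candidates and changes no data.

```python
import difflib


def find_near_duplicates(party_types, cutoff=0.85):
    """Map each party type to other entries that look like variants of it.

    Hypothetical helper: the similarity cutoff is an assumption and would
    need tuning against the real list.
    """
    unique = sorted(set(party_types))
    candidates = {}
    for name in unique:
        # Compare against every other distinct entry, never against itself.
        others = [other for other in unique if other != name]
        matches = difflib.get_close_matches(name, others, n=5, cutoff=cutoff)
        if matches:
            candidates[name] = matches
    return candidates


if __name__ == "__main__":
    sample = ["Plaintiff", "Defendant", "Defendent", "Trustee"]
    for name, matches in find_near_duplicates(sample).items():
        print(name, "looks like", matches)
```

A human would still need to decide whether each flagged pair is a misspelling or a deliberate court-specific style.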

johnhawkinson commented 6 years ago

I would be wary.

There are probably cases where it's appropriate for CL to display a normalized party, but generally speaking I want to be able to trust RECAP to accurately represent the court's record, errors and all. I don't want to have to pay PACER fees to see if RECAP has "helpfully" "fixed" something where the court's error matters and I need to know about the court's error.

(I am also skeptical that most are meaningfully duplicates and misspellings. Different districts set up their ECF systems differently and if District A uses Plaintiff-Intervenor and District B uses Plaintiff/Intervenor, I should see A's style when I look at an A case and B's style when I look at a B case.

When building CL tools that look across districts and across cases, normalization starts to make a lot of sense. But when I look at an individual case, I want to see what PACER gave me, not what CL thinks PACER should have given me.)

mlissner commented 6 years ago

Using your example, can you explain why you'd want to see a hyphen instead of a slash?

johnhawkinson commented 6 years ago

Again, I want to see exactly what the court has provided. Exactly that. This is the law. Details matter. Trust matters.

If the court's style is to use "Plaintiff-Intervenor," that's what I want to see, and refer to, and think about. And if the court's style is to use "Plaintiff/Intervenor," then that's what I want to see.

I don't want the tool to be tampering with my view of reality or trying to "help." If you start to do this here, it erodes my confidence in the tool. And it starts down a slippery slope. If I can't trust the parties reported by RECAP, then can I not trust the docket text? Can I trust that the addresses will be properly displayed for the parties? (Trick question: I know I can't trust that right now.)

But, for instance, if the purpose of normalization is to sort based on normalized party and then display the non-normalized results in normalized order, I have no problem with that.
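That last approach can be sketched as follows: the normalized form is used only as a sort key, and the court's original string is what gets returned for display. This is a minimal illustration, not CourtListener's implementation; `SORT_ORDER`, `normalize_party_type`, and `sort_parties` are hypothetical names, and the ranking shown is an assumed one.

```python
import re

# Assumed display ranking for normalized party types (hypothetical).
SORT_ORDER = ["plaintiff", "plaintiff intervenor", "defendant"]


def normalize_party_type(raw):
    """Return a comparison key; the raw string itself is never altered."""
    # Lowercase, and collapse hyphens, slashes, and whitespace into one space.
    return re.sub(r"[-/\s]+", " ", raw.strip().lower())


def sort_parties(raw_types):
    """Sort raw party-type strings by normalized key and return them as-is."""
    def key(raw):
        norm = normalize_party_type(raw)
        # Unknown types sort after the known ones, alphabetically by key.
        rank = SORT_ORDER.index(norm) if norm in SORT_ORDER else len(SORT_ORDER)
        return (rank, norm)

    return sorted(raw_types, key=key)


if __name__ == "__main__":
    # The slash in "Plaintiff/Intervenor" survives; only the order changes.
    print(sort_parties(["Defendant", "Plaintiff/Intervenor", "Plaintiff"]))
```

Because normalization happens only inside the sort key, a District A case still shows "Plaintiff-Intervenor" and a District B case still shows "Plaintiff/Intervenor".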