freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Make Sankey diagrams for parties/attorneys/firms #725

Closed mlissner closed 6 years ago

mlissner commented 7 years ago

@jraller and I talked about this around Feb 14th in Slack, but the crux here is that the relationships between parties, attorneys, and firms are complicated, and best displayed as a viz.

Here's a sketch of how I imagine this working:

img_20170214_145720

And if you prefer ascii art:

party 1-------------\ 
party 2--------------+--atty1 ------------------\ 
party 3-------------/                           |
party 4-------------\                           |---Firm 1
party 5--------------+--atty2-------------------/
party 6-------------/
party 7-------------\ 
party 8--------------+--atty3-----------------------Firm 2
party 9-------------/
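
The sketch above can be turned into the nodes-and-links shape that D3's Sankey examples consume. This is a minimal sketch only: the flat `(party, attorney, firm)` rows and the `build_sankey` helper are assumptions for illustration, not CourtListener's actual schema or code.

```python
from collections import Counter


def build_sankey(rows):
    """Turn (party, attorney, firm) rows into Sankey nodes and links.

    `rows` is a hypothetical flat export; no real schema is assumed.
    """
    nodes = []           # position in this list is the node id
    index = {}           # (kind, name) -> node id
    weights = Counter()  # (source id, target id) -> link value

    def node_id(kind, name):
        key = (kind, name)
        if key not in index:
            index[key] = len(nodes)
            nodes.append({"name": name, "kind": kind})
        return index[key]

    # De-duplicate rows so a repeated listing does not inflate a link.
    for party, attorney, firm in sorted(set(rows)):
        p = node_id("party", party)
        a = node_id("attorney", attorney)
        f = node_id("firm", firm)
        weights[(p, a)] += 1
        weights[(a, f)] += 1

    links = [
        {"source": s, "target": t, "value": v}
        for (s, t), v in weights.items()
    ]
    return {"nodes": nodes, "links": links}


# The nine parties, three attorneys, and two firms from the ASCII art.
rows = [
    (f"party {i}", f"atty{(i - 1) // 3 + 1}", "Firm 1" if i <= 6 else "Firm 2")
    for i in range(1, 10)
]
graph = build_sankey(rows)
```

Each attorney-to-firm link ends up with a value equal to the number of parties flowing through that attorney, which is what sets the band width in a Sankey.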

There's also this image that shows how multiple charts could be used for each type of party:

img_20170214_153247_720

And this image shows how the charts could be integrated into the UI:

screenshot_from_2017-02-14_15-35-00_480

Eventually, the nodes could be clickable too, so that you could click on a firm, attorney, or party and see their other stuff as a fresh webpage.

This is a corner case that has thousands of attorneys:

https://www.courtlistener.com/docket/4166723/in-re-tft-lcd-flat-panel-antitrust-litigation/

We should do this as soon as possible, if we can. We just got an email from somebody who pointed out that firm data isn't displayed anywhere on CourtListener. That's obviously pretty bad.

johnhawkinson commented 7 years ago

As diagramed above this seems to imply a party has a single attorney, which is frequently not the case.

How do law professors represent these relationships in scholarship?

mlissner commented 7 years ago

> As diagramed above this seems to imply a party has a single attorney, which is frequently not the case.

I haven't seen this to be the case in PACER, but it seems solvable via the diagram.

> How do law professors represent these relationships in scholarship?

That's a great question. I confess I have no idea!

johnhawkinson commented 7 years ago

> I haven't seen this to be the case in PACER, but it seems solvable via the diagram.

Errr, except for pro se litigants, I can't think of a case where it is not the case. Let's pick a case with you in it!:

http://ia801505.us.archive.org/0/items/gov.uscourts.dcd.178502/gov.uscourts.dcd.178502.docket.html

NATIONAL VETERANS LEGAL SERVICES PROGRAM is represented by Narwold, Smith, and Gupta. And then NATIONAL CONSUMER LAW CENTER by the same three.

Pretty much the only place it doesn't happen is pro se litigants. I guess there are solo practitioners in small cases, but anybody represented by a major firm has multiple attorneys filing appearances...

I'm surprised you're not used to seeing the "(See above for address)" note.

I'm sure it's solvable, but I'm not quite sure how you would solve it. It might be good to give a sample diagram.

> That's a great question. I confess I have no idea!

I think it's a trick question.

mlissner commented 7 years ago

I'm sorry, brain fart. Yes, I've seen that, and I've spent days getting the See above thing working. Brain fart.

Take a look here: http://bl.ocks.org/d3noob/c9b90689c1438f57d649

If we imagine multiple parties going into the one attorney in the middle named "Elvis" in the example, does that clarify?

johnhawkinson commented 7 years ago

> If we imagine multiple parties going into the one attorney in the middle named "Elvis" in the example, does that clarify?

Not really.

That works if Elvis is an atty for Barry and Frodo, and even if Elvis is divided up into Ernie and Estelle.

But it's harder if there are 3 parties represented by Ernie and Estelle. Lines would have to cross.

And it gets harder still in cases where the relationships are not quite regular (err, normal?), like 5 parties for which 3 attorneys represent 2 parties, 1 attorney represents 3 parties, and another attorney represents the remaining 2 parties and 1 of the prior parties.
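
The three-party case can be checked by counting crossings directly. A rough sketch, assuming straight links between two columns; the party names and the `count_crossings` helper are placeholders for illustration:

```python
from itertools import combinations, permutations


def count_crossings(edges, left_order, right_order):
    """Count pairs of straight links that cross in a two-column layout."""
    left = {name: i for i, name in enumerate(left_order)}
    right = {name: i for i, name in enumerate(right_order)}
    crossings = 0
    for (p1, a1), (p2, a2) in combinations(edges, 2):
        # Two links cross when their endpoints appear in opposite
        # vertical order in the two columns.
        if (left[p1] - left[p2]) * (right[a1] - right[a2]) < 0:
            crossings += 1
    return crossings


parties = ["Party A", "Party B", "Party C"]  # placeholder names
attorneys = ["Ernie", "Estelle"]
# Every party is represented by both attorneys.
edges = [(p, a) for p in parties for a in attorneys]

as_listed = count_crossings(edges, parties, attorneys)
# Try every ordering of both columns to see whether reordering helps.
fewest = min(
    count_crossings(edges, l, r)
    for l in permutations(parties)
    for r in permutations(attorneys)
)
```

Both `as_listed` and `fewest` come out to 3 here, so no reordering of the columns removes the crossings in this case.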

mlissner commented 7 years ago

Lines crossing should be minimized, but it's OK if they cross a bit — you can experiment with this a bit in the link in my last comment by sliding things around.

Even in the complex case you gave, a Sankey diagram should be able to handle it fairly well, I think.

johnhawkinson commented 7 years ago

Hrmm. If you say so.

http://ia601404.us.archive.org/17/items/gov.uscourts.nysd.273913/gov.uscourts.nysd.273913.docket.html

is an example of irregular situations, where Mark Edward Avsec represents Canadian Standards Association. But Bruce P. Keller also represents them. But Keller represents other parties too (John R. Wiley) who Avsec does not represent. &c.
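
As a data structure this is a plain many-to-many relation. A small sketch using only the three appearances named above (the docket has more; the `group` helper is hypothetical):

```python
from collections import defaultdict

# (attorney, party) pairs, limited to the ones mentioned in this comment.
appearances = [
    ("Mark Edward Avsec", "Canadian Standards Association"),
    ("Bruce P. Keller", "Canadian Standards Association"),
    ("Bruce P. Keller", "John R. Wiley"),
]


def group(pairs):
    """Index the relation both ways: by attorney and by party."""
    by_attorney = defaultdict(set)
    by_party = defaultdict(set)
    for attorney, party in pairs:
        by_attorney[attorney].add(party)
        by_party[party].add(attorney)
    return dict(by_attorney), dict(by_party)


by_attorney, by_party = group(appearances)

# Parties both attorneys represent, and parties only Keller represents.
shared = by_attorney["Mark Edward Avsec"] & by_attorney["Bruce P. Keller"]
keller_only = by_attorney["Bruce P. Keller"] - by_attorney["Mark Edward Avsec"]
```

Each attorney and each party appears once as a key, however many relationships they have, which is the property any graph-style display would rely on.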

mlissner commented 7 years ago

Yeah, I think this is the perfect use case for the Sankey diagrams. This complex interaction of people is almost impossible to understand in the current table-based approach. In a Sankey it'd be fairly obvious (if a bit messy).

johnhawkinson commented 7 years ago

This is not my area of expertise, but it seems like:

mlissner commented 7 years ago

> The primary benefit of Sankey diagrams is supposed to be an indication of flow volume based on width of the band.

Actually, we do use them for this, to an extent. If an attorney is representing a lot of clients, that seems like an interesting insight, and one that we'd show nicely. But point taken, we're not dealing with crude oil or something.

I'm totally game to try different approaches to this. The idea is to show the relationships of the people in a way that makes it easier to see who's connected to whom. It could be Sankey, it could be a node and spoke network, I'm game if there's another way forward that makes more sense. We sort of backed into Sankey after doing some sketches and noticing that it seemed like a nice fit.

johnhawkinson commented 7 years ago

> Actually, we do use them for this, to an extent. If an attorney is representing a lot of clients, that seems like an interesting insight, and one that we'd show nicely. But point taken, we're not dealing with crude oil or something.

I would challenge this. Within the context of a single case (which is what I think we're talking about), number of clients is not generally an indication of merit.

For instance, let's take a possibly-typical case I intervened in: http://ia601509.us.archive.org/28/items/gov.uscourts.mad.189311/gov.uscourts.mad.189311.docket.html.

Petitioner sued five government agents (2 officials of a local sheriff's dept, and 3 federal officials: director of ICE and ICE's Boston Field office, and also the DHS secretary), all of whom are represented by a single government attorney. Petitioner is represented by 3 ACLU attorneys and outside counsel for the ACLU. (And to be fair, I should s/represented by/filed an appearance in/, because we can imagine there is more going on in the DOJ about this than one attorney, even though only one has appeared.)

It is not true that the government parties are five times more important than petitioner, because there are five times as many parties (although you did not suggest this was what the flow width would be). [And it's definitely not true that I am twice as important as the petitioner, because confusion about CM/ECF accounts led to one with my middle initial and one without.]

It is also not true that the petitioner is 4 times as important as the government respondents because he has 4 times as many attorneys.

Using any of those as a proxy for importance would seem to be wrong many more times than it would be right.
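
To make the numbers concrete, here is roughly what a width-by-count scheme would compute for that case. The names are placeholders; only the counts (five respondents, one government attorney, four attorneys for the petitioner) come from the description above.

```python
from collections import Counter

respondents = [f"Respondent {i}" for i in range(1, 6)]  # placeholder names
petitioner_attorneys = [
    "ACLU attorney 1",
    "ACLU attorney 2",
    "ACLU attorney 3",
    "Outside counsel",
]

# (attorney, party) pairs for the appearances described above.
appearances = [("Government attorney", r) for r in respondents]
appearances += [(a, "Petitioner") for a in petitioner_attorneys]

# Candidate band widths: how many parties each attorney has,
# and how many attorneys each party has.
parties_per_attorney = Counter(attorney for attorney, _ in appearances)
attorneys_per_party = Counter(party for _, party in appearances)
```

The government attorney gets a weight of 5 and the petitioner a weight of 4, while every other node gets 1. Neither number says anything about importance.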

I think trying to force Sankey to do this is going to lead you down the wrong path. It might be better for us to stop and think what the information that could be visualized here is, and then what should be done about it. It is also possible that there is no interesting visualization of the parties alone that is really worth doing -- it might be much more interesting to see visualizations of the relationships between parties among different cases (there Sankey could show a lot, as some firms or attorneys handle many more cases than others. But I'd still think that would limit the Sankey x-axis to 3, which seems like another indicator that it's not right). Or to show a visualization of which parties are involved in which docket entries.

> The idea is to show the relationships of the people in a way that makes it easier to see who's connected to whom. It could be Sankey, it could be a node and spoke network, I'm game if there's another way forward that makes more sense.

I do think node and spoke is a better fit. If it's boring, that may mean the data is boring.

As an exercise, I went and sat down with the D3 Gallery, something I am not super-familiar with and have not looked at recently. I didn't have great results.

I will try to sit down with my Tufte VDQI and friends and see if anything in there spurs my imagination.

Anyhow, from my review of the D3 gallery, here's what stuck out for me:

screen shot 2017-09-13 at 10 40 20 screen shot 2017-09-13 at 10 40 12

mlissner commented 7 years ago

Thanks for doing some research on this.

> I do think node and spoke is a better fit. If it's boring, that may mean the data is boring.

Yeah, I could see node and spoke being a good way forward. Maybe we create three columns, and use those to align the nodes horizontally in each column. I think that'd still look roughly like:

img_20170214_153247_720

That'd be a pretty minimal design and would be something between the chaos of most of the graph vizes (which I find to be fairly useless although fun), and the Sankeys (which you rightly point out can give emphasis where it doesn't belong).

The other benefit (from a tech perspective) is that this could probably use graphing techniques like we use in our SCOTUS visualizations: https://www.courtlistener.com/visualizations/scotus-mapper/.
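
For the three-column idea, the layout itself could be quite simple. A rough sketch, assuming fixed canvas dimensions and evenly spaced nodes; `column_layout` and its parameters are made up for illustration and are not taken from the SCOTUS visualization code:

```python
def column_layout(columns, width=600.0, height=400.0):
    """Place each column's nodes evenly down a vertical line.

    `columns` is a list of lists of labels, left to right.
    Returns {(column index, label): (x, y)}.
    """
    positions = {}
    n_cols = len(columns)
    for c, labels in enumerate(columns):
        # Spread the columns across the full width.
        x = width * c / (n_cols - 1) if n_cols > 1 else width / 2
        for r, label in enumerate(labels):
            # Leave equal gaps above, between, and below the nodes.
            y = height * (r + 1) / (len(labels) + 1)
            positions[(c, label)] = (x, y)
    return positions


columns = [
    ["party 1", "party 2", "party 3"],  # parties
    ["atty1"],                          # attorneys
    ["Firm 1"],                         # firms
]
positions = column_layout(columns)
```

Links would then be drawn as equal-width lines between the positions, so nothing gets visual emphasis from a count.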

johnhawkinson commented 7 years ago

One more observation. The filing side of CM/ECF diagrams parties in a hierarchical tree with expand/contract buttons. Shown here after clicking Expand All.

screen shot 2017-09-21 at 13 30 27

I don't propose we do it this way, but it's helpful to give examples of the prior art for this display and that's a significant one, esp. since it's how the system itself represents it in its UI.

mlissner commented 7 years ago

That's great, thanks John.

I like that this also shows the annoying thing about that kind of an approach. Check out Kathleen Connolly. She's all over the place. A better viz could show all of her relationships together in one place without repetition.

saizai commented 7 years ago

Related to this would be conflict checking — has an attorney, or their firm, or their previous firms, ever represented a party adverse to the party they are now representing?

(That's not a perfect determination, because you don't know whether they were firewalled appropriately to avoid this, but it's a good flag.)
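
A flag like that could be sketched as below. Everything here is hypothetical: the `Appearance` record, the `side` field, and the sample names are assumptions for illustration, and real docket data may not record sides this cleanly.

```python
from typing import NamedTuple


class Appearance(NamedTuple):
    case: str
    side: str      # e.g. "plaintiff" or "defendant"
    party: str
    attorney: str
    firm: str


def conflict_flags(history, attorney, firms, client):
    """Past appearances where `attorney`, or any firm in `firms`,
    represented a party on the opposite side from `client`."""
    # Which side(s) the client took in each past case.
    client_sides = {}
    for a in history:
        if a.party == client:
            client_sides.setdefault(a.case, set()).add(a.side)

    flags = []
    for a in history:
        sides = client_sides.get(a.case)
        if sides is None or a.party == client:
            continue  # client was not in this case, or this is the client
        adverse = a.side not in sides
        ours = a.attorney == attorney or a.firm in firms
        if adverse and ours:
            flags.append(a)
    return flags


history = [
    Appearance("Case 1", "plaintiff", "Acme", "Lawyer X", "Firm P"),
    Appearance("Case 1", "defendant", "Widgets", "Lawyer Y", "Firm Q"),
    Appearance("Case 2", "plaintiff", "Gadgets", "Lawyer Y", "Firm Q"),
]
# Lawyer Y (current and previous firms: Firm Q) now wants to represent Acme.
flags = conflict_flags(history, "Lawyer Y", {"Firm Q"}, "Acme")
```

Passing the attorney's current and previous firms in `firms` covers the "or their previous firms" part of the question. As noted, a hit is only a flag to look closer, not a determination.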

mlissner commented 6 years ago

I'm actually going to close this as a wontfix. Four reasons.

  1. I don't think I understand the use case of the parties page well enough to know how well any visualization would work. There's still a lot of duplication on this page, and that bugs me a lot, but I don't know exactly how people are looking at this information. Maybe it doesn't bug them. Right now it's very easy to answer the question, "Who's the lawyer for this party?", but it's hard to know "How many people is that lawyer representing?" and other questions of that sort. Sankey (or another network-style graph) would fix that. But I'm not sure people are asking that.

  2. This data is super dirty and complex. I did a lot of the work for the Sankey diagram, but stopped part way through to see what the data looked like in a traditional party listing. There's a lot of complexity and a lot of dirty data. Getting that into a viz will be hard.

  3. Visualizations are hard and hard to maintain, and we have a lot of priorities. Should this be one of them? I'm not sure.

  4. After doing the traditional layout, I'm actually pretty happy with it. I'm less convinced a Sankey is needed.

I'll post a useful diff to pick up this work again in a moment.