Closed by mlissner 7 years ago
The type field on each party can take one of 95 values so far (based on a fairly random sample of 1,000 items):
<type>2255</type>
<type>ADR Provider</type>
<type>All Plaintiffs</type>
<type>Amicus</type>
<type>Appellant</type>
<type>Appellee</type>
<type>Assistant U.S. Trustee</type>
<type>Assist. U.S. Trustee</type>
<type>Asst. U.S. Trustee</type>
<type>Bankruptcy Admin.</type>
<type>Claimant</type>
<type>Claims Agent</type>
<type>Claims and Noticing Agent</type>
<type>Consol Defendant</type>
<type>Consolidated Defendant</type>
<type>Consolidated Plaintiff</type>
<type>Consol Plaintiff</type>
<type>Counter-claimant</type>
<type>Counter Claimant</type>
<type>Counter-Claimant</type>
<type>Counter-defendant</type>
<type>Counter Defendant</type>
<type>Counter-Defendant</type>
<type>Cred. Comm. Chair</type>
<type>Creditor Committee</type>
<type>Creditor</type>
<type>Cross Appellant</type>
<type>Cross-claimant</type>
<type>Cross Claimant</type>
<type>Cross-Claimant</type>
<type>Cross-defendant</type>
<type>Cross Defendant</type>
<type>Debtor 1</type>
<type>Debtor 2</type>
<type>Debtor Designee</type>
<type>Debtor-in-Possess</type>
<type>Debtor</type>
<type>Defendant (1)</type>
<type>Defendant (2)</type>
<type>Defendant (3)</type>
<type>Defendant (4)</type>
<type>Defendant (5)</type>
<type>Defendant (6)</type>
<type>Defendant (8)</type>
<type>Defendant Consolidated</type>
<type>Defendant-in-Rem</type>
<type>Defendant</type>
<type>executor plaintiff</type>
<type>Foreign Representative</type>
<type>FourthParty Plaintiff</type>
<type>Garnishee</type>
<type>In re Debtor</type>
<type>In Re</type>
<type>Interested Party</type>
<type>Interim Trustee</type>
<type>Interpleader</type>
<type>Intervenor Defendant</type>
<type>Intervenor Plaintiff</type>
<type>Intervenor-Plaintiff</type>
<type>Intervenor Pla</type>
<type>Intervenor</type>
<type>Joint Debtor</type>
<type>Jointly Administered Debtor</type>
<type>Lead Plaintiff</type>
<type>Liquidating Trustee</type>
<type>Liquidating Trust</type>
<type>Liquidator</type>
<type>Mediator (ADR Panel)</type>
<type>Mediator</type>
<type>Miscellaneous</type>
<type>Movant</type>
<type>Nominal Defendant</type>
<type>Objector</type>
<type>Other Party</type>
<type>Patient Care Ombudsman</type>
<type>Petitioner</type>
<type>Petitioning Creditor</type>
<type>Plaintiff - Consolidated</type>
<type>Plaintiff Consolidated</type>
<type>Plaintiff</type>
<type>Receiver</type>
<type>Respondent</type>
<type>Special Master</type>
<type>Successor Trustee</type>
<type>Technical Advisor</type>
<type>Third Party Counter Claimant</type>
<type>Third Party Counter Defendant</type>
<type>Third Party Defendant</type>
<type>ThirdParty Defendant</type>
<type>Third Party Plaintiff</type>
<type>ThirdParty Plaintiff</type>
<type>Trustee's Attorney</type>
<type>Trustee</type>
<type>U.S. Trustee</type>
<type>US Trustee</type>
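Many of the values above are spelling variants of each other (hyphens, case, abbreviations, numbered defendants). A minimal sketch of how they might be collapsed, assuming a hand-built abbreviation map; the `ABBREVIATIONS` table and the `canonical_party_type` function are hypothetical and only cover variants visible in this sample:

```python
import re

# Hypothetical abbreviation map, built by eye from the sample list above.
ABBREVIATIONS = {
    "consol": "consolidated",
    "asst.": "assistant",
    "assist.": "assistant",
    "us": "u.s.",
    "thirdparty": "third party",
    "fourthparty": "fourth party",
}


def canonical_party_type(raw: str) -> str:
    """Collapse spelling variants of a party type to one lowercase key."""
    text = raw.strip().lower()
    # Drop numbering suffixes such as "Defendant (3)".
    text = re.sub(r"\s*\(\d+\)$", "", text)
    # Treat hyphens as spaces: "Counter-Claimant" -> "counter claimant".
    text = text.replace("-", " ")
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    # Re-split so multi-word expansions and stray spaces are normalized.
    return " ".join(" ".join(words).split())
```

This does not reorder words, so "Plaintiff Consolidated" and "Consolidated Plaintiff" still come out as different keys, and truncated values like "Intervenor Pla" are left alone.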
This might be a place to start when it comes to reconciling party names: https://api.opencorporates.com/documentation/API-Reference
Doing some research on this tonight:
We can load the dockets as normal but with a section for the parties that requires a click-through. When clicked, it makes an AJAX request, which returns the data. The X-Robots-Tag header is set on the HTTP response for that data to ensure it doesn't show up in search.
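The click-through idea could look roughly like this on the server side. This is a sketch using a bare WSGI app rather than whatever framework the site actually uses; the `parties_app` name and the payload are made up for illustration. The only real piece is the `X-Robots-Tag: noindex` response header, which asks crawlers not to index the response:

```python
import json


def parties_app(environ, start_response):
    """Minimal WSGI app that returns party data as JSON for an AJAX call."""
    # Placeholder payload; a real view would look up the docket's parties.
    payload = {"parties": [{"name": "Example Party", "type": "Defendant"}]}
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # Ask crawlers not to index this response.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local check, this can be served with `wsgiref.simple_server.make_server("", 8000, parties_app)` from the standard library.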
Normalization is a punt for now. First step, regardless, will be to import the raw data. From there we can start doing normalization. This may be a place to start: https://opendata.stackexchange.com/questions/115/are-there-any-good-libraries-available-for-doing-normalization-of-company-names
Until this is normalized, no need for each party to have a URL.
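For whenever normalization does happen, a very rough first pass might just strip punctuation and corporate suffixes and then consult an alias table. Everything here is an assumption for illustration: the `ALIASES` and `SUFFIXES` tables are hypothetical and far from complete, and a real solution would likely lean on an external source such as the ones linked above.

```python
import re

# Hypothetical alias table mapping known short forms to a canonical name.
ALIASES = {"ibm": "international business machines"}

# Common corporate suffixes to ignore when comparing names (not exhaustive).
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "llc", "ltd"}


def normalize_party_name(raw: str) -> str:
    """Return a crude comparison key for a party name."""
    # Remove periods outright so "I.B.M." becomes "ibm".
    text = raw.lower().replace(".", "")
    # Turn any other punctuation into spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    words = text.split()
    # Strip trailing suffixes such as "Inc" or "Corporation".
    while words and words[-1] in SUFFIXES:
        words.pop()
    name = " ".join(words)
    return ALIASES.get(name, name)
```

The raw name should still be stored as imported; a key like this would only be used for matching.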
So, general approach will be:
Later:
So here's the plan for organizing the party information. It's more complicated than I thought at first:
Docket
|
Party Type (e.g. Defendant, Plaintiff, etc)
|
Party Table
- Name (Michael Lissner, Wal*Mart, etc.)
- Extra Info (This seems to be addresses)
|
Role (e.g. Lead Attorney)
|
Attorney
- Name (Brian Carver)
- Contact_raw
- Phone number
|
AttorneyOrganization
- Name (Sierra Club)
- Address (22 Bear St., Yosemite, CA)
So, in English this means:
I'm dropping fax numbers. Screw that.
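To make the hierarchy above concrete, here's a sketch of it as plain Python dataclasses. This is not the actual database schema. The class and field names follow the diagram, but the cardinalities (lists at each level, an attorney having several organizations) are my assumptions:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttorneyOrganization:
    name: str
    address: str


@dataclass
class Attorney:
    name: str
    contact_raw: str = ""
    phone: str = ""
    # Assumption: an attorney may be listed under more than one organization.
    organizations: List[AttorneyOrganization] = field(default_factory=list)


@dataclass
class Role:
    """Ties an attorney to a party in some capacity, e.g. "Lead Attorney"."""
    role: str
    attorney: Attorney


@dataclass
class Party:
    name: str
    extra_info: str = ""  # Seems to be addresses, per the diagram.
    roles: List[Role] = field(default_factory=list)


@dataclass
class PartyType:
    name: str  # e.g. "Defendant", "Plaintiff"
    parties: List[Party] = field(default_factory=list)


@dataclass
class Docket:
    party_types: List[PartyType] = field(default_factory=list)


# Example using the names from the diagram.
org = AttorneyOrganization(name="Sierra Club", address="22 Bear St., Yosemite, CA")
attorney = Attorney(name="Brian Carver", organizations=[org])
party = Party(name="Michael Lissner", roles=[Role(role="Lead Attorney", attorney=attorney)])
docket = Docket(party_types=[PartyType(name="Defendant", parties=[party])])
```

Note there's no fax field on `Attorney`, matching the decision to drop fax numbers.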
OK, and this parse is complete! Some initial numbers:
The word "Trump" occurs in 277 different parties.
Next steps are:
This launched today, so I'm calling it done. I'll open a new ticket with info about adding a front end for parties/attys/firms.
To do this, we'll need to do a re-parse of the RECAP XML files, pulling out the parties as we go. Questions:
How do we show this in the UI? I hate the way most UIs for this show an extremely long list of parties before you get to the useful documents. We also decided we didn't want this information to be crawled. That means it should probably be behind an XHR request.
Can we normalize this data somehow? IBM == International Business Machines.
Do we want a URL for every party?
How do we make this searchable in the UI?
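As a starting point for the re-parse, here's a sketch that walks one XML document and tallies the party `<type>` values it finds. Only the `<type>` element is taken from the real data; the surrounding `<docket>`, `<parties>`, `<party>`, and `<name>` elements in `SAMPLE_XML` are guesses at the structure, so treat them as placeholders:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; only the <type> element is known from the real files.
SAMPLE_XML = """
<docket>
  <parties>
    <party><name>Example Corp.</name><type>Defendant</type></party>
    <party><name>Jane Doe</name><type>Plaintiff</type></party>
    <party><name>John Roe</name><type>Defendant</type></party>
  </parties>
</docket>
"""


def count_party_types(xml_text: str) -> Counter:
    """Count how often each <type> value appears in one XML document."""
    root = ET.fromstring(xml_text.strip())
    # iter() finds <type> elements at any depth, so the exact nesting
    # of the real files doesn't matter for this tally.
    return Counter((el.text or "").strip() for el in root.iter("type"))


counts = count_party_types(SAMPLE_XML)
```

Summing these counters across a sample of files would give a list of distinct type values to review before deciding how to store them.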