freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Link to administrative agency dockets #4228

Closed v-anne closed 2 months ago

v-anne commented 2 months ago

Some administrative agencies like the NLRB have dockets for their proceedings.

It would be great if RECAP could link to them.

Here's an example case this could benefit, with the relevant modification highlighted. https://www.courtlistener.com/docket/68549109/apple-v-nlrb/

image

I think cl/opinion_page/templates/docket_tabs.html is the relevant file to modify. Please correct me if wrong.

```
{% if og_info %}
  <h3 class="v-offset-above-3">Originating Court Information</h3>
  <hr class="top">
  {% if docket.appeal_from or docket.appeal_from_str %}
    <span class="meta-data-header">Appealed From:</span>
    <span class="meta-data-value">
      {% if docket.appeal_from %}
        {{ docket.appeal_from.short_name }}
      {% elif docket.appeal_from_str %}
        {{ docket.appeal_from_str }}
      {% endif %}
        {% if og_info.docket_number %}
          {% if docket.appeal_from %}
            (<a
            href="/?type=r&amp;docket_number={{ og_info.docket_number }}&amp;court={{ docket.appeal_from.pk }}"
            rel="nofollow"
            data-toggle="tooltip"
            data-placement="right"
            title="Search for this docket number in the RECAP Archive.">{{ og_info.docket_number }})</a>
          {% else %}
            {% if 'NLRB' in docket.appeal_from_str %}
              (<a href="https://www.nlrb.gov/case/{{ og_info.docket_number }}"
                  target="_blank"
                  rel="nofollow"
                  data-toggle="tooltip"
                  data-placement="right"
                  title="View this case on the NLRB website.">{{ og_info.docket_number }}</a>)
            {% else %}
              ({{ og_info.docket_number }})
            {% endif %}
          {% endif %}
        {% endif %}
    </span>
  {% endif %}
```

mlissner commented 2 months ago

That's a fun idea. +1 from me, but it does seem hard to get right. If you're interested in helping with this, we could export a list of the "Appealed From" values so you could analyze them and make whatever links are possible?

v-anne commented 2 months ago

Yes, please.

I think the following agencies are very feasible:

Others might be a bit more difficult:

If you export the data, I can try to tackle the agencies one at a time.

mlissner commented 2 months ago

Cool! How comfortable do you feel building this feature? Do you need help looking through the code, understanding django, things like that? Do you know where you'd hook this into our view, for example?

v-anne commented 2 months ago

I have some background with django and am willing to tool around with the codebase. I just had some trouble figuring out what view is responsible for the dockets. I think I found the right html file though, right?

mlissner commented 2 months ago

Yeah, that HTML looks right. The view for the docket is cl.opinion_page.views.view_docket. Sorry, I know that doesn't make much sense.

I'd have to think some more about it, but it sort of feels like a custom template tag could do the trick instead of doing it in the view, but I'm sure when you get into it you'll have a sense of what the better solution is.

v-anne commented 2 months ago

Got it, I'll take a look. Do you mind sharing a sample of the "Appealed From" values?

mlissner commented 2 months ago

Definitely. @ERosendo, do you think you could export some values from the dev DB for @v-anne to look at and experiment with?

ERosendo commented 2 months ago

Sure!

This CSV file includes a list of distinct values extracted from the appeal_from and appeal_from_str columns within the docket table of the dev DB:

docket-appeal_from.csv

v-anne commented 2 months ago

Thanks, I'm looking at this now.

What does appeal_from_id correspond to? I ask because the NLRB and FCC are both coded as dcd, but that seems to be the DC federal district court.

mlissner commented 2 months ago

That should be the court ID where these appealed from. Maybe those cases are tried in DCD before they get appealed?

v-anne commented 2 months ago

docket-appeal_from.csv

I think I eliminated all of the district courts and bankruptcy courts from the CSV file. Most of those left should be administrative tribunals.

However, you'll see in some cases (like lines 199 and 200) the agency is listed multiple times, but with different appeal_from_id values. Do you think that's just miscoding? I don't think the NLRB has trials in the DC district court.

What I'm trying to say is, it might be better for me to just focus on appeal_from_str when trying to link to the court dockets.
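To see how widespread the inconsistency is, a rough sketch like the following could group the exported rows by appeal_from_str and report any string that appears with more than one appeal_from_id. The header names are assumed to match the column names used in this thread, and the sample rows are made up for illustration rather than taken from the export.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the real data would come from docket-appeal_from.csv.
# The header names are an assumption based on the column names in this thread.
SAMPLE = """appeal_from_id,appeal_from_str
dcd,National Labor Relations Board
,National Labor Relations Board
,Environmental Protection Agency
"""


def inconsistent_agencies(csv_file):
    """Return appeal_from_str values seen with more than one appeal_from_id."""
    ids_by_name = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # An empty cell is treated as "no court ID" and stored as None.
        ids_by_name[row["appeal_from_str"]].add(row["appeal_from_id"] or None)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


result = inconsistent_agencies(io.StringIO(SAMPLE))
```

Any agency that shows up in the result is worth a manual look before trusting its appeal_from_id.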

mlissner commented 2 months ago

> However, you'll see in some cases (like lines 199 and 200) the agency is listed multiple times, but with different appeal_from_id values. Do you think that's just miscoding? I don't think the NLRB has trials in the DC district court.

That's a bit odd. It might be helpful to pull that case from PACER and see what's up with it.

But yes, generally, I think you should just focus on the ones where the appeal_from_id is null. Those should indicate that we couldn't figure out the court value, which is a pretty strong hint that it's some other kind of entity.
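As a minimal sketch of that filter, assuming the export has appeal_from_id and appeal_from_str headers (the sample rows below are invented, and non_court_sources is a hypothetical helper name), the candidate list could be pulled out like this:

```python
import csv
import io

# Invented sample rows standing in for the exported CSV.
SAMPLE = """appeal_from_id,appeal_from_str
dcd,U.S. District Court for the District of Columbia
,National Labor Relations Board
,Environmental Protection Agency
"""


def non_court_sources(csv_file):
    """Return sorted distinct appeal_from_str values whose appeal_from_id is blank."""
    names = {
        row["appeal_from_str"].strip()
        for row in csv.DictReader(csv_file)
        if not row["appeal_from_id"].strip() and row["appeal_from_str"].strip()
    }
    return sorted(names)


candidates = non_court_sources(io.StringIO(SAMPLE))
```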

v-anne commented 2 months ago

Ok, that is feasible. Do you mind pulling example cases for each of the appeal_from_str values I narrowed down so that I can see what the case number formats are?

mlissner commented 2 months ago

Can you help with this please, @ERosendo?

ERosendo commented 2 months ago

Sure!

ERosendo commented 2 months ago

@v-anne Here's a sample set of dockets for each value of the appeal_from_str field included in the CSV file you provided.

Each row in the CSV contains the following information:

dockets.csv

v-anne commented 2 months ago

Thanks @ERosendo. Sorry to bug you, but I think I need one more column. If you look at the screenshot at the beginning of the thread, could you also provide the administrative agency/lower court case number for each of these cases? Right now only the circuit court case number is provided.

ERosendo commented 2 months ago

Hey @v-anne, here's the new csv file. It contains the same fields as the previous file, including the requested case number as originating_docket_number.

data-dockets.csv

v-anne commented 2 months ago

Thanks, @ERosendo. @mlissner, I think this might take longer than I initially expected. The data isn't as clean as I had hoped so it will need some regex work.

mlissner commented 2 months ago

Oh shoot. I guess that's not a huge surprise, but I appreciate you sticking with it.

Would it make sense to find the most common pattern or two and land those as MVPs before going after the many edge cases?
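One rough way to find those common patterns, sketched under the assumption that the originating docket numbers are available as plain strings, is to collapse each value to its "shape" and count the shapes. The shape helper and the sample values are illustrative only and are not taken from the exported data.

```python
import re
from collections import Counter


def shape(docket_number: str) -> str:
    """Collapse a docket number to its shape: digits become 9, letters become A."""
    collapsed = re.sub(r"\d", "9", docket_number.strip())
    return re.sub(r"[A-Za-z]", "A", collapsed)


# Illustrative values only, not rows from the exported CSV.
samples = ["19-CA-289275", "09-CA-110508", "85 FR 20688", "NLRB-09CA110508"]

# Most frequent shapes first; each entry is a (shape, count) pair.
common_shapes = Counter(shape(s) for s in samples).most_common()
```

The shapes at the top of the list are the ones worth writing a regex for first.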

v-anne commented 2 months ago

Yeah, I think there are two candidates for MVPs.

  1. The NLRB one I first suggested is relatively simple. Some regex might be needed to convert to the valid format on a case by case basis, but it's workable.
  2. Some challenges to EPA regulations list the originating_docket_number as a Federal Register citation. This is even easier to accommodate, but it won't cover every EPA appeal as many other cases don't use this syntax.
v-anne commented 2 months ago

Here's #2.



```
import re


def create_fr_hyperlink(citation):
    # Remove commas from the input (e.g. page numbers written as "20,688")
    citation_no_commas = citation.replace(',', '')

    # Regular expression to match both ##FR##### and ## Fed. Reg. ##### patterns
    pattern = r'(\d{1,3})\s*(?:(?:[Ff][Rr])|(?:[Ff]ed\.?\s*[Rr]eg\.?))\s*(\d{1,5})'

    match = re.search(pattern, citation_no_commas)

    if match:
        volume = match.group(1)
        page = match.group(2)

        url = f"https://www.federalregister.gov/citation/{volume}-FR-{page}"

        return url

    # No Federal Register citation found
    return None
```

v-anne commented 2 months ago

And here's #1.



```
import re


def format_case_number(case_number):
    # Extract region, case type, and number, with or without hyphens
    match = re.match(r'^(\d{1,2})-?([A-Z]{2})-?(\d{1,6})$', case_number)

    if match:
        region, case_type, number = match.groups()

        # Pad region and number with zeros
        region = region.zfill(2)
        number = number.zfill(6)

        url = f"https://www.nlrb.gov/case/{region}-{case_type}-{number}"

        return url

    # Not a recognizable NLRB case number
    return None
```

mlissner commented 2 months ago

We're on our way. Seeing these, I wonder, would you be game to write some tests along with them?

I assume for v1 we'll want a single function. Something like:

```
def linkify_orig_docket_number(og_docket_number: str) -> str:
    """Make an originating docket number for an appellate case into a link

    :param og_docket_number: The docket number where the case was originally heard.
    :returns: A linkified version of the docket number for the user to click on.
    """
    ...
```

If we have that, we could just have some basic tests that send in a value and then check that output link is correct.

One other fun (?) thought: Security. We'll want to be careful here that we don't allow an input that could allow cross-site scripting or other kinds of injection. Fun?
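To make the concern concrete, here is a minimal defensive sketch using only the standard library. It assumes the link is assembled into HTML by hand, which may not be how it ends up being rendered; safe_anchor and ALLOWED_HOSTS are hypothetical names, and the two hosts are simply the link targets discussed in this thread.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical allowlist: the two link targets discussed in this thread.
ALLOWED_HOSTS = {"www.nlrb.gov", "www.federalregister.gov"}


def safe_anchor(url: str, label: str) -> str:
    """Return an <a> tag for an allowlisted https URL, else the escaped label alone."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        # Refuse to link anywhere unexpected (e.g. javascript: URLs).
        return escape(label)
    # Escape both the attribute value and the visible text.
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(label)}</a>'
```

Building the URL only from regex groups that match digits and capital letters, as the two snippets above do, also keeps arbitrary input out of the href.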

v-anne commented 2 months ago

```
import re

def linkify_orig_docket_number(agency: str, og_docket_number: str) -> str:
    """Make an originating docket number for an appellate case into a link (MVP version)

    :param agency: The administrative agency the case originated from
    :param og_docket_number: The docket number where the case was originally heard.
    :returns: A linkified version of the docket number for the user to click on, or the original if no link can be made.
    """
    # Simple pattern for Federal Register citations
    fr_match = re.search(r'(\d{1,3})\s*(?:FR|Fed\.?\s*Reg\.?)\s*(\d{1,5})', og_docket_number)
    if fr_match:
        volume, page = fr_match.groups()
        return f"https://www.federalregister.gov/citation/{volume}-FR-{page}"

    # NLRB pattern
    if agency == 'National Labor Relations Board':
        match = re.match(r'^(?:NLRB-)?(\d{1,2})-?([A-Z]{2})-?(\d{1,6})$', og_docket_number)
        if match:
            region, case_type, number = match.groups()
            formatted_number = f"{region.zfill(2)}-{case_type}-{number.zfill(6)}"
            return f"https://www.nlrb.gov/case/{formatted_number}"

    # Add other agencies as feasible. Note that the Federal Register link
    # should cover multiple agencies.

    # If no match is found, return the original docket number
    return og_docket_number
```

@mlissner, this is what I have at the moment. If this seems ok I'll try to approach unit tests. Admittedly, I'm not that familiar with trying to avoid XSS. Any tips?

Sorry about the formatting issues as well, I'm having problems with the code blocks.

mlissner commented 2 months ago

Code looks pretty good to me, at a glance. For tests, maybe @ERosendo can chime in with an example that you can copy?

For XSS, I think django should protect us by default, but let's double check that before releasing the code. Definitely something we can return to.

> Sorry about the formatting issues as well, I'm having problems with the code blocks.

Np. You're looking for the triple backtick. I edited your post to show you.

v-anne commented 2 months ago

@ERosendo, mind sharing some tests?

ERosendo commented 2 months ago

Hi @v-anne, I apologize for the delay.

I think the following test aligns with the code you provided:

https://github.com/freelawproject/courtlistener/blob/723b7ec84101b18fa2f0aa0dcb7ef7788dc74361/cl/citations/tests.py#L1240-L1276

This test uses a structured approach to evaluate the behavior of a helper function named clean_parenthetical_text across a diverse range of input scenarios.

The test has two key elements:

Let me know if you have any questions.

v-anne commented 2 months ago

No worries, @ERosendo.

Here is what I have for tests:


```
def test_linkify_orig_docket_number(self):
    test_pairs = [
        (
            "National Labor Relations Board",
            "19-CA-289275",
            "https://www.nlrb.gov/case/19-CA-289275",
        ),
        (
            "National Labor Relations Board",
            "NLRB-09CA110508",
            "https://www.nlrb.gov/case/09-CA-110508",
        ),
        (
            "EPA",
            "85 FR 20688",
            "https://www.federalregister.gov/citation/85-FR-20688",
        ),
        (
            "Other Agency",
            "85 Fed. Reg. 12345",
            "https://www.federalregister.gov/citation/85-FR-12345",
        ),
        (
            "National Labor Relations Board",
            "85 Fed. Reg. 12345",
            "https://www.federalregister.gov/citation/85-FR-12345",
        ),
        (
            "Bureau of Land Management",
            "88FR20688",
            "https://www.federalregister.gov/citation/88-FR-20688",
        ),
        (
            "Bureau of Land Management",
            "88 Fed Reg 34523",
            "https://www.federalregister.gov/citation/88-FR-34523",
        ),
        (
            "Federal Communications Commission",
            "19-CA-289275",
            "19-CA-289275",
        ),
        (
            "National Labor Relations Board",
            "This is not an NLRB case",
            "This is not an NLRB case",
        ),
        (
            "Other Agency",
            "This is not a Federal Register citation",
            "This is not a Federal Register citation",
        ),
    ]

    for i, (agency, docket_number, expected_output) in enumerate(test_pairs):
        with self.subTest(
            f"Testing docket number linkification for {agency, docket_number}...", i=i
        ):
            self.assertEqual(
                linkify_orig_docket_number(agency, docket_number),
                expected_output,
                f"Got incorrect result from linkify_orig_docket_number for: {agency, docket_number}",
            )
```

Let me know what else you need, @mlissner. I'm not quite sure what file you'd like to put this in.

mlissner commented 2 months ago

This looks great, thank you!

@ERosendo, do you have a suggestion for where this ought to live? And where the parser would make sense?

ERosendo commented 2 months ago

Hi 👋

> do you have a suggestion for where this ought to live?

I recommend placing the linkify_orig_docket_number helper function in cl/lib/model_helpers.py. Here's why this location makes sense:

Corresponding tests can be placed within the tests.py file located in the same folder.

> where the parser would make sense?

I believe it's a good idea to add the parser function as a property of the OriginatingCourtInformation class. This maintains code organization, provides direct access to necessary data, and allows reusability. Consider the following implementation (the Docket class offers additional examples of Model Properties):


```
from cl.lib.model_helpers import linkify_orig_docket_number

class OriginatingCourtInformation(AbstractDateTimeModel):
    ...

    @property
    def administrative_link(self):
        return linkify_orig_docket_number(self.docket.appeal_from_str, self.docket_number)
```

To access the generated administrative link within the docket_tabs.html template, you would use the following syntax:

```
{{ og_info.administrative_link }}
```

@v-anne Please let me know if you need any additional information.

v-anne commented 2 months ago

@ERosendo, thanks for your help. I've made all of those changes locally and will make a PR soon, but have one (hopefully last) issue. Below is the html I've modified. I'm not sure how this html will handle the case where there is no link to an administrative docket. The python function returns the original case number that was input, but the html here doesn't account for that case and I'm not sure how to best handle it.


```
{% with og_info=docket.originating_court_information %}
    {% if og_info %}
      <h3 class="v-offset-above-3">Originating Court Information</h3>
      <hr class="top">
      {% if docket.appeal_from or docket.appeal_from_str %}
        <span class="meta-data-header">Appealed From:</span>
        <span class="meta-data-value">
          {% if docket.appeal_from %}
            {{ docket.appeal_from.short_name }}
          {% elif docket.appeal_from_str %}
            {{ docket.appeal_from_str }}
          {% endif %}
            {% if og_info.docket_number %}
              {% if docket.appeal_from %}
                (<a
                href="/?type=r&amp;docket_number={{ og_info.docket_number }}&amp;court={{ docket.appeal_from.pk }}"
                rel="nofollow"
                data-toggle="tooltip"
                data-placement="right"
                title="Search for this docket number in the RECAP Archive.">{{ og_info.docket_number }})</a>
              {% else %}
                (<a href="{{ og_info.administrative_link }}">{{ og_info.docket_number }}</a>)
              {% endif %}
            {% endif %}
        </span>
      {% endif %}
```

ERosendo commented 2 months ago

Hey @v-anne 👋

> I'm not sure how this html will handle the case where there is no link to an administrative docket. The python function returns the original case number that was input

To simplify the template logic, we can tweak the python function to return a falsy value (like an empty string or None) instead of the original case number when it can't create a valid link. This will allow us to use a conditional in the template to only render the link when it exists.
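A minimal sketch of that tweak, reusing the function posted earlier in this thread: only the return annotation and the final return change. This is an illustration under those assumptions, not necessarily the code that ends up in the PR, and the tests that currently expect the original docket number back would need to expect None instead.

```python
import re
from typing import Optional


def linkify_orig_docket_number(agency: str, og_docket_number: str) -> Optional[str]:
    """Return a URL for the originating docket number, or None if no link can be made."""
    # Federal Register citations, e.g. "85 FR 20688" or "85 Fed. Reg. 12345"
    fr_match = re.search(
        r"(\d{1,3})\s*(?:FR|Fed\.?\s*Reg\.?)\s*(\d{1,5})", og_docket_number
    )
    if fr_match:
        volume, page = fr_match.groups()
        return f"https://www.federalregister.gov/citation/{volume}-FR-{page}"

    # NLRB case numbers, with or without hyphens and an optional "NLRB-" prefix
    if agency == "National Labor Relations Board":
        match = re.match(
            r"^(?:NLRB-)?(\d{1,2})-?([A-Z]{2})-?(\d{1,6})$", og_docket_number
        )
        if match:
            region, case_type, number = match.groups()
            return f"https://www.nlrb.gov/case/{region.zfill(2)}-{case_type}-{number.zfill(6)}"

    # Falsy result so the template can skip the <a> tag entirely
    return None
```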

Here's how the updated template would look:


```
{% with og_info=docket.originating_court_information %}
    {% if og_info %}
      <h3 class="v-offset-above-3">Originating Court Information</h3>
      <hr class="top">
      {% if docket.appeal_from or docket.appeal_from_str %}
        <span class="meta-data-header">Appealed From:</span>
        <span class="meta-data-value">
          {% if docket.appeal_from %}
            {{ docket.appeal_from.short_name }}
          {% elif docket.appeal_from_str %}
            {{ docket.appeal_from_str }}
          {% endif %}
            {% if og_info.docket_number %}
              {% if docket.appeal_from %}
                (<a
                href="/?type=r&amp;docket_number={{ og_info.docket_number }}&amp;court={{ docket.appeal_from.pk }}"
                rel="nofollow"
                data-toggle="tooltip"
                data-placement="right"
                title="Search for this docket number in the RECAP Archive.">{{ og_info.docket_number }})</a>
              {% elif og_info.administrative_link %}
                (<a href="{{ og_info.administrative_link }}">{{ og_info.docket_number }}</a>)
              {% else %}
                ({{ og_info.docket_number }})
              {% endif %}
            {% endif %}
        </span>
      {% endif %}
```

v-anne commented 2 months ago

Thanks for the suggestion, @ERosendo. I made changes to the html and the function/tests accordingly.

I'm a bit embarrassed to say I'm having issues with pushing my changes.

```
remote: Permission to freelawproject/courtlistener.git denied to v-anne.
fatal: unable to access 'https://github.com/freelawproject/courtlistener.git/': The requested URL returned error: 403
```

I got the error above after making a new branch locally for this issue. To recap, I cloned the repo locally, made the changes on a new branch, committed them, and attempted to push before getting that error message. Any advice, @mlissner?

mlissner commented 2 months ago

Yeah, this is a common mistake. The way to do this stuff these days is to make a fork, then a branch in your fork. From there you can push (it's your repo), and you can create a pull request in CourtListener to pull from your fork (once you push the changes, this part is usually pretty easy).

Want to give that a shot? The reason you don't have access is because we only allow staff members push access to CourtListener itself.

v-anne commented 2 months ago

Ok, I submitted a PR.

mlissner commented 2 months ago

Just tried it out. Super cool, @v-anne! Thank you for this great little enhancement!

v-anne commented 2 months ago

Happy to contribute. I've caught a small regex issue with this case, so I'll have to test out a fix soon. https://www.courtlistener.com/docket/68527293/airlines-for-amer-v-dept-of-trans/