freelawproject / foresight

Where we discuss and prioritize new features

Create RECAP Pray and Pay Project so people can work together to buy PACER documents #65

Open mlissner opened 4 years ago

mlissner commented 4 years ago

The idea here is to accomplish a few interesting things:

  1. Create a public list somewhere of documents people want downloaded from PACER

  2. Allow people to say they want something bought on PACER (and email them when we get it)

  3. Demonstrate how much people want documents

Doing this should be pretty easy:

  1. Create a new page explaining this initiative. It'll be called the PACER Pray and Pay Project: "You pray for a document, they pay for it." I'm somewhat anti-religious, so I don't like religion entering non-religious places, but I think I can get behind this anyway. People "pray" for relief, right?

  2. On every document, we add a little prayer hands emoji (🙏) next to the "Buy on PACER" button. When people click the prayer hands, they indicate that they want the doc, and it gets a vote. The number of people voting for a document can be displayed before or after somebody clicks it.

  3. Once somebody prays for a doc, they get signed up to get an email once it is in the system.

  4. We have a page that shows the most wanted documents in real time. Maybe it's filterable by jurisdiction? It shows the number of votes for a document, the description of the doc, etc.

  5. We could figure out the price of a document (by using its attachment page) whenever somebody votes for it.

  6. People only get so many prayers per day, so they don't just vote like crazy people. 20/day as a starting point? As their prayers are granted, they get more. So if they pray for 20 items, and they're all downloaded, great, give them more.
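Item 6's quota, with granted prayers handing slots back, might be checked along these lines. This is a minimal plain-Python sketch, not CourtListener code: it assumes "per day" means a rolling 24-hour window and that only prayers still waiting count against the limit. `PrayerRecord` and `prayers_remaining` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WAITING = 1
GRANTED = 2


@dataclass
class PrayerRecord:
    """Hypothetical stand-in for one row of the prayer table."""
    user_id: int
    date_created: datetime
    status: int = WAITING


def prayers_remaining(
    prayers: list[PrayerRecord],
    user_id: int,
    now: datetime,
    daily_limit: int = 20,
) -> int:
    """Return how many more prayers the user may make right now.

    Only prayers created in the last 24 hours that are still WAITING
    count against the limit, so each granted prayer frees up a slot.
    """
    window_start = now - timedelta(hours=24)
    used = sum(
        1
        for p in prayers
        if p.user_id == user_id
        and p.date_created >= window_start
        and p.status == WAITING
    )
    return max(daily_limit - used, 0)
```

With 20 waiting prayers in the window the user has none left; once one of them is granted, one slot opens up again.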

What else?

mlissner commented 4 years ago

I think putting RECAP in the name of the project is probably wise for branding; changing title accordingly.

mlissner commented 4 years ago

The alerts on this will be a pleasure: "Your prayer has been granted: Doc XYZ is now available."

mlissner commented 4 years ago

This data model should land later today:

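from django.contrib.auth.models import User
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class Prayer(models.Model):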
    WAITING = 1
    GRANTED = 2
    STATUSES = (
        (GRANTED, "Prayer has been granted."),
        (WAITING, "Still waiting for the document."),
    )
    date_created = models.DateTimeField(
        help_text="The time when this item was created",
        auto_now_add=True,
        db_index=True,
    )
    user = models.ForeignKey(
        User,
        help_text="The user that made the prayer",
        related_name="prayers",
        on_delete=models.CASCADE,
    )
    status = models.SmallIntegerField(
        help_text="Whether the prayer has been granted or is still waiting.",
        choices=STATUSES,
        default=WAITING,
    )
    # Generic b/c we may want to pray for items in other jurisdictions someday
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey()

    class Meta:
        index_together = (
            # When adding a new document, we'll ask: What outstanding prayers
            # do we have for item Foo of type XYZ?
            # When loading the prayer leader board, we'll ask: Which items of
            # type XYZ have the most outstanding prayers?
            ("content_type", "object_id", "status"),
            # When loading docket pages, we'll ask (hundreds of times): Did
            # user ABC vote on item Foo of type XYZ?
            ("content_type", "object_id", "user"),
            # When a user votes, we'll ask: How many outstanding prayers did
            # user ABC make today?
            ("date_created", "user", "status"),
        )

We can tweak it later, but I'm hopeful I can get it right the first time. Comments welcome.
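To see what the three composite indexes are for, here is a rough sketch using Python's built-in `sqlite3` module in place of the production database. The table and column names are assumptions that mirror the model above (with `content_type_id` as the generic relation's type column), and the queries paraphrase the comments in `Meta`; they are not the SQL Django would actually generate.

```python
import sqlite3

WAITING, GRANTED = 1, 2

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prayer (
        id INTEGER PRIMARY KEY,
        date_created TEXT NOT NULL,
        user_id INTEGER NOT NULL,
        status INTEGER NOT NULL DEFAULT 1,
        content_type_id INTEGER NOT NULL,
        object_id INTEGER NOT NULL
    );
    -- Outstanding prayers for an item; leaderboard of most-wanted items.
    CREATE INDEX prayer_item_status ON prayer (content_type_id, object_id, status);
    -- Did user ABC already pray for item Foo?
    CREATE INDEX prayer_item_user ON prayer (content_type_id, object_id, user_id);
    -- How many outstanding prayers did user ABC make today?
    CREATE INDEX prayer_created_user_status ON prayer (date_created, user_id, status);
    """
)

rows = [
    ("2024-01-02 09:00", 1, WAITING, 7, 100),
    ("2024-01-02 09:05", 2, WAITING, 7, 100),
    ("2024-01-02 09:10", 2, WAITING, 7, 200),
    ("2024-01-02 09:15", 3, GRANTED, 7, 300),
]
conn.executemany(
    "INSERT INTO prayer (date_created, user_id, status, content_type_id, object_id) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Leaderboard: items of type 7 with the most outstanding prayers.
leaderboard = conn.execute(
    """
    SELECT object_id, COUNT(*) AS n FROM prayer
    WHERE content_type_id = ? AND status = ?
    GROUP BY object_id ORDER BY n DESC, object_id
    """,
    (7, WAITING),
).fetchall()

# Did user 1 pray for item (7, 100)? EXISTS returns 1 or 0.
voted = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM prayer "
    "WHERE content_type_id = ? AND object_id = ? AND user_id = ?)",
    (7, 100, 1),
).fetchone()[0]

# Outstanding prayers user 2 made since midnight.
today_count = conn.execute(
    "SELECT COUNT(*) FROM prayer WHERE date_created >= ? AND user_id = ? AND status = ?",
    ("2024-01-02 00:00", 2, WAITING),
).fetchone()[0]
```

One design note, offered tentatively: the third query filters `date_created` by a range, and in a B-tree composite index the columns after a range condition can't narrow the scan, so an ordering like `(user, status, date_created)` would likely serve that query better than `(date_created, user, status)`.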

mlissner commented 4 years ago

The database part of this is now in place. Honestly, that's the hard part. The rest should be pretty easy, I think!

arderyp commented 4 years ago

lol, love the idea, and the name, well done

nemobis commented 2 years ago

What else?

It would be nice if the RECAP extension kept an estimate of the upcoming PACER bill. Sometimes I download too much and go over my budget, because PACER shows accurate numbers only after several days. If you advertise this project to individuals, many will join hoping to stay under $30 per quarter, and some of them will end up with unpleasant surprises.
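A running estimate like this is mostly arithmetic. The sketch below assumes PACER's published schedule of $0.10 per page with a $3.00 cap per document, and that a quarterly bill is waived if it does not exceed $30. It ignores exceptions (transcripts, for example, are not capped) and treats every item as a capped document. Page counts could come from the attachment page, as suggested for pricing earlier. Amounts are kept in integer cents to avoid floating-point rounding; the function names are hypothetical.

```python
PER_PAGE_CENTS = 10           # $0.10 per page
DOC_CAP_CENTS = 300           # $3.00 cap per document (30 pages)
QUARTER_WAIVER_CENTS = 3000   # quarterly bills of $30 or less are waived


def document_cost_cents(pages: int) -> int:
    """Estimated charge, in cents, for one document with `pages` pages."""
    return min(pages * PER_PAGE_CENTS, DOC_CAP_CENTS)


def quarter_estimate(page_counts: list[int]) -> tuple[int, bool]:
    """Return (estimated total in cents, whether it stays within the waiver)."""
    total = sum(document_cost_cents(p) for p in page_counts)
    return total, total <= QUARTER_WAIVER_CENTS
```

For example, documents of 4, 12, and 75 pages cost 40 + 120 + 300 cents, so `quarter_estimate([4, 12, 75])` returns `(460, True)`.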

mlissner commented 2 years ago

I once had an idea for a service that monitored PACER usage by using your credentials. Still would be a fun one to build. I think doing it via the extension would be hard, but it's not a bad idea. Feel free to open an issue over in freelawproject/recap, if you like. :)

nk9 commented 1 year ago

The end of the quarter is nigh, and it occurred to me that I once again would love a "wishlist" I could put my spare credit towards. Sounds like this is that feature.

mlissner commented 1 year ago

It is...we need to build the darned thing!

v-anne commented 7 months ago

Has anything come of this? I had the same idea recently and think it could get a lot of use.

mlissner commented 7 months ago

It's partly done, but nobody is working on it, unfortunately.

v-anne commented 6 months ago

@mlissner what remains to be done? Is it the interfacing with the Chrome extension?

mlissner commented 6 months ago

I hadn't envisioned it being part of the extension, at least not at first. I think the missing piece is adding it to the front end. The models are in place, so we just need to use HTMX to add prayer hands and a leaderboard. Probably better to do the leaderboard first, and then add prayer hands once it's working nicely, so that we can test it a bit before we add prayer hands to 380 million links on the site!
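The leaderboard itself reduces to counting outstanding prayers per document and keeping the top entries. Here is a minimal in-memory sketch, assuming each pending prayer is available as a `(document_id, court_id)` pair; the optional court filter corresponds to the jurisdiction filtering floated in the original proposal, and all names are hypothetical. In production this would be a database aggregation rather than Python.

```python
from collections import Counter
from typing import Iterable, Optional


def leaderboard(
    pending: Iterable[tuple[int, str]],
    court_id: Optional[str] = None,
    limit: int = 50,
) -> list[tuple[int, int]]:
    """Return up to `limit` (document_id, prayer_count) pairs, most wanted first.

    `pending` holds one (document_id, court_id) pair per outstanding prayer.
    Ties are broken by document_id so the ordering is stable.
    """
    counts = Counter(
        doc_id for doc_id, court in pending if court_id is None or court == court_id
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```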

v-anne commented 3 months ago

@mlissner I'm willing to tackle this. I do have a suggestion, though. It might be beneficial to add some sort of weighting system that prioritizes substantive documents (such as motions and discovery evidence) over less valuable documents like notices of appearance. Does CourtListener currently have any way of identifying what type of document a filing might be?

The other suggestion I have is to cap how many "prayers" any user can have open at a given time. This might not be feasible if people don't have to create CourtListener accounts to make requests.

mlissner commented 3 months ago

Awesome! This should be a fun one.

Does CourtListener currently have any way of identifying what type of document a filing might be?

No, not yet, but it's the kind of thing we'd like to do fairly soon. It's on our AI/ML backlog (which is mostly in my head).

The other suggestion I have is to cap how many "prayers" any user can have open at a given time.

I think capping the prayers is essential. Above I suggested 20/day. In hindsight, maybe three or five is better. With fewer, people will use them more judiciously. I suppose it should be configurable, though, in any case. Probably a good thing to put in an environment variable?
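Making the cap configurable could look like the sketch below, which reads an integer from the environment with a fallback. It uses only the standard library's `os.environ`; CourtListener's settings use django-environ (mentioned later in this thread), where the equivalent would be roughly `env.int("ALLOWED_PRAYER_COUNT", default=5)`. The variable name and the default of five are assumptions drawn from this discussion.

```python
import os


def allowed_prayer_count(default: int = 5) -> int:
    """Read the per-user prayer cap from ALLOWED_PRAYER_COUNT, or use `default`.

    Raises ValueError if the variable is set but is not a positive integer,
    so a typo in deployment config fails loudly instead of silently.
    """
    raw = os.environ.get("ALLOWED_PRAYER_COUNT")
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError for non-numeric values
    if value < 1:
        raise ValueError("ALLOWED_PRAYER_COUNT must be a positive integer")
    return value
```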

mlissner commented 3 months ago

Oops, I had more to say, sorry. Let me know if you want guidance on getting started. It'd be helpful, I think, to sketch out your general approach before beginning. For example, maybe you could detail the views and URLs you expect to build and where you expect to build them, and we can provide guidance on that before you begin?

The right way to do this is probably by API, probably using Django REST Framework, so we'll need APIs for adding, removing, and listing prayers, I think, at a minimum? Want to put together some thoughts on this before you get into the details?

v-anne commented 3 months ago

For an MVP, I was thinking of the following:

On this same page, also show a signed-in user the pending documents they've requested and allow them to cancel any outstanding request in order to make a new one.

One thing I'm not sure about is CourtListener's scale. I honestly don't know how many documents are uploaded daily, or whether I'm thinking too small (maybe list the 50 oldest documents instead?). What interested me about the weighting algorithm I proposed earlier is that it would help guarantee more useful documents are purchased earlier, while less useful documents still bubble up to the top of the queue as their wait increases. Still, an MVP wouldn't need this weighting algorithm, and that could come later.
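The weighting idea, where substantive documents come first but everything else gains priority the longer it waits, can be expressed as a score that multiplies a per-type weight by waiting time. This is a hypothetical sketch: CourtListener does not yet classify filings by type (as noted above), so the type labels and weights here are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Invented weights for illustration only; no such classification exists yet.
TYPE_WEIGHTS = {"motion": 3.0, "exhibit": 2.0, "notice_of_appearance": 0.5}
DEFAULT_WEIGHT = 1.0


@dataclass
class Request:
    document_id: int
    doc_type: str
    first_requested: datetime


def priority(req: Request, now: datetime) -> float:
    """Weight times hours waited: low-weight items still rise as they age."""
    hours_waiting = (now - req.first_requested).total_seconds() / 3600
    return TYPE_WEIGHTS.get(req.doc_type, DEFAULT_WEIGHT) * hours_waiting


def queue(requests: list[Request], now: datetime) -> list[int]:
    """Document ids ordered from highest to lowest priority."""
    ordered = sorted(requests, key=lambda r: priority(r, now), reverse=True)
    return [r.document_id for r in ordered]
```

A linear age term is the simplest choice; a capped or logarithmic one would keep very old requests from swamping newer, more substantive ones.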

mlissner commented 3 months ago

Yeah, you can do lots of fun stuff with the weighting (including letting people choose different sorting or filtering params), but the above sounds about right for an MVP.

A couple of suggestions, though:

We get a few thousand new docs per day, but there are about 100,000 that are filed in PACER each day, more or less.

Thanks for putting energy into this!

v-anne commented 3 months ago

Your suggestions are fine. I'm open to doing a beta test (e.g., for Free Law Project members) once the code is workable.

v-anne commented 3 months ago

@mlissner, I'm working on an outline at the moment and hope to have something for you soon. Is it feasible to change the schema you outlined above, or is it mostly set in stone?

mlissner commented 3 months ago

The DB schema? Yeah, we can change that. What are you thinking?

v-anne commented 2 months ago

I'll hold off on proposing schema changes for now. I'm still trying to navigate the codebase.

My machine has issues running cl locally so I unfortunately can't do much testing without making a PR.

Right now, I think a barebones MVP would have the following:

  1. allow signed-in users to make requests per their daily quota directly on individual dockets
  2. have a page that compiles requested documents and ranks them as desired, and removes requests after they are fulfilled

Filters and email notifications can come after. ACMS cases would have to be excluded for now as well.

I think this can be broken down further into some subtasks.

  1. Modify de_list.html to have the desired symbol for a document request in each row (perhaps the emoji as described above)
  2. Have a python function that takes requests from (1) and adds them to the Prayer model.
  3. Create a pending requests page.
  4. On the pending requests page, call another python function to aggregate the requests that are pending and then display them
  5. Create a function that updates the Prayer model each time a document is purchased and uploaded to RECAP.
v-anne commented 2 months ago

Here is potential code for the function described in step (2) above. One potential addition is checking that the document is not already in the database, but that might not be necessary if the prayer hands are only displayed next to documents that haven't been uploaded yet.


from datetime import timedelta
from typing import Optional
from django.utils import timezone
from django.contrib.auth.models import User
from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer

def new_prayer(user: User, recap_document: RECAPDocument) -> Optional[Prayer]:

    now = timezone.now()
    last_24_hours = now - timedelta(hours=24)

    # Count the number of prayers made by this user in the last 24 hours
    prayer_count = Prayer.objects.filter(user=user, date_created__gte=last_24_hours).count()

    if prayer_count < 5:
        new_prayer = Prayer.objects.create(
            user=user,
            recap_document=recap_document,
            status=Prayer.WAITING
        )
        print(f"New prayer created for {recap_document}.")
        return new_prayer
    else:
        print("User has already exceeded their quota in the last 24 hours. No new prayer created.")
        return None
v-anne commented 2 months ago

And here's code for (4). I went a bit overboard by using geometric mean as the ranking heuristic. Open to simplifying it.


from django.db.models import Count, Avg, ExpressionWrapper, F, FloatField
from django.utils import timezone
from django.db.models.functions import Now, Sqrt
from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer

# Calculate the age of each prayer
prayer_age = ExpressionWrapper(Now() - F('prayers__date_created'), output_field=FloatField())

# Annotate each RECAPDocument with the number of prayers and the average prayer age
documents = RECAPDocument.objects.annotate(
    prayer_count=Count('prayers'),
    avg_prayer_age=Avg(prayer_age)
).annotate(
    # Calculate the geometric mean (sqrt(prayer_count * avg_prayer_age))
    geometric_mean=Sqrt(F('prayer_count') * F('avg_prayer_age'))
).order_by('-geometric_mean')[:50]
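For checking the ranking logic outside Django, the same heuristic (square root of prayer count times average prayer age) can be restated in plain Python. This is not a replacement for the ORM query; it assumes ages are measured in hours, only outstanding prayers are passed in, and the input format is hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def rank_by_geometric_mean(
    prayers: list[tuple[int, datetime]], now: datetime, limit: int = 50
) -> list[tuple[int, float]]:
    """Rank documents by sqrt(prayer_count * average_prayer_age_in_hours).

    `prayers` holds one (document_id, date_created) pair per outstanding prayer.
    """
    ages: dict[int, list[float]] = defaultdict(list)
    for doc_id, created in prayers:
        ages[doc_id].append((now - created).total_seconds() / 3600)

    scores = {}
    for doc_id, doc_ages in ages.items():
        count = len(doc_ages)
        avg_age = sum(doc_ages) / count
        scores[doc_id] = math.sqrt(count * avg_age)

    # Highest score first; ties broken by document id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

One possible simplification: count × average age is just the sum of the prayer ages, and the square root doesn't change the order, so this ranks documents exactly as ordering by total waiting time would.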
mlissner commented 2 months ago

Nice stuff, and a great direction. Definitely inspiring some thoughts on this project:

  1. How do users find out that they're out of prayers? One solution: all the emojis go gray with a mouseover message. Another: when they click, they get an error message nearby. Seeking the easiest thing here...

  2. Your planned approach sounds great.

  3. Comments on your first code sample:

    • We don't use print statements in prod code. We can log it if you want, but it's probably not worth doing, so I'd suggest removing the print statements entirely.
    • In this line: if prayer_count < 5, I think I'd use a setting that can be overridden via an environment variable. ALLOWED_PRAYER_COUNT perhaps?
  4. Your code in the second one does indeed look a bit overboard, but whatever, it's great. I was thinking we'd want the page paginated, but I actually think it'd be fine just to say that these are the 50 most wanted docs and leave it at that. Who wants to go see what number 51 is anyway, right?

This all looks good to me! If you want help with tests, let us know and we can figure out how to get your system working properly. Looking great!

I'd love some mockups in the next round, if you think it's not too soon for that?

v-anne commented 2 months ago

Modifying the first method to account for your feedback and also for the comment I added below:


from datetime import timedelta
from typing import Optional
from django.conf import settings
from django.utils import timezone
from django.contrib.auth.models import User
from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer

def prayer_eligible(user: User) -> bool:

    ALLOWED_PRAYER_COUNT = getattr(settings, 'ALLOWED_PRAYER_COUNT', 5)

    now = timezone.now()
    last_24_hours = now - timedelta(hours=24)

    # Count the number of prayers made by this user in the last 24 hours
    prayer_count = Prayer.objects.filter(user=user, date_created__gte=last_24_hours).count()

    return prayer_count < ALLOWED_PRAYER_COUNT

def new_prayer(user: User, recap_document: RECAPDocument) -> Optional[Prayer]:

    if prayer_eligible(user) and not recap_document.is_available:
        new_prayer = Prayer.objects.create(
            user=user,
            recap_document=recap_document,
            status=Prayer.WAITING
        )
        return new_prayer

    return None

Not too familiar with how Django deals with environment variables, so I had to look it up. Please correct me if what I have is wrong.

v-anne commented 2 months ago

How do users find out that they're out of prayers? One solution: all the emojis go gray with a mouseover message. Another: when they click, they get an error message nearby. Seeking the easiest thing here...

This is a good point. I think your first suggestion might be the way to go. I broke up my first code block into two methods, prayer_eligible and new_prayer. This allows for the first method to be the condition that grays out the emojis if a user has exhausted their quota, and the second method is solely about creating a valid prayer.

This all looks good to me! If you want help with tests, let us know and we can figure out how to get your system working properly. Looking great!

I think this should be ready to test in the coming days. I did have a question for you about where in the codebase to stash the requirement that bought documents be updated as fulfilled. That seems like one line that could be dropped into an existing function that covers document addition to the repository more broadly. Also, thinking about this, I think I'm missing a condition in new_prayer to check that the document isn't already in the repository, but I'm not 100% sure how to check for that.

mlissner commented 2 months ago

Yep, code looks about right.

Not too familiar with how Django deals with environment variables, so I had to look it up. Please correct me if what I have is wrong.

We use django-environ for this. You can see lots of examples in the settings directory.

I think I'm missing a condition in new_prayer to check that the document isn't already in the repository

That's RECAPDocument.is_available.

where in the codebase to stash the requirement that bought documents be updated as fulfilled

Yeah. We do have a lot of ingestion code. @albertisfu can you suggest where we'd hook into our ingestion pipeline to mark a prayer as granted when a new document is added to the system?

v-anne commented 2 months ago

Wherever Alberto wants it to go, I think the line should probably be this:


from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer

Prayer.objects.filter(recap_document=recap_document, status=Prayer.WAITING).update(status=Prayer.GRANTED)
albertisfu commented 2 months ago

Sure. It seems we don't have a centralized ingestion method where we could hook this code. We receive PDFs from multiple sources, such as RECAP uploads, the Fetch API, and the Free Documents Scraper. If we want to consider all possible PDF sources for this feature, I think the best place to add it is within the RECAPDocument post-save signal that we currently use to process Citations once the RECAPDocument PDF is extracted.

https://github.com/freelawproject/courtlistener/blob/f70b55abd380a67942b0c8394d4faa5a81212ef8/cl/search/signals.py#L550

In this case, we can check some additional conditions to ensure that the Prayer is set to "granted" correctly. For instance, we can check if the is_available field has changed and is now set to True.

We can do that using the fields tracker: https://django-model-utils.readthedocs.io/en/latest/utilities.html#tracking-specific-fields

Currently, we're tracking is_available in RECAPDocuments for Elasticsearch indexing, so this field is already available in the tracked fields.
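The rule described here (grant waiting prayers only when `is_available` has changed and is now true) can be sketched without Django. `TrackedDocument` below is a hypothetical stand-in for a RECAPDocument with a django-model-utils field tracker; it only mimics `has_changed()` for that one field and is not the library's implementation. `on_document_saved` is likewise a hypothetical name for the post-save hook.

```python
from dataclasses import dataclass, field

WAITING, GRANTED = 1, 2


@dataclass
class TrackedDocument:
    """Hypothetical stand-in for a RECAPDocument with a field tracker."""
    pk: int
    is_available: bool = False
    _saved_is_available: bool = field(default=False, repr=False)

    def has_changed(self, name: str) -> bool:
        # Only is_available is tracked in this sketch.
        if name != "is_available":
            raise ValueError(f"field {name!r} is not tracked")
        return self.is_available != self._saved_is_available

    def mark_saved(self) -> None:
        self._saved_is_available = self.is_available


@dataclass
class PrayerRow:
    document_pk: int
    status: int = WAITING


def on_document_saved(doc: TrackedDocument, prayers: list[PrayerRow]) -> int:
    """Post-save hook: grant waiting prayers when the PDF just became available.

    Returns the number of prayers granted.
    """
    granted = 0
    if doc.has_changed("is_available") and doc.is_available:
        for prayer in prayers:
            if prayer.document_pk == doc.pk and prayer.status == WAITING:
                prayer.status = GRANTED
                granted += 1
    doc.mark_saved()
    return granted
```

Saving an unchanged document grants nothing, and saving it again after it has become available grants nothing further, which is the double-processing the tracker check is meant to avoid.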

v-anne commented 2 months ago

@albertisfu, how about this in the function you specified:


from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer

if "is_available" in instance.es_rd_field_tracker.changed():
  Prayer.objects.filter(recap_document=instance, status=Prayer.WAITING).update(status=Prayer.GRANTED)
albertisfu commented 2 months ago

Yeah, I'd just recommend a few changes. In this case we can check for the single field that changed, is_available, and also confirm that its value is now True:

if instance.es_rd_field_tracker.has_changed("is_available") and instance.is_available:
  Prayer.objects.filter(recap_document=instance, status=Prayer.WAITING).update(status=Prayer.GRANTED)
v-anne commented 2 months ago

Thanks, I'll incorporate your changes, @albertisfu.

@mlissner, where in the codebase would you suggest putting tests? I think the backend logic is pretty much done, now I need to work on the front end.

mlissner commented 2 months ago

If you're game for it, let's land the backend functionality in one PR and the front end in a second. Why not? It should help to have smaller PRs we can review a little at a time.

Tests should go in cl/favorites/tests.py.

Thank you!

mlissner commented 2 months ago

I think the next step is the email, so I made an issue for it: https://github.com/freelawproject/courtlistener/issues/4486

mlissner commented 1 month ago

The front end of this is over in: https://github.com/freelawproject/courtlistener/issues/4507