Open mlissner opened 4 years ago
I think putting RECAP in the name of the project is probably wise for branding; changing title accordingly.
The alerts on this will be a pleasure: "Your prayer has been granted: Doc XYZ is now available."
This data model should land later today:
from django.contrib.auth.models import User
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class Prayer(models.Model):
    WAITING = 1
    GRANTED = 2
    STATUSES = (
        (GRANTED, "Prayer has been granted."),
        (WAITING, "Still waiting for the document."),
    )
    date_created = models.DateTimeField(
        help_text="The time when this item was created",
        auto_now_add=True,
        db_index=True,
    )
    user = models.ForeignKey(
        User,
        help_text="The user that made the prayer",
        related_name="prayers",
        on_delete=models.CASCADE,
    )
    status = models.SmallIntegerField(
        help_text="Whether the prayer has been granted or is still waiting.",
        choices=STATUSES,
        default=WAITING,
    )

    # Generic b/c we may want to pray for items in other jurisdictions someday
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey()

    class Meta:
        # Indexes must name concrete columns, so they use content_type rather
        # than the content_object GenericForeignKey.
        index_together = (
            # When adding a new document, we'll ask: What outstanding prayers
            # do we have for item Foo of type XYZ?
            # When loading the prayer leader board, we'll ask: Which items of
            # type XYZ have the most outstanding prayers?
            ("content_type", "object_id", "status"),
            # When loading docket pages, we'll ask (hundreds of times): Did
            # user ABC vote on item Foo of type XYZ?
            ("content_type", "object_id", "user"),
            # When a user votes, we'll ask: How many outstanding prayers did
            # user ABC make today?
            ("date_created", "user", "status"),
        )
We can tweak it later, but I'm hopeful I can get it right the first time. Comments welcome.
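To make the index comments in the model concrete, here is a rough in-memory sketch of the two lookups the first index is meant to serve. It uses plain Python rather than the Django ORM, and `PrayerRow`, `outstanding_for`, and `most_wanted` are hypothetical names for illustration only, not anything in CourtListener.

```python
from collections import Counter
from typing import NamedTuple

WAITING, GRANTED = 1, 2  # mirrors Prayer.WAITING / Prayer.GRANTED


class PrayerRow(NamedTuple):
    """Hypothetical stand-in for one row of the Prayer table."""
    user_id: int
    content_type: str
    object_id: int
    status: int


def outstanding_for(rows, content_type, object_id):
    """Which prayers are still waiting on this one item?"""
    return [
        r for r in rows
        if r.content_type == content_type
        and r.object_id == object_id
        and r.status == WAITING
    ]


def most_wanted(rows, content_type, limit=50):
    """Which items of this type have the most outstanding prayers?"""
    counts = Counter(
        r.object_id for r in rows
        if r.content_type == content_type and r.status == WAITING
    )
    return counts.most_common(limit)


rows = [
    PrayerRow(1, "recapdocument", 10, WAITING),
    PrayerRow(2, "recapdocument", 10, WAITING),
    PrayerRow(3, "recapdocument", 11, WAITING),
    PrayerRow(1, "recapdocument", 12, GRANTED),
]
```

Both lookups filter on content type, object ID, and status together, which is why those three columns share one index.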
The database part of this is now in place. Honestly, that's the hard part. The rest should be pretty easy, I think!
lol, love the idea, and the name, well done
What else?
It would be nice if the RECAP extension kept an estimate of the upcoming PACER bill. Sometimes I download too much and go over my budget, because PACER only provides accurate numbers after several days. If you advertise this project to individuals, many will join hoping to stay under $30 per quarter, and some of them will end up with unpleasant surprises.
I once had an idea for a service that monitored PACER usage by using your credentials. Still would be a fun one to build. I think doing it via the extension would be hard, but it's not a bad idea. Feel free to open an issue over in freelawproject/recap, if you like. :)
The end of the quarter is nigh, and it occurred to me that I once again would love a "wishlist" I could put my spare credit towards. Sounds like this is that feature.
It is...we need to build the darned thing!
Has anything come of this? I had the same idea recently and think it could get a lot of use.
It's partly done, but nobody is working on it, unfortunately.
@mlissner what remains to be done? Is it the interfacing with the Chrome extension?
I hadn't envisioned it being part of the extension, at least not at first. I think the missing piece is adding it to the front end. The models are in place, so we just need to use HTMX to add prayer hands and a leaderboard. It's probably better to do the leaderboard first and add prayer hands once it's working nicely. That way we can test it a bit before we add prayer hands to 380 million links on the site!
@mlissner I'm willing to tackle this. I do have a suggestion though. It might be beneficial to add some sort of weighting system that prioritizes substantive documents (such as motions and discovery evidence) over less valuable documents like notices of appearances. Does courtlistener currently have any way of identifying what type of document a filing might be?
The other suggestion I have is to cap how many "prayers" any user can have open at a given time. This might not be feasible if people don't have to create courtlistener accounts to make requests.
Awesome! This should be a fun one.
> Does courtlistener currently have any way of identifying what type of document a filing might be?
No, not yet, but it's the kind of thing we'd like to do fairly soon. It's on our AI/ML backlog (which is mostly in my head).
> The other suggestion I have is to cap how many "prayers" any user can have open at a given time.
I think capping the prayers is essential. Above I suggested 20/day. In hindsight, maybe three or five is better. If you do fewer, people will use them more judiciously. I suppose it should be configurable though, in any case. Probably a good thing to put in an env?
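A minimal sketch of what "put it in an env" could look like, using only the standard library. The variable name `ALLOWED_PRAYER_COUNT`, the default of 5, and the helper `allowed_prayer_count` are all assumptions for illustration; the project's real settings may be wired up differently.

```python
import os

DEFAULT_ALLOWED_PRAYER_COUNT = 5  # assumed default, per "three or five" above


def allowed_prayer_count(environ=os.environ) -> int:
    """Read the daily prayer cap from the environment, with a fallback default."""
    raw = environ.get("ALLOWED_PRAYER_COUNT")
    if raw is None:
        return DEFAULT_ALLOWED_PRAYER_COUNT
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError("ALLOWED_PRAYER_COUNT must not be negative")
    return value
```

Passing the environment in as an argument keeps the function easy to test without touching the real process environment.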
Oops, I had more to say, sorry. Let me know if you want guidance on getting started. It'd be helpful, I think, to sketch out your general approach before beginning. For example, maybe you could detail the views and URLs you expect to build and where you expect to build them, and we can provide guidance on that before you begin?
The right way to do this is probably by API, probably using django rest framework, so we'll need APIs for adding, removing, and listing prayers, I think, at a minimum? Want to put together some thoughts on this before you get into the details?
For an MVP, I was thinking of the following:
On this same page, also show a signed-in user the pending documents they've requested and allow them to cancel any outstanding request in order to make a new one.
One thing I'm not sure about is CourtListener's scale. I honestly don't know how many documents are uploaded daily or whether I'm thinking too small (maybe list the 50 oldest documents instead?). What interested me about the weighting algorithm I proposed earlier is that it would help guarantee that more useful documents are purchased earlier, while less useful documents would still bubble up to the top of the queue as the wait for them increases. Still, an MVP wouldn't need this weighting algorithm; that could come later.
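As a rough illustration of that weighting idea (not part of the MVP), one option is to multiply a per-type weight by the time a request has been waiting. Everything here is hypothetical: the document types, the weights, and the `priority` function are made up for the sketch, and CourtListener does not classify document types yet.

```python
# Hypothetical weights; nothing classifies document types today.
TYPE_WEIGHTS = {"motion": 3.0, "notice_of_appearance": 1.0}


def priority(doc_type: str, days_waiting: float) -> float:
    """Score a request: heavier types start higher, and every request grows with age."""
    weight = TYPE_WEIGHTS.get(doc_type, 1.0)  # unknown types get a neutral weight
    return weight * (1.0 + days_waiting)
```

With these numbers a fresh motion outranks a fresh notice of appearance, but a notice that has waited five days outranks a motion requested today.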
Yeah, you can do lots of fun stuff with the weighting (including letting people choose different sorting or filtering params), but the above sounds about right for an MVP.
A couple suggestions though:
We get a few thousand new docs per day, but there are about 100,000 that are filed in PACER each day, more or less.
Thanks for putting energy into this!
Your suggestions are fine. I'm open to doing a beta test (e.g., for Free Law Project members) once the code is workable.
@mlissner, I'm working on an outline at the moment and hope to have something for you soon. Is it feasible to change the schema you outlined above, or is it mostly set in stone?
The DB schema? Yeah, we can change that. What are you thinking?
I'll hold off on proposing schema changes for now. I'm still trying to navigate the codebase.
My machine has issues running cl locally so I unfortunately can't do much testing without making a PR.
Right now, I think a barebones MVP would have the following:
Filters and email notifications can come after. ACMS cases would have to be excluded for now as well.
I think this can be broken down further into some subtasks.
de_list.html
to have the desired symbol for a document request in each row (perhaps the emoji as described above)
Here is potential code for the function described in step (2) above. One potential addition is checking that the document is not already in the database, but that might not be necessary if the prayer hands are only displayed next to documents that haven't been uploaded yet.
from datetime import timedelta
from typing import Optional

from django.contrib.auth.models import User
from django.utils import timezone

from cl.favorites.models import Prayer
from cl.search.models import RECAPDocument


def new_prayer(user: User, recap_document: RECAPDocument) -> Optional[Prayer]:
    now = timezone.now()
    last_24_hours = now - timedelta(hours=24)

    # Count the number of prayers made by this user in the last 24 hours
    prayer_count = Prayer.objects.filter(
        user=user, date_created__gte=last_24_hours
    ).count()

    if prayer_count < 5:
        prayer = Prayer.objects.create(
            user=user,
            recap_document=recap_document,
            status=Prayer.WAITING,
        )
        print(f"New prayer created for {recap_document}.")
        return prayer

    print("User has already reached their quota in the last 24 hours. No new prayer created.")
    return None
And here's code for (4). I went a bit overboard by using geometric mean as the ranking heuristic. Open to simplifying it.
from django.db.models import (
    Avg, Count, DurationField, ExpressionWrapper, F, FloatField,
)
from django.db.models.functions import Extract, Now, Sqrt

from cl.search.models import RECAPDocument

# Age of each prayer as an interval (now minus creation time)
prayer_age = ExpressionWrapper(
    Now() - F("prayers__date_created"), output_field=DurationField()
)

# Annotate each RECAPDocument with the number of prayers and the average
# prayer age in seconds. Extracting "epoch" assumes PostgreSQL.
documents = (
    RECAPDocument.objects.annotate(
        prayer_count=Count("prayers"),
        avg_prayer_age=Avg(Extract(prayer_age, "epoch")),
    )
    .annotate(
        # Calculate the geometric mean (sqrt(prayer_count * avg_prayer_age))
        geometric_mean=Sqrt(
            ExpressionWrapper(
                F("prayer_count") * F("avg_prayer_age"),
                output_field=FloatField(),
            )
        )
    )
    .order_by("-geometric_mean")[:50]
)
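For anyone who wants to sanity-check the heuristic without a database, here is a plain-Python sketch of the same score, sqrt(prayer_count * average_age_in_seconds). The function `rank_documents` and the sample data are hypothetical and only illustrate the ranking; they are not how the query above runs.

```python
import math
from datetime import datetime, timedelta, timezone


def rank_documents(prayers, now, limit=50):
    """Rank document IDs by sqrt(prayer_count * average_age_in_seconds).

    `prayers` is an iterable of (document_id, created_at) pairs.
    """
    ages_by_doc = {}
    for doc_id, created_at in prayers:
        age = (now - created_at).total_seconds()
        ages_by_doc.setdefault(doc_id, []).append(age)

    scores = {
        doc_id: math.sqrt(len(ages) * (sum(ages) / len(ages)))
        for doc_id, ages in ages_by_doc.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:limit]


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
sample = [
    ("doc-a", now - timedelta(hours=1)),  # one recent prayer
    ("doc-b", now - timedelta(hours=4)),  # one older prayer
    ("doc-c", now - timedelta(hours=1)),  # five recent prayers
    ("doc-c", now - timedelta(hours=1)),
    ("doc-c", now - timedelta(hours=1)),
    ("doc-c", now - timedelta(hours=1)),
    ("doc-c", now - timedelta(hours=1)),
]
```

One thing worth noticing: the count times the average age is just the sum of the ages, so this score is equivalent to the square root of the total time people have spent waiting on a document.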
Nice stuff, and a great direction. Definitely inspiring some thoughts on this project:
How do users find out that they're out of prayers? One solution: All the emojis go gray with a mouseover message. Another: When they click, they get an error message nearby. Seeking the easiest thing here...
Your planned approach sounds great.
Comments on your first code sample:
Instead of the hardcoded 5 in if prayer_count < 5, I think I'd use a setting that can be overridden via an environment variable. ALLOWED_PRAYER_COUNT, perhaps?
Your code in the second one does indeed look a bit overboard, but whatever, it's great. I was thinking we'd want the page paginated, but I actually think it'd be fine just to say that these are the 50 most wanted docs and leave it at that. Who wants to go see what number 51 is anyway, right?
This all looks good to me! If you want help with tests, let us know and we can figure out how to get your system working properly. Looking great!
I'd love some mockups in the next round, if you think it's not too soon for that?
Modifying the first method to account for your feedback and also for the comment I added below:
from datetime import timedelta
from typing import Optional

from django.conf import settings
from django.contrib.auth.models import User
from django.utils import timezone

from cl.favorites.models import Prayer
from cl.search.models import RECAPDocument


def prayer_eligible(user: User) -> bool:
    allowed_prayer_count = getattr(settings, "ALLOWED_PRAYER_COUNT", 5)
    now = timezone.now()
    last_24_hours = now - timedelta(hours=24)

    # Count the number of prayers made by this user in the last 24 hours
    prayer_count = Prayer.objects.filter(
        user=user, date_created__gte=last_24_hours
    ).count()
    return prayer_count < allowed_prayer_count


def new_prayer(user: User, recap_document: RECAPDocument) -> Optional[Prayer]:
    # Check the instances, not the classes: this user's quota and this
    # document's availability are what matter.
    if prayer_eligible(user) and not recap_document.is_available:
        return Prayer.objects.create(
            user=user,
            recap_document=recap_document,
            status=Prayer.WAITING,
        )
    return None
Not too familiar with how Django deals with environment variables, so I had to look it up. Please correct me if what I have is wrong.
> How do users find out that they're out of prayers? One solution: All the emojis go gray with a mouseover message. Another: When they click, they get an error message nearby. Seeking the easiest thing here...
This is a good point. I think your first suggestion might be the way to go. I broke up my first code block into two methods, prayer_eligible and new_prayer. This allows the first method to be the condition that grays out the emojis if a user has exhausted their quota, while the second method is solely about creating a valid prayer.
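The eligibility check on its own is easy to reason about outside Django. Below is a minimal sketch of the rolling 24-hour window over a plain list of timestamps; `within_quota` is a hypothetical stand-in for the ORM-backed prayer_eligible, and the default of 5 is the same assumed cap.

```python
from datetime import datetime, timedelta, timezone


def within_quota(prayer_times, now, allowed=5):
    """Return True if fewer than `allowed` prayers fall in the 24 hours before `now`."""
    window_start = now - timedelta(hours=24)
    recent = sum(1 for created in prayer_times if created >= window_start)
    return recent < allowed


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
# Three prayers inside the window, two older than 24 hours
history = [now - timedelta(hours=h) for h in (1, 2, 3, 30, 40)]
```

Because the result is a simple boolean, the same check can drive both the grayed-out emojis in the template and the guard in new_prayer.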
> This all looks good to me! If you want help with tests, let us know and we can figure out how to get your system working properly. Looking great!
I think this should be ready to test in the coming days. I did have a question for you about where in the codebase to stash the requirement that bought documents be updated as fulfilled. That seems like one line that could be dropped into an existing function that covers document addition to the repository more broadly. Also, thinking about this, I think I'm missing a condition in new_prayer to check that the document isn't already in the repository, but I'm not 100% sure how to check for that.
Yep, code looks about right.
> Not too familiar with how Django deals with environment variables, so I had to look it up. Please correct me if what I have is wrong.
We use django-environ for this. You can see lots of examples in the settings directory.
> I think I'm missing a condition in new_prayer to check that the document isn't already in the repository
That's RECAPDocument.is_available.
> where in the codebase to stash the requirement that bought documents be updated as fulfilled
Yeah. We do have a lot of ingestion code. @albertisfu can you suggest where we'd hook into our ingestion pipeline to mark a prayer as granted when a new document is added to the system?
Wherever Alberto wants it to go, I think the line should probably be this:
from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer
Prayer.objects.filter(recap_document=recap_document, status=Prayer.WAITING).update(status=Prayer.GRANTED)
Sure. It seems we don't have a centralized ingestion method where we could hook this code. We receive PDFs from multiple sources, such as RECAP uploads, the Fetch API, and the Free Documents Scraper. If we want to consider all possible PDF sources for this feature, I think the best place to add it is within the RECAPDocument post-save signal that we currently use to process Citations once the RECAPDocument PDF is extracted.
In this case, we can check some additional conditions to ensure that the Prayer is set to "granted" correctly. For instance, we can check if the is_available field has changed and is now set to True.
We can do that using the fields tracker: https://django-model-utils.readthedocs.io/en/latest/utilities.html#tracking-specific-fields
Currently, we're tracking is_available in RECAPDocuments for Elasticsearch indexing, so this field is already available in the tracked fields.
@albertisfu, how about this in the function you specified:
from cl.search.models import RECAPDocument
from cl.favorites.models import Prayer

if "is_available" in instance.es_rd_field_tracker.changed():
    Prayer.objects.filter(recap_document=instance, status=Prayer.WAITING).update(status=Prayer.GRANTED)
Yeah, I'd just recommend a few changes. We can check for a single field that changed (in this case, is_available) and also confirm that its value is now True:
if instance.es_rd_field_tracker.has_changed("is_available") and instance.is_available:
    Prayer.objects.filter(recap_document=instance, status=Prayer.WAITING).update(status=Prayer.GRANTED)
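The point of checking both "changed" and "now True" is that prayers should only be granted on the transition from unavailable to available, not on every save. Here is a small in-memory sketch of that rule; `grant_prayers` and the dict-based prayers are hypothetical and stand in for the signal handler and the ORM update.

```python
WAITING, GRANTED = 1, 2  # mirrors Prayer.WAITING / Prayer.GRANTED


def grant_prayers(prayers, doc_id, was_available, is_available):
    """Flip WAITING prayers for `doc_id` to GRANTED on a False -> True change only.

    `prayers` is a list of dicts with "doc_id" and "status" keys. Returns the
    number of prayers updated.
    """
    if was_available or not is_available:
        return 0  # no change, or the document became unavailable
    updated = 0
    for prayer in prayers:
        if prayer["doc_id"] == doc_id and prayer["status"] == WAITING:
            prayer["status"] = GRANTED
            updated += 1
    return updated
```

Re-saving a document that was already available returns 0, so any follow-up work keyed off the update count (such as notification emails) would not fire twice.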
Thanks, I'll incorporate your changes, @albertisfu.
@mlissner, where in the codebase would you suggest putting tests? I think the backend logic is pretty much done, now I need to work on the front end.
If you're game for it, let's land the backend functionality in one PR and the front end in a second. Why not? It should help to have smaller PRs we can take a little at a time.
Tests should go in cl.favorites.tests.py
Thank you!
I think the next step is the email, so I made an issue for it: https://github.com/freelawproject/courtlistener/issues/4486
The front end of this is over in: https://github.com/freelawproject/courtlistener/issues/4507
The idea here is to accomplish a few interesting things:
Create a public list somewhere of documents people want downloaded from PACER
Allow people to say they want something bought on PACER (and email them when we get it)
Demonstrate how much people want documents
Doing this should be pretty easy:
Create a new page explaining this initiative. It'll be called the PACER Pray and Pay Project: "You pray for a document, they pay for it." I'm somewhat anti-religious, so I don't like religion entering non-religious places, but I think I can get behind this anyway. People "pray" for relief, right?
On every document, we add a little prayer hands emoji (🙏) next to the "Buy on PACER" button. When people click the button, they indicate that they want the doc, and it gets a vote. The number of people voting for a document can be displayed before/after somebody clicks for it.
Once somebody prays for a doc, they get signed up to get an email once it is in the system.
We have a page that shows the most wanted documents in real time. Maybe it's filterable by jurisdiction? It shows the number of votes for a document, the description of the doc, etc.
We could go figure out the price of something (by using the attachment page) whenever somebody votes for it.
People only get so many prayers per day, so they don't just vote like crazy people. 20/day as a starting point? As their prayers are granted, they get more. So if they pray for 20 items, and they're all downloaded, great, give them more.
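One way to read the "as their prayers are granted, they get more" rule is that only still-waiting prayers count against the daily limit. A minimal sketch under that assumption follows; `prayers_remaining` is a hypothetical helper, and the limit of 20 is the starting point suggested in the list above.

```python
WAITING, GRANTED = 1, 2  # mirrors Prayer.WAITING / Prayer.GRANTED


def prayers_remaining(todays_statuses, daily_limit=20):
    """Count only waiting prayers against the limit, so granted ones free up slots."""
    outstanding = sum(1 for status in todays_statuses if status == WAITING)
    return max(daily_limit - outstanding, 0)
```

So a user whose 20 prayers were all granted today would have all 20 slots back, while a user with 20 still waiting would have none.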
What else?