freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com
Other
550 stars 151 forks source link

feat(scrapers.admin): create materialized view and admin page #4662

Open grossir opened 2 weeks ago

grossir commented 2 weeks ago

Concept for https://github.com/freelawproject/courtlistener/issues/3950

For this to work, the materialized view must be created directly on the DB by running the "query" attribute

For my local database, it looks like this image

But, if if connected to the live DB, it would have these rows: image

mlissner commented 2 weeks ago

We haven't used MV's before, so I have questions:

  1. I think I asked before, but what do we get from this that a query doesn't get us? I.e., why bother materializing it?

  2. How do we run the periodic refresh?

grossir commented 2 weeks ago

To do this kind of queries we have 3 options

In the case of this query, I don't think a new index would help. I think the slowness comes from joining big tables. Some query optimization may help reduce it more. The raw version takes 10 minutes, prefiltering the docket table by courts that have scrapers takes it down to 3 minutes

Why bother materializing it?

How to refresh it I expect to check this table a couple times a week. So we could refresh this every 2 or 3 days. We could use a plain cron job, the command is simple REFRESH MATERIALIZED VIEW <name>

Or, we could implement the refresh logic as an async call when accessing the admin page, and send a django admin message (those banners when editing an object) to the user advising to refresh the page in a few minutes

mlissner commented 2 weeks ago

Hm, in the past we had a script that would send an email with this kind of information, and it ran on a cron job, so I guess this is basically the same thing, but instead of storing the results of the script in an email, this stores the results in the materialized view. Seems fine, I suppose. :)