freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Broken links to cases on court websites should be detected and removed #173

Open freelawbot opened 10 years ago

freelawbot commented 10 years ago

Over time, more and more of the links to cases on the court websites are going to break. When this happens, it can make CL look bad.

A proxy should be introduced that checks the case link and then either returns the case directly from the court (if the link works) or shows the user an error message followed by the document from our local backup. Either way, the user gets the PDF and always knows where it came from.
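As a rough illustration of the proxy decision, here is a minimal Python sketch. It is not CourtListener code: `ProxyResult`, `link_is_alive`, `resolve_document`, and the warning text are hypothetical names made up for this example, and it assumes a plain HEAD request is a good enough liveness check, which may not hold for every court site.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProxyResult:
    source: str              # "court" or "local_backup"
    location: str            # URL or local path to serve from
    warning: Optional[str]   # message to show the user, if any


def link_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL gets a 2xx response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and socket timeouts;
        # ValueError covers malformed URLs.
        return False


def resolve_document(
    court_url: str,
    local_path: str,
    check: Callable[[str], bool] = link_is_alive,
) -> ProxyResult:
    """Serve from the court if the link works, else from the local backup."""
    if check(court_url):
        return ProxyResult("court", court_url, None)
    return ProxyResult(
        "local_backup",
        local_path,
        "The court's copy is unavailable; showing our archived copy instead.",
    )
```

The `check` parameter is injectable so the decision logic can be exercised without touching the network. The failure branch is also where a tally would be recorded.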

Whenever a link fails, a tally is recorded in the DB, and once a link accumulates too many tallies it is automatically removed. The threshold should be based on a rate rather than a raw count: several tallies within a couple of minutes shouldn't count, but several tallies spread over a couple of weeks or months should.
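One simple way to get that behavior is to count the distinct days on which a link failed rather than the raw number of tallies, so a burst of failures during a brief outage collapses into a single data point. The sketch below assumes that approach. `should_remove_link`, the 90-day window, and the three-day threshold are all illustrative choices, not settled values.

```python
from datetime import datetime, timedelta
from typing import Iterable


def should_remove_link(
    failure_times: Iterable[datetime],
    now: datetime,
    window: timedelta = timedelta(days=90),
    min_failure_days: int = 3,
) -> bool:
    """Return True if failures occurred on enough distinct days in the window.

    Many failures within a few minutes fall on the same calendar day and
    therefore count once; failures spread over weeks count separately.
    """
    earliest = now - window
    failure_days = {
        t.date() for t in failure_times if earliest <= t <= now
    }
    return len(failure_days) >= min_failure_days
```

With these defaults, five failures inside two minutes leave the link in place, while failures on three separate days within the last 90 days flag it for removal.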


freelawbot commented 10 years ago

Looks like this might be the algorithm to use for detecting this:

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

It's called the "Lower bound of Wilson score confidence interval for a Bernoulli parameter" and it's used on Digg, Reddit, and Yelp.

May need some math help for this.
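For reference, the lower bound from that article is short to compute. The sketch below treats each link check as a Bernoulli trial with "failure" as the outcome of interest, which is an assumed mapping onto this issue, not something already decided. `wilson_lower_bound` is a hypothetical helper name, and `z = 1.96` corresponds to a 95% confidence level.

```python
import math


def wilson_lower_bound(failures: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the true failure rate.

    failures: number of checks on which the link was broken
    total:    number of checks performed
    z:        normal quantile for the confidence level (1.96 is about 95%)
    """
    if total == 0:
        return 0.0  # no evidence yet, so make no claim about the failure rate
    p_hat = failures / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, (centre - margin) / (1 + z2 / total))
```

The useful property here is that a single failed check out of one yields a low bound (about 0.21), while ten failures out of ten yield a much higher one (about 0.72), so a link would only be removed once there is enough evidence that it is really dead. The removal threshold on this bound would still have to be chosen.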


Original Comment By: Mike Lissner