freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

RECAP extension should send `iquery.pl` docket summary pages #251

Closed (johnhawkinson closed 1 year ago)

johnhawkinson commented 6 years ago

(which are free) And juriscraper should parse them.

And I argue that it should not have a special UPLOAD_TYPE for them, because there are zillions of pages that should probably be sent, and it's silly to make the extension figure out what they are (pre-parse them), even if it has a ruleset based on URLs. Just send them to the server, which is going to do the parsing anyhow.

Although it's probably wise to send the URL, possibly with some filtering.

johnhawkinson commented 6 years ago

These are pages that include case title, judge, and last update date. Generally with judge initials in district courts (mad):

<center>
 <b><font size="+1">1:18-cv-10225-MLW</font></b> Calderon Jimenez v. Cronen et al<br>
 Mark L. Wolf, presiding<br>
 <b>Date filed:</b> 02/05/2018<br>
 <b>Date of last filing:</b> 06/04/2018<br>
</center>

Sometimes (mab) lacking that in BK land, but with other goodies:

<center>
 <b><font size="+1">18-10943</font></b><b></b>Lynnel M. Cox                                     <br>

 <b>Case type:</b> bk
 <b>Chapter:</b> 13
 <b>Asset:</b> Yes
 <b>Vol: </b> v
 <b>Judge:</b> Joan N. Feeney <br>

 <b>Date filed:</b> 03/19/2018
 <b>Date of last filing:</b> 06/04/2018
 <br>
</center>

But sometimes with initials for BK too (e.g. nysb):


<center>
  <b><font size="+1">18-10943-smb</font></b>
  <b></b>Martha L. Osorio                                  <br>

  <b>Case type:</b> bk
  <b>Chapter:</b> 7
  <b>Asset:</b> No
  <b>Vol: </b> v
  <b>Judge:</b> Stuart M. Bernstein <br>

  <b>Date filed:</b> 04/06/2018
  <b>Date of last filing:</b> 05/23/2018
  <br>
</center>

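
For illustration, here is a minimal sketch of how the fields in the three snippets above could be pulled out on the server. It is not juriscraper's parser; the function name `parse_iquery_summary`, the output keys, and the regular expressions are assumptions that were only checked against the samples quoted in this thread, and real pages may vary by court.

```python
import re
from datetime import datetime

# "<b>Label:</b> value" pairs, e.g. "<b>Date filed:</b> 02/05/2018".
LABELED = re.compile(r"<b>\s*([^<:]+?)\s*:\s*</b>\s*([^<\n]+)")


def parse_iquery_summary(html):
    """Extract a few fields from an iquery.pl summary block (illustrative only)."""
    data = {}

    # The docket number is the text inside <font size="+1">...</font>.
    m = re.search(r'<font size="\+1">([^<]+)</font>', html)
    if m:
        data["docket_number"] = m.group(1).strip()
        # The case name follows the closing tags, up to the first <br>.
        name = re.match(r"(?:\s*</?b>)*\s*([^<]+?)\s*<br>", html[m.end():])
        if name:
            data["case_name"] = name.group(1)

    # Labeled fields: "Date filed" becomes "date_filed", and so on.
    for label, value in LABELED.findall(html):
        data[label.strip().lower().replace(" ", "_")] = value.strip()

    # District pages name the judge as "<name>, presiding" with no label.
    presiding = re.search(r"([^<>\n]+?),\s*presiding", html)
    if presiding and "judge" not in data:
        data["judge"] = presiding.group(1).strip()

    # Convert the MM/DD/YYYY dates into date objects.
    for key in ("date_filed", "date_of_last_filing"):
        if key in data:
            data[key] = datetime.strptime(data[key], "%m/%d/%Y").date()
    return data


DISTRICT_SAMPLE = """<center>
 <b><font size="+1">1:18-cv-10225-MLW</font></b> Calderon Jimenez v. Cronen et al<br>
 Mark L. Wolf, presiding<br>
 <b>Date filed:</b> 02/05/2018<br>
 <b>Date of last filing:</b> 06/04/2018<br>
</center>"""

if __name__ == "__main__":
    print(parse_iquery_summary(DISTRICT_SAMPLE))
```

The bankruptcy variants go through the same function: the labeled `Judge:` field takes priority, and the "presiding" fallback is only used when no such label is present.
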
mlissner commented 6 years ago

Thanks for this. Per discussion on Slack, filtering would be to cut out random nonces that PACER includes in URLs. I'm on the fence. OTOH, private info shouldn't be in GET params and they run the risk of those links being clicked. OTOH, who knows what abuse could be caused by sharing a PACER link?

johnhawkinson commented 6 years ago

Well, not necessarily only the apparent nonces (as in https://ecf.nysb.uscourts.gov/cgi-bin/iquery.pl?657620796569868-L_1_0-1). But also things like magic numbers: https://ecf.mad.uscourts.gov/doc1/09518715161?caseid=196119&de_seq_num=183&magic_num=47282941

(Now, the magic number is no longer valid after use, so it's not the worst thing in the world, but there's still no excuse to retain it. Although arguably we wouldn't be sending doc1 URLs anyhow).
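
A rough sketch of the kind of URL filtering discussed here, using only the Python standard library. The function name `scrub_pacer_url` and the `SENSITIVE_PARAMS` set are hypothetical, and treating a bare query string with no `=` as a nonce is an assumption based on the nysb example above, not a documented PACER rule.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical deny-list; only magic_num is named in this thread.
SENSITIVE_PARAMS = {"magic_num"}


def scrub_pacer_url(url):
    """Return the URL with apparent nonces and magic numbers removed."""
    parts = urlsplit(url)
    if "=" not in parts.query:
        # A bare token such as "657620796569868-L_1_0-1" is assumed to be
        # a nonce, so the whole query string is dropped.
        query = ""
    else:
        kept = [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in SENSITIVE_PARAMS
        ]
        query = urlencode(kept)
    # Fragments are dropped as well; the server does not need them.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(scrub_pacer_url(
        "https://ecf.nysb.uscourts.gov/cgi-bin/iquery.pl?657620796569868-L_1_0-1"
    ))
    print(scrub_pacer_url(
        "https://ecf.mad.uscourts.gov/doc1/09518715161"
        "?caseid=196119&de_seq_num=183&magic_num=47282941"
    ))
```

A deny-list keeps unknown parameters, which fits the goal of sending more pages; an allow-list would be the stricter choice if leaking an unrecognized token is the bigger worry.
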

> OTOH, private info shouldn't be in GET params

I'm not sure "shouldn't be" has anything to do with reality or what we care about.

mlissner commented 1 year ago

This seems worth doing as @ERosendo is working on new RECAP features. Onto the heap it goes, but only to do the iquery.pl pages, not the rest. The rest is worth doing, ideally, but we're not interested in that kind of overhaul.

@albertisfu, you'll have to do some backend work for this too, in two stages. First, just to accept these pages from the extension and store them. Later, to actually parse them. You could do it in one step if that's not too much trouble, but we want to get at least the first step working so @ERosendo can get the extension part of this done.

mlissner commented 1 year ago

Just to do a bit of timing planning on this, @albertisfu, this is the next issue for @ERosendo on this backlog and he'll need your help. He can jump ahead to the next thing if you're busy, but please find time this week to work on this together however is best for you guys.

albertisfu commented 1 year ago

Sure! I've already met with @ERosendo and we talked about this issue. We agree that the API request for iquery.pl pages might look like the ones for Dockets. But we might need to add a new upload_type in order to differentiate them.

Maybe it could be called: `IQUERY_PAGE = 12`

@mlissner does this new upload_type seem good to you?

So we will accept this new upload type and store it. Then in the next step, I will add support to Juriscraper to parse them and update Dockets using iquery page data.
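
As a sketch of what that could look like, assuming the request body mirrors a docket upload: only `IQUERY_PAGE = 12` comes from this thread. The `DOCKET` value, the `build_upload` helper, and the payload field names are assumptions for illustration and not the actual CourtListener API.

```python
from enum import IntEnum


class UploadType(IntEnum):
    DOCKET = 1        # assumed value, shown only for comparison
    IQUERY_PAGE = 12  # value proposed in this thread


def build_upload(court, pacer_case_id, html, upload_type):
    """Build a hypothetical upload payload; all field names are illustrative."""
    return {
        "court": court,
        "pacer_case_id": pacer_case_id,
        "upload_type": int(upload_type),
        "html": html,
    }


if __name__ == "__main__":
    # Placeholder court and case ID values.
    payload = build_upload("mad", "12345", "<center>...</center>",
                           UploadType.IQUERY_PAGE)
    print(payload["upload_type"])
```

With this shape, the first stage only has to accept and store payloads whose `upload_type` is 12; parsing can be added later without changing the request.
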

mlissner commented 1 year ago

Sounds great. There's already a parser and merging code in juriscraper and courtlistener too.

albertisfu commented 1 year ago

Perfect! I'll check them. Thanks!

johnhawkinson commented 1 year ago

> We agree that the API request for iquery.pl pages might look like the ones for Dockets. But we might need to add a new upload_type in order to differentiate them.

I have suggested in the past that we should move away from this. There are a lot of pages that the extension and the server don't parse, and requiring upload types for them gets in the way of incremental progress.

I think we should just upload all the pages the extension is going to upload and let the server sort them out, without requiring new upload types, going forward.
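
One way "let the server sort them out" could work is to classify each upload by the script name in its URL. This is a hypothetical sketch: the `classify_page` function, the labels, and the `KNOWN_SCRIPTS` table are assumptions, and nothing here reflects how CourtListener actually routes uploads.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: script name -> page kind.
KNOWN_SCRIPTS = {
    "iquery.pl": "iquery_summary",
    "DktRpt.pl": "docket_report",
}


def classify_page(url):
    """Guess the page kind from the URL path, ignoring the query string."""
    script = urlsplit(url).path.rsplit("/", 1)[-1]
    # Unrecognized pages are labeled "unknown" so they can still be stored
    # and parsed later, once a parser exists.
    return KNOWN_SCRIPTS.get(script, "unknown")


if __name__ == "__main__":
    print(classify_page(
        "https://ecf.nysb.uscourts.gov/cgi-bin/iquery.pl?657620796569868-L_1_0-1"
    ))
```

Supporting a new page kind then means adding a table entry and a parser on the server, with no change to the extension.
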

mlissner commented 1 year ago

Someday, maybe. I don't think it really affects progress much, though.

mlissner commented 1 year ago

Nice teamwork on this one, everybody. Now onwards to the main event.