freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

Re-evaluate our position on CMECF filer non-uploads and free looks? #232

Open johnhawkinson opened 6 years ago

johnhawkinson commented 6 years ago

This issue was sparked by a recent Twitter thread, quoted below.

Other thoughts?


Twitter thread: https://twitter.com/stringquintet/status/954176606565797888

Nicholas Marritz @RECAPtheLaw as a lawyer, you get one free download of any doc in a case in which you've filed an appearance. Why don't those get uploaded to RECAP when you access them? 9:20 PM - 18 Jan 2018

John Hawkinson This is a tricky issue. 2 reasons: 1: Attys may have access to sealed docs or FRCP 5.2(c)-protected docs, & if RECAP parsed "free looks" it could expose private data 2: There is no receipt page (clickthru confirm) generated for "free looks" so the implementation's different

Nicholas Marritz Both good reasons. Thanks.

John Hawkinson Maybe…not sure :) For #1: Is there an effective UI/UX that would allow this to work that would not be cumbersome & prone to habituated confirmation error? It's v. important for attys to be confident RECAP won't lead to inadvertent disclosure…

John Hawkinson In the old recap-firefox codebase, RECAP tried to disable itself if you were logged in w/ a CMECF filing account, not just for free looks. I think that code didn't actually work, and got dropped in the new codebase (so-called recap-chrome).

It may be worth more deliberate thought.

John Hawkinson OTOH, some courts have an interstitial "This document is restricted to counsel of record" page before allowing downloads. Maybe that's good enough? I thought I had a screenshot but can't find one & no longer have access to such cases (because I intervened & unrestricted them!).

Nicholas Marritz EDVA definitely has this interstitial screen for sealed cases; not sure about other courts.

johnhawkinson commented 6 years ago

Also, w/r/t free looks, the extension should certainly fix up the filenames. This kind of thing drives me up the wall:

-rw-r--r--@    1 jhawk  staff      240445 Jan 19 21:14 show_temp-2.pl
-rw-r--r--@    1 jhawk  staff       23556 Jan 19 21:14 show_temp-1.pl
-rw-r--r--@    1 jhawk  staff       22138 Jan 19 21:14 show_temp.pl

Not the least of which is FF tries to open them in Xcode for me...

johnhawkinson commented 6 years ago

I've made some progress on capturing free looks in the extension. For a variety of technical reasons it is trickier than it looks. It would be good to see some discussion here, though.

johnhawkinson commented 6 years ago

Incidentally:

None of these are showstoppers, but they don't really encourage this. On the other hand, it's not like the masses are clamoring for this support in this issue.

mlissner commented 6 years ago

(Just for posterity, memorializing a few other notes.)

Unless the document has attachments, there is also no way for the extension to get the document number.

In theory, it's possible to sniff this from the headers of the document itself. The only problem is that some documents are refiled in many cases, which could result in there being multiple headers on the PDF or in the wrong headers being on the PDF. Still, this could be a useful thing to pull.

It also means the client doesn't have a great answer for naming the file, since both IA style and Lawyer style include the document number in the filename. Hence filenames like mad-09508249183.pdf from the recap-ff extension, I guess.

The good news is that we don't actually create those file names when somebody uploads something to RECAP. It actually happens later, when we figure out how to add it to a docket. Not sure that's a useful distinction, but I thought it worth mentioning.

johnhawkinson commented 6 years ago

Note that my comment was about the extension (client). While it's practical for the server to do PDF inspection (at least in theory), it's much less so for the client (though maybe not crazy; see below).

In theory, it's possible to sniff this from the headers of the document itself. The only problem is that some documents are refiled in many cases, which could result in there being multiple headers on the PDF or in the wrong headers being on the PDF. Still, this could be a useful thing to pull.

Well, as long as there are headers (which are optional). Refiled in multiple cases, yes. In "many cases," I don't know. And I'm not sure multiple headers are a showstopper; you merely need to identify the most recent set. (Which seems to come second in my limited sample set.)
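For illustration only, here is one way "identify the most recent set" might look in Python once the header fields have already been extracted. The function name `most_recent_header` and the dict layout with a `filed` key in MM/DD/YY form are assumptions made for this sketch, not anything in the RECAP codebase. It picks the latest filing date and breaks ties by position in the list, on the unverified assumption that later stamps appear later.

```python
from datetime import datetime


def most_recent_header(headers):
    """Return the header dict with the latest 'filed' date, or None if empty.

    `headers` is a hypothetical list of dicts such as
    {"case": "1:17-cv-10938-IT", "document": "85", "filed": "03/19/18"}.
    Ties on the date go to the header that appears later in the list.
    """
    if not headers:
        return None

    def sort_key(indexed):
        position, header = indexed
        # ECF stamps use a two-digit year, e.g. "03/19/18".
        filed = datetime.strptime(header["filed"], "%m/%d/%y")
        return (filed, position)

    return max(enumerate(headers), key=sort_key)[1]
```

A document refiled across a century boundary would confuse the two-digit year, so a real implementation would likely want an additional sanity check.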

The good news is that we don't actually create those file names when somebody uploads something to RECAP.

Yes, just to be clear, my concern was the naming of the file by the client. That is, the PDF file that the RECAP user ends up with that they just ~paid for~ got a free look for. Of course, mad-09508249183.pdf is a lot better than show_temp-2.pl.
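As a rough sketch of the fallback naming discussed above (court abbreviation plus document ID when no document number is available), something like the following Python would produce names in the mad-09508249183.pdf style. The helper name `fallback_filename` and the assumed URL shape (`https://ecf.<court>.uscourts.gov/doc1/<digits>`) are illustrative assumptions; the extension itself is not written in Python and may derive these values differently.

```python
import re
from urllib.parse import urlparse


def fallback_filename(doc_url):
    """Build '<court>-<doc id>.pdf' from a CM/ECF document URL.

    Assumes (hypothetically) a URL of the form
    https://ecf.<court>.uscourts.gov/doc1/<digits>; returns None otherwise.
    """
    parsed = urlparse(doc_url)
    host_match = re.match(r"ecf\.([a-z0-9]+)\.uscourts\.gov$", parsed.hostname or "")
    id_match = re.search(r"/doc1/(\d+)", parsed.path)
    if not (host_match and id_match):
        return None
    return f"{host_match.group(1)}-{id_match.group(1)}.pdf"
```

Returning None for unrecognized URLs leaves the caller free to keep whatever name the browser chose rather than guess.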


In re header inspection in the client, it's probably not too hard to find objects like this:

<<
/Length 131
>>
stream
 Q
q
BT
0 0 0 rg
/Xi2 12 Tf
1 0 0 1 119.33 767.36 Tm
(Case 1:17-cv-10938-IT   Document 85-4   Filed 03/19/18   Page 2 of 6)Tj
ET
Q

endstream 
endobj 
31 0 obj 

Unfortunately they're Flate-encoded so it's more like:

<<
/Length 73 0 R
/Filter /FlateDecode
>>
stream
h<DE><DC>QKJ^EA^L<DC><F7>)<B2>^V<CC>$<E9>?<C8>[<F8><C1>^C<98>^S8<E0><80><A0><9B>Yx}<93>^W<9F><8F><C1>ESCH<93><AE>tS<A9>
<C9><F2><FC>°<ED><E9>^Ӣ<CA><C0><A0>o<89>        <E7>^@<B2>^S٨~<F7>R<B1>^WЏD<B0>Y<E8><EA><D7>W<BA>#<A2>|<D2>w{<U+0776><82>Eh<80>>^<BF>YBJ.:Y<90><87>
<EB>\)<84>$^WC<FA>%
a><F0>́<91><CB><EC>^G^C<BD><F9><C3><C8>G<C6>^?<AF>ESCXc|<E7>$<A6>WsG)<87><F1>ن%6܃n<C0><D2>pBi^M<A5>9yy<D8>ESC<AC>; <E7>.<D5>@<B8>v<87>Z&<EC><EB>gBSc_<BE><C9><CA>0<E5>WC<8F>բ^F<E6><U+E596><F7><E8>yK^E<A7>ˑ<B7>M?<D5><D1>ԓ<A6>o^A^F^@<86><98><84><CD>
endstream
endobj
31 0 obj

mlissner commented 6 years ago

DEFLATE seems to be workable: https://stackoverflow.com/questions/2233062/javascript-deflate-implementation

But worth it? I doubt it.
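To make the cost/benefit a bit more concrete, here is a minimal server-side sketch in Python of the header sniffing discussed above: inflate each stream with zlib, fall back to the raw bytes when a stream is not Flate-encoded, and regex for the "Case ... Document ... Filed ... Page N of M" stamp. The names `find_ecf_headers`, `STREAM_RE`, `HEADER_RE`, and `FIELDS` are made up for this example. It assumes the stamp appears as a single literal string in a content stream, as in the sample object quoted earlier; it does not handle text split across operators, other filters, or encrypted PDFs, and it scans for `stream`/`endstream` rather than honoring `/Length`.

```python
import re
import zlib

# Naive scan for stream bodies; a real parser would use the /Length entry.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# Header stamp as seen in the sample: case number, doc[-attachment], date, page.
HEADER_RE = re.compile(
    rb"Case (\S+)\s+Document (\d+)(?:-(\d+))?\s+"
    rb"Filed (\d{2}/\d{2}/\d{2})\s+Page (\d+) of (\d+)"
)
FIELDS = ("case", "document", "attachment", "filed", "page", "pages")


def find_ecf_headers(pdf_bytes):
    """Return a list of dicts, one per header stamp found in any stream."""
    headers = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate-encoded (or not zlib data at all)
        for hit in HEADER_RE.finditer(data):
            values = [
                g.decode("latin-1") if g is not None else None
                for g in hit.groups()
            ]
            headers.append(dict(zip(FIELDS, values)))
    return headers
```

The `attachment` field is None when the stamp has no "-N" suffix. Whether this is worth doing in the client is a separate question; the sketch only shows that the decoding step itself is small.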

johnhawkinson commented 6 years ago

See also https://github.com/freelawproject/courtlistener/issues/809 (Document uploads without document numbers fail).

sjschultze commented 6 years ago

Memorializing some discussion:

There are two possible categories of "protected" documents that we might need to worry about:

  1. FRCP 5.2(c) documents: @johnhawkinson has some preliminary extra-paranoid code to make it even less likely that these could be uploaded
  2. sealed documents: At least some districts (such as dcd and cand) allow parties to upload sealed documents. It is not clear if in any district anyone other than judges and some court staff can view these. See:

It would be nice to learn more about sealed documents. It also seems like it may be a good idea to turn @johnhawkinson's code into something production-ready, and of course to test it.

Absent evidence of a scenario in which "protected" documents would actually be uploaded, I don't favor re-introducing the ECF-login limitation.

(And I do favor trying to find a way to capture "free look" documents, although I don't know how high priority that is.)

mlissner commented 6 years ago

Absent evidence of a scenario in which "protected" documents would actually be uploaded, I don't favor re-introducing the ECF-login limitation.

This is referring to the cookie check, right?

(And I do favor trying to find a way to capture "free look" documents, although I don't know how high priority that is.)

This is #38. A few people have asked for it/mentioned it and @johnhawkinson has put some work into this too. I'd put it at medium priority based on user demand. FWIW, the things that are bugging people most right now are:

Probably in that order?

sjschultze commented 6 years ago

I don't know where Zip files and big files sit in the priority list relative to "free look" files, but appellate is definitely highest priority. I'm going to try to make some progress there. It is now much cheaper to test our progress on making appellate work given that we discovered that CAVC does not charge.