freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

Appellate file renaming not working in ca3? #43

Closed freelawbot closed 6 years ago

freelawbot commented 9 years ago

Issue by johnhawkinson Tuesday Aug 06, 2013 at 16:23 GMT Originally opened as https://github.com/freelawproject/recap-server/issues/38


In USA v. Auernheimer (weev) 13-1816 at ca3, i.e. http://ia601700.us.archive.org/17/items/gov.uscourts.ca3.13-1816/gov.uscourts.ca3.13-1816.docket.html, I just tried to download document 003011347514 0 ECF FILER: Response filed by Appellant Andrew Auernheimer to Motion to Accept Noncompliant filing, Motion stay request. Certificate of Service dated 08/05/2013. (HMF) from yesterday.

I ended up with ca3-Tra0sportRoom?servlet=ShowDoc&dls_id=003011347514&caseId=87236&dktType=dktPublic.pdf in my filesystem, and I don't know if anything was successfully uploaded to archive.org (looks like not), though the docket metadata made it (unsurprisingly). The debugging output did not look fatal:

    8/6/13 12:18:07.546 PM [0x0-0x17e77e6].org.mozilla.firefox: RECAP: Url: /cmecf/servlet/TransportRoom?servlet=ShowDoc&dls_id=003011347514&caseId=87236&dktType=dktPublic
    8/6/13 12:18:07.546 PM [0x0-0x17e77e6].org.mozilla.firefox: RECAP: Name: 003011347514.pdf
    8/6/13 12:18:07.546 PM [0x0-0x17e77e6].org.mozilla.firefox: RECAP: List of stuff: ...
    8/6/13 12:18:12.243 PM [0x0-0x17e77e6].org.mozilla.firefox: RECAP: Posting file: 003011347514.pdf
    8/6/13 12:18:21.506 PM [0x0-0x17e77e6].org.mozilla.firefox: RECAP: RECAP File Upload - PDF uploaded to the public archive.
    8/6/13 12:18:21.509 PM [0x0-0x17e77e6].org.mozilla.firefox: RECAP: [object Object]

freelawbot commented 9 years ago

Comment by johnhawkinson Thursday Sep 12, 2013 at 02:18 GMT


Same problem today in Floyd v. City of New York (stop and frisk case) in ca2, 13-3088, docket 44.

    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: Exception
    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: After getting META
    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: After name
    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: Url: /cmecf/servlet/TransportRoom?servlet=ShowDoc&dls_id=00202736667&caseId=20140&dktType=dktPublic
    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: Name: 00202736667.pdf
    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: List of stuff:
    9/11/13 9:55:24.676 PM [0x0-0x35035].org.mozilla.firefox: RECAP: {"mimetype":"application/pdf","court":"ca2","name":"00202736667.pdf","url":"/cmecf/servlet/TransportRoom?servlet=ShowDoc&dls_id=00202736667&caseId=20140&dktType=dktPublic"} ...
    9/11/13 9:55:26.276 PM [0x0-0x35035].org.mozilla.firefox: RECAP: Posting file: 00202736667.pdf

and the document doesn't show up on archive.org, though the docket.html did update. http://ia601004.us.archive.org/8/items/gov.uscourts.ca2.13-3088/

freelawbot commented 9 years ago

Comment by johnhawkinson Tuesday Oct 01, 2013 at 15:50 GMT


Oh, I think part of the problem relates to docket entries with attachments. gov.uscourts.ca2.13-3088.114.0.pdf worked just fine, but then none of the 3 subdocs of 115 were correctly renamed or uploaded.
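
For reference, the naming pattern visible in the working upload (gov.uscourts.ca2.13-3088.114.0.pdf) can be sketched as below. This is only an illustration: the function name is hypothetical, and reading the last two fields as the docket entry number and the attachment (subdoc) number is an assumption, not something confirmed from the RECAP source.

```python
def expected_pdf_name(court, casenum, docnum, subdocnum=0):
    """Build a name shaped like gov.uscourts.ca2.13-3088.114.0.pdf.

    Hypothetical helper: the field meanings (docket entry number,
    then attachment number) are assumed from the example filename.
    """
    return "gov.uscourts.{}.{}.{}.{}.pdf".format(
        court, casenum, docnum, subdocnum
    )


if __name__ == "__main__":
    # The main document of entry 114, then the three subdocs of entry 115.
    print(expected_pdf_name("ca2", "13-3088", 114))
    for subdoc in (1, 2, 3):
        print(expected_pdf_name("ca2", "13-3088", 115, subdoc))
```

Under that assumption, the three attachments of entry 115 would be expected to end in .115.1.pdf through .115.3.pdf.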

mlissner commented 9 years ago

Issues #47 and #44 are both duplicates of this one, so I'll be closing them shortly.

The problem as I currently understand it is:

  1. The extensions get None as a case number.
  2. The extensions get numbers with hyphens as a case number, like 12-3232.
  3. The uploads_bucketlock table expects an int for the case number even though the Django model allows it to be a varchar.
  4. Where we get None, the backend explodes and says:

    Truncated incorrect DOUBLE value: 'None'

    Where we get the case number, it says:

    Truncated incorrect DOUBLE value: '15-5075'

    In both cases, this crashes the IA uploader, and, fun fact, no joke, no item with a case number that's alphabetically after these values will get uploaded.

So, this is all rather bad.
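
The type mismatch in points 3 and 4 can be illustrated with a minimal sketch. It is not the server's code: `fits_int_column` is a hypothetical helper that only mimics, in plain Python, the question "would this case number survive an integer column?" for the kinds of values listed above.

```python
def fits_int_column(casenum):
    """Return True only if casenum can be stored as an integer.

    Hypothetical check: None and hyphenated appellate numbers fail,
    purely numeric PACER case ids pass.
    """
    try:
        int(casenum)
    except (TypeError, ValueError):
        # TypeError covers a real None; ValueError covers strings
        # such as "None" or "15-5075".
        return False
    return True


if __name__ == "__main__":
    for value in (None, "None", "15-5075", "12-3232", "267130"):
        print(repr(value), fits_int_column(value))
```

Only the purely numeric value passes, which is consistent with hyphenated appellate numbers and None being the values that break an int-typed column.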

The solutions here are:

I believe this is the biggest issue with RECAP right now.

cc: @johnhawkinson, @carlmalamud

mlissner commented 9 years ago

@johnhawkinson, you'll be pleased to learn that your file naming issue is resolved in freelawproject/recap-firefox@d50b6b3cb66ba1de7138ae23ec7898b79673f2cc

Remainder of this issue is still at large though. I shall press on.

mlissner commented 9 years ago

@harlanyu, @dkapadia, @sjschultze, I put a bunch more time into this today and I think I found the issue, but I have a question for you guys.

It appears that certain versions of PACER have changed the POST data that is sent when you request an appellate docket sheet.

On old versions (like CA6), you'd have a GET request like:

https://ecf.ca6.uscourts.gov/cmecf/servlet/TransportRoom?
     servlet=CaseSummary.jsp&
     caseNum=15-1019&
     incOrigDkt=Y&
     incDktEntries=Y

And we could easily say that the case number was the value of caseNum. Great.

In the new version (like CA9), the GET request is:

 https://ecf.ca9.uscourts.gov/n/beam/servlet/TransportRoom

Not too helpful. But we collect the following from the POST request:

     Content-Type: application/x-www-form-urlencoded
     Content-Length: 196
     servlet=CaseSummary.jsp&
       caseId=267130&
       fullDocketReport=Y&
       incOrigDkt=Y&
       incPrior=Y&
       incAssoc=Y&
       incPtyAty=Y&
       incCaption=long&
       incDktEntries=Y&
       dateFrom=&
       dateTo=&
       incPdfMulti=Y&
       actionType=Run+Docket+Report

There's a parameter in there for caseId, which we could start using for the case number, but I'm not sure if that's what we want to do since the value is clearly not the same as the docket number (which in this case is 15-80056).
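
To make the two cases concrete, here is a rough sketch of how the identifier could be pulled from either request. It uses only the standard library; `find_case_identifier` is a hypothetical name, and preferring `caseNum` from the query string before falling back to `caseId` from the form body is an assumption for illustration, not a decision made in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def find_case_identifier(url, post_body=""):
    """Return (parameter_name, value) for the first identifier found.

    Hypothetical helper: checks the GET query string for caseNum
    (old-style courts), then the urlencoded POST body for caseId.
    Returns None if neither is present.
    """
    query = parse_qs(urlsplit(url).query)
    if "caseNum" in query:
        return ("caseNum", query["caseNum"][0])

    form = parse_qs(post_body)
    if "caseId" in form:
        return ("caseId", form["caseId"][0])

    return None


if __name__ == "__main__":
    old_url = (
        "https://ecf.ca6.uscourts.gov/cmecf/servlet/TransportRoom?"
        "servlet=CaseSummary.jsp&caseNum=15-1019&incOrigDkt=Y&incDktEntries=Y"
    )
    new_url = "https://ecf.ca9.uscourts.gov/n/beam/servlet/TransportRoom"
    new_body = "servlet=CaseSummary.jsp&caseId=267130&incDktEntries=Y"

    print(find_case_identifier(old_url))
    print(find_case_identifier(new_url, new_body))
```

Returning the parameter name alongside the value keeps it visible whether the caller got a docket number or an internal PACER id, which is exactly the distinction in question.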

Do you guys have insight?

johnhawkinson commented 9 years ago

I'll just say, ca9 has moved to CM/ECF NextGen, so it's not surprising things are different there.

Also, I've found it MUCH BETTER to have urls like

http://archive.org/download/gov.uscourts.ca2.13-3088/

than to have

http://archive.org/download/gov.uscourts.nysd.320470

especially because of two bugs:

(1) The RECAP server search engine is broken and you can't rely on it to search for a docket number and get back the archive.org URL. Instead you have to go to CM/ECF and do a query and run a docket report and check the URL for the [R] icon links.

(2) Oftentimes a single case will return multiple case id numbers and that means the docket report on archive.org is broken into two parts, with no way to figure out which is which. For instance:

http://archive.org/download/gov.uscourts.mad.160895
http://archive.org/download/gov.uscourts.mad.160894

I think there is another open issue about this problem. But it really makes the RECAP docket...less than optimally useful. If those were instead

http://archive.org/download/gov.uscourts.mad.1:14-cr-10143

it would be a much better experience.

Really, using an internal identifier as user-facing is a data management mistake. It's super-hard to fix (flag day! compatibility!) with district court RECAP, but please let's not "fix" the appellate CM/ECF to have the same problem.

carlmalamud commented 9 years ago

I vote for the official docket number.

I also like hyphens if the court likes hyphens.

brianwc commented 9 years ago

Well, I agree with the last two comments that, ideally, we'd use real docket numbers in our archive.org URLs because it has always been a pain to find the PACER case id as @johnhawkinson explained. However, what @mlissner seems to be telling us is that the post data, by itself, is NOT giving us the docket number, but it is giving us the caseid#. So, it'd be a good bit easier to use data they are providing than to hunt-and-peck around trying to find the data we'd prefer.

CL also puts "internal" docid #s into its urls and I've always hated it. Just ask Mike if I favor "predictable URLs" and he can show you a couple hundred pages of emails about the topic. But the problem with docket numbers as they are used by our federal courts is that only if we use the form "1:14-cr-10143" are they actually unique (and I wouldn't be surprised to learn of collisions even when adding the first number and the cr/cv/bk designations.) So, if the courts cannot be relied upon to use unique identifiers, then we cannot adopt their almost-unique-identifiers where a unique identifier is required. I think if we can find a reliable way to retrieve the FULL docket numbers, with preceding colon-separated digit and with letter codes, then I'd be willing to try to use those until we learn for sure that they aren't unique. I really really hope they're unique, but the courts have never failed to let me down. Also, I don't know whether we can reliably retrieve them. That'll be @mlissner's call.
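
As a rough illustration of what "FULL docket number" means here, the sketch below splits a string like 1:14-cr-10143 into its parts and rejects shorter forms. The regular expression, the two-letter case-type pattern, and the field names are assumptions for illustration only; nothing in this thread confirms that every court follows this shape.

```python
import re

# Assumed shape: <office digits>:<2-digit year>-<2-letter type>-<sequence>
FULL_DOCKET_RE = re.compile(r"^(\d+):(\d{2})-([a-z]{2})-(\d+)$")


def parse_full_docket_number(text):
    """Split a full district docket number into named parts.

    Hypothetical parser: returns None for anything that lacks the
    leading colon-separated digit or the letter code, such as the
    appellate-style 13-3088.
    """
    match = FULL_DOCKET_RE.match(text.strip().lower())
    if match is None:
        return None
    office, year, case_type, sequence = match.groups()
    return {
        "office": office,
        "year": year,
        "case_type": case_type,
        "sequence": sequence,
    }


if __name__ == "__main__":
    print(parse_full_docket_number("1:14-cr-10143"))
    print(parse_full_docket_number("13-3088"))
```

A check like this would only tell us whether a retrieved value has the full form; whether full-form numbers are actually unique within a court is still the open question.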

mlissner commented 9 years ago

Not the breakthrough on this issue we're looking for, but I've converted the casenum field to a varchar so it conforms with the model.

mlissner commented 6 years ago

Closing this monster bug. Hopefully we'll get this resolved as a by-product of adding appellate court support in #83.