freelawproject / recap

This repository is for filing issues on any RECAP-related effort.
https://free.law/recap/

RECAP fails to load documents for Northern District of Georgia (content_delegate.js) #277

Closed stevenxdavis closed 4 years ago

stevenxdavis commented 5 years ago

This problem occurs with RECAP 1.2.11 in both Firefox 68.0.1 and Chrome 75.0.3770.142 for Windows 10. Checking the "Temporarily disable RECAP uploading even when logged in to PACER" box does not circumvent the problem; I have to completely disable RECAP to download the documents.

When I click "View Document" for a file from the Northern District of Georgia, RECAP prevents the document from showing up. It looks like GAND differs from other districts because it sends the user to a temporary page, which then runs a single line of JavaScript that redirects the browser to the actual document.

For instance, when I navigate to "https://ecf.gand.uscourts.gov/doc1/055011805420?caseid=240678" and click "View Document," it redirects to a temporary page (see brokenpage.html.txt) that contains this snippet of JavaScript:

window.location = "/cgi-bin/show_temp.pl?file=10155833-0--728.pdf&type=application/pdf";

If I manually navigate the browser to https://ecf.gand.uscourts.gov/cgi-bin/show_temp.pl?file=10155833-0--728.pdf&type=application/pdf , I will then be able to view the document.
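
The redirect step described above can be sketched in a few lines. This is only an illustration, not code from RECAP: extractRedirectUrl is a hypothetical helper, and the page markup is assumed from the snippet quoted in this report rather than taken from brokenpage.html.txt itself.

```javascript
// Hypothetical helper; not part of content_delegate.js.
// Pulls the target of a `window.location = "...";` assignment out of an
// interstitial page and resolves it against the URL that served the page.
function extractRedirectUrl(html, baseUrl) {
  // Accepts window.location = "..." and window.location.href = '...'.
  const match = /window\.location(?:\.href)?\s*=\s*(["'])(.*?)\1/.exec(html);
  if (!match) {
    return null; // no redirect assignment found in this page
  }
  // Relative paths such as /cgi-bin/... are resolved against baseUrl.
  return new URL(match[2], baseUrl).href;
}

// Assumed markup, modeled on the one-line snippet quoted above.
const samplePage =
  '<html><script>window.location = ' +
  '"/cgi-bin/show_temp.pl?file=10155833-0--728.pdf&type=application/pdf";' +
  '</script></html>';

console.log(
  extractRedirectUrl(
    samplePage,
    'https://ecf.gand.uscourts.gov/doc1/055011805420?caseid=240678'
  )
);
```

For the sample page this resolves to the same show_temp.pl URL that works when entered manually.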

As far as I can tell, the problem is in this part of content_delegate.js:

  httpRequest(form.action, data, function (type, ab, xhr) {
    console.info('RECAP: Successfully submitted RECAP "View" button form: ' +
         xhr.statusText);
    var blob = new Blob([new Uint8Array(ab)], {type: type});
    // If we got a PDF, we wrap it in a simple HTML page.  This lets us treat
    // both cases uniformly: either way we have an HTML page with an <iframe>
    // in it, which is handled by showPdfPage.
    if (type === 'application/pdf') {
      // canb and ca9 return PDFs and trigger this code path.
      var html = '<style>body { margin: 0; } iframe { border: none; }' +
                 '</style><iframe src="' + URL.createObjectURL(blob) +
                 '" width="100%" height="100%"></iframe>';
      this.showPdfPage(document.documentElement, html, previousPageHtml,
        document_number, attachment_number, docket_number);
    } else {
      // dcd (and presumably others) trigger this code path.
      var reader = new FileReader();
      reader.onload = function() {
          this.showPdfPage(
            document.documentElement, reader.result, previousPageHtml,
            document_number, attachment_number, docket_number);
      }.bind(this);
      reader.readAsText(blob);  // convert blob to HTML text
    }

The console displays 'RECAP: Successfully submitted RECAP "View" button form: OK', so the callback is at least executing part of the function.

Hopefully this provides enough information to replicate the problem. The last time I worked with JavaScript was when I was a college freshman in 2006, so I can't provide much help to fix this, but I can at least describe it as best I can.

johnhawkinson commented 5 years ago

Huh. This is similar to what happens with free looks; they also use the window.location JS mechanism.

Although I don't think there's any reason to think that's actually going on here, both because Steven didn't mention that, and I seem to see a similar problem following the https://ecf.gand.uscourts.gov/doc1/055011805420?caseid=240678 link.

stevenxdavis commented 5 years ago

> Huh. This is similar to what happens with free looks; they also use the window.location JS mechanism.
>
> Although I don't think there's any reason to think that's actually going on here, both because Steven didn't mention that, and I seem to see a similar problem following the https://ecf.gand.uscourts.gov/doc1/055011805420?caseid=240678 link.

Since I'm hopelessly inept at web development, you should definitely take my evaluation with a grain of salt. If you can at least replicate the problem, I've probably done as much as I can do. Thanks.

mlissner commented 5 years ago

Yeah, I'm seeing it too. Hopefully this won't become a trend, but it looks like we'll have to fix this in order to keep GAND working.

I did just give them a call to ask what's up with this. I was able to get through to the IT department, who forwarded me to their CM/ECF admin, but he wasn't in so I left a VM. Hilariously, the first guy wouldn't tell me the admin's name or his own (it's apparently a big secret), but it's on his VM. Anyway, the guy to talk to at GAND is Daryl at 404-215-1655 (the docketing phone number). Presumably he has a direct line too, but that's a secret.

knowtheory commented 5 years ago

Okay! I hadn't realized how invasive (regarding page HTML) the extension is!

The Problem

So the proximate problem is that the ContentDelegate, in showPdfPage, tries to find an iframe, and when it doesn't find one, it just sticks whatever it had back in place, and gives up.

The root cause is more complicated in detail, but the tl;dr boils down to the fact that the extension doesn't know what to do with an interstitial page.

So. The details:

The extension works by intercepting the View Document button (by monkey patching all forms! 😱), and when the View Document button is clicked, it instead makes an AJAX request for the form target.

Based on what comes back from that AJAX request (HTML w/ an iframed PDF, or a PDF), the extension pretends that it did the thing the user expects by clicking the View Document button, and displays the PDF in an iframe (this is what showPdfPage is supposed to do).

The problem, of course, is that if the AJAX request doesn't give back a PDF-ish object, the extension doesn't know what to do.
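
To make the three cases concrete, here is a rough sketch of how a response could be classified before anything is displayed. The function name and the returned labels are made up for illustration; none of this is the extension's actual API, and the regular expressions are a guess at what would be "good enough" for the pages described in this thread.

```javascript
// Hypothetical classifier; names and labels are illustrative only.
// `type` is the response content type, `text` is the body decoded as text.
function classifyViewResponse(type, text) {
  if (type === 'application/pdf') {
    return 'pdf'; // the body is the document itself
  }
  if (/<iframe[\s>]/i.test(text)) {
    return 'iframe-page'; // HTML page that embeds the PDF in an <iframe>
  }
  if (/window\.location(?:\.href)?\s*=/.test(text)) {
    return 'interstitial'; // GAND-style page that redirects via JavaScript
  }
  return 'unknown'; // nothing PDF-ish; the caller should not guess
}

console.log(classifyViewResponse('application/pdf', ''));
console.log(classifyViewResponse('text/html', '<iframe src="blob:x"></iframe>'));
console.log(classifyViewResponse(
  'text/html',
  '<script>window.location = "/cgi-bin/show_temp.pl?file=x.pdf";</script>'
));
```

The 'interstitial' branch is the case that currently falls through to the "stick the old HTML back and give up" behavior.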

Recommendations

The View Document handler needs to be smarter, and showPdfPage needs to be changed architecturally.

The most minimal change that can be made is turning the onDocumentViewSubmit handler into a more extensible scraper (in the event that further sites follow suit). Ideally it'd be able to ask for the document view page, and take care of whatever it takes to get to a PDF.
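
One possible shape for that, as a sketch under stated assumptions rather than a proposed patch: resolveDocument and fetchPage are hypothetical names, fetchPage stands in for whatever wraps the extension's httpRequest, and the hop limit of 3 is arbitrary.

```javascript
// Hypothetical sketch, not RECAP's code. `fetchPage(url)` is an assumed,
// injected function that resolves to { type, text } for the given URL.
const REDIRECT_RE = /window\.location(?:\.href)?\s*=\s*(["'])(.*?)\1/;

async function resolveDocument(startUrl, fetchPage, maxHops = 3) {
  let url = startUrl;
  for (let hop = 0; hop <= maxHops; hop++) {
    const page = await fetchPage(url);
    if (page.type === 'application/pdf') {
      return { kind: 'pdf', url }; // reached the document itself
    }
    if (/<iframe[\s>]/i.test(page.text)) {
      return { kind: 'iframe-page', url, html: page.text };
    }
    const match = REDIRECT_RE.exec(page.text);
    if (!match) {
      return { kind: 'unknown', url, html: page.text }; // let the caller decide
    }
    url = new URL(match[2], url).href; // follow the interstitial redirect
  }
  throw new Error('Gave up after ' + maxHops + ' interstitial redirects');
}

// Usage with a fake fetcher, so the sketch runs without touching PACER.
const fakePages = {
  'https://ecf.example.test/doc1/1': {
    type: 'text/html',
    text: '<script>window.location = "/cgi-bin/show_temp.pl?file=x.pdf";</script>',
  },
  'https://ecf.example.test/cgi-bin/show_temp.pl?file=x.pdf': {
    type: 'application/pdf',
    text: '',
  },
};
resolveDocument('https://ecf.example.test/doc1/1', async (u) => fakePages[u])
  .then((result) => console.log(result.kind, result.url));
```

Passing the fetcher in as a parameter keeps the redirect-following logic separate from the display code and lets it be exercised with canned pages.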

The PDF detection should probably be extracted up & out of the method simply named "showPdfPage" ;)

mlissner commented 5 years ago

I've read this twice to make sure we're entirely on the same page, and yes, this sounds like my understanding too. My gut is to do the minimal change needed to get this out quickly, but if you prefer to do the larger architectural change to feel better about this (or simply to get more familiar with the code), I welcome that too.

Great diagnosis.

lkj19zpdiDDe commented 4 years ago

I've created a very basic patch for this one.

PR is https://github.com/freelawproject/recap-chrome/pull/73

mlissner commented 4 years ago

Thank you all for the effort on this one. I merged it, upgraded all dependencies from @jraller's cleanup branch, and then did our integration testing routine on it. Everything passes.

Thank you all!