PDF viewing errors in IE (with byte range requests)

seanaery commented 8 years ago

In Internet Explorer (confirmed in IE10 & IE11), when clicking Download on a PDF item in the DDR and then scrolling the PDF before it has fully loaded in the browser, the document stops loading and displays an error.

There was an error processing a page. There was a problem reading this document (109).

Ctrl-clicking on "OK" displays: Object label badly formatted.

With the same actions, I have also received these errors:

There was an error processing a page. There was a problem reading this document (135).

There was an error processing a page. There was a problem reading this document (14).

dchandekstark commented 8 years ago

I believe the problem is most likely caused by projecthydra/hydra-head#335.

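A workaround that seems to be effective is to fix up the response headers in Apache so that, when the client is IE 10 or 11, the server no longer advertises support for byte range requests (Accept-Ranges is rewritten from "bytes" to "none"):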

BrowserMatch "MSIE 1[01]" ie_10_or_11
Header always edit Accept-Ranges bytes none env=ie_10_or_11
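
One thing that may be worth checking before relying on this: BrowserMatch tests a regular expression against the User-Agent header, and IE11 in its default mode is generally documented as sending a User-Agent without the "MSIE" token. The sketch below applies the same pattern to a few sample strings. The User-Agent values are assumptions based on commonly documented defaults, not taken from traffic in this issue, and `matches_workaround` is a hypothetical helper name.

```python
import re

# Same pattern as the BrowserMatch directive above (case-sensitive search).
PATTERN = re.compile(r"MSIE 1[01]")

# Assumed sample User-Agent strings; real clients may differ
# (e.g. IE11 in compatibility view reports an older MSIE version).
SAMPLE_USER_AGENTS = {
    "IE9": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "IE10": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)",
    "IE11": "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko",
}


def matches_workaround(user_agent):
    """Return True if the BrowserMatch pattern would match this User-Agent."""
    return PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    for name, ua in SAMPLE_USER_AGENTS.items():
        print(name, matches_workaround(ua))
```

If the IE11 clients in question do send a string like the sample above, an additional BrowserMatch on a token such as "Trident/7" might be needed; that is a guess to verify against the server's access logs.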

seanaery commented 8 years ago

Documenting some additional troubleshooting on the issue...

IE10 SUCCESSFUL IE PDF STREAM

If bypassing the repository, we can get a PDF in IE piecemeal (in byte ranges) from the filesystem successfully. There are two different kinds of byte range requests (and corresponding responses) at play. Most resemble this, a single byte range:

**REQUEST**
Request: GET /pdf/dcrst003604.pdf HTTP/1.1
Range: bytes=1766912-1767689

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 1766912-1767689/1767690
Content-Type: application/pdf

But occasionally (e.g., in 13 of 80 requests), a single request asks for multiple byte ranges, and the response returns them together as one multipart/byteranges body:

**REQUEST**
Request: GET /pdf/dcrst003604.pdf HTTP/1.1
Range: bytes=846336-846847, 846848-847359, 847360-847871, 847872-848383, 848384-848895, 848896-849407, 849408-849919, 849920-850431, 850432-850943, 850944-851455, 851456-851967, 851968-852479, 852480-852991, 852992-853503, 853504-854015, 854016-854527, 854528-855039, 855040-855551, 855552-856063, 856064-856575, 856576-857087, 857088-857599, 857600-858111, 858112-858623, 858624-859135, 859136-859647, 859648-860159, 860160-860671, 860672-861183, 861184-861695, 861696-862207, 862208-862719, 862720-863231, 863232-863743, 863744-864255, 864256-864767, 864768-865279, 865280-865791, 865792-866303, 866304-866815, 866816-867327, 867328-867839, 867840-868351

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Type: multipart/byteranges; boundary=5304cce3ce69a22
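
To make the two request shapes concrete, here is a minimal sketch of parsing a Range header value into a list of inclusive (start, end) pairs. `parse_byte_ranges` is a hypothetical helper written for illustration; it is not code from Apache or hydra-head, and it skips some validation a production parser would need.

```python
def parse_byte_ranges(header, size):
    """Parse a Range header value such as "bytes=0-99, 200-299".

    Returns a list of inclusive (start, end) pairs, clamped to a
    resource of `size` bytes. Unsatisfiable ranges are skipped.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or not spec.strip():
        raise ValueError("unsupported Range header: %r" % header)

    ranges = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        if not sep or (first == "" and last == ""):
            raise ValueError("malformed range spec: %r" % part)
        if first == "":
            # Suffix form "-N": the final N bytes of the resource.
            length = int(last)
            if length == 0:
                continue
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            # Open-ended form "N-": from N to the end of the resource.
            end = min(int(last), size - 1) if last else size - 1
        if start <= end and start < size:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Single range, as in most of the requests.
    print(parse_byte_ranges("bytes=1766912-1767689", 1767690))
    # Multiple ranges in one header, as in the occasional requests.
    print(parse_byte_ranges("bytes=846336-846847, 846848-847359", 1767690))
```

A header with more than one entry in the returned list is the case that calls for a multipart/byteranges response.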

IE10 BROKEN PDF STREAM VIA HYDRA

Loading the file through the repository produces the same two kinds of byte range requests (some with a single range, some with multiple). The responses to the multi-range requests are incorrect, so fetching the PDF piecemeal fails. Again, most exchanges resemble this one, with a single byte range:

**REQUEST**
Request: GET /download/duke:316943 HTTP/1.1
Range: bytes=2689536-2693631

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 2689536-2693631/2694205
Content-Type: application/pdf

The problem arises when multiple byte ranges are requested in the same HTTP request:

**REQUEST**
Request: GET /download/duke:316943 HTTP/1.1
**Range: bytes=2693632-2694204, 1579520-1589247**

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
**Content-Range: bytes 2693632-2694204/2694205**
**Content-Type: application/pdf**

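The response is incorrect: only the first range of bytes has been returned, as a single-range 206 with Content-Type: application/pdf, rather than a multipart/byteranges response covering both requested ranges.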
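
For comparison, here is a rough sketch of what a server could send back for each case: a plain 206 body with a Content-Range header for one range, and a multipart/byteranges body with one part per range otherwise. `build_range_response` and the boundary string are hypothetical and for illustration only; this is not how hydra-head or Apache implement it.

```python
def build_range_response(data, ranges, content_type="application/pdf",
                         boundary="EXAMPLE_BOUNDARY"):
    """Build (headers, body) for a 206 response to the given byte ranges.

    `ranges` is a list of inclusive (start, end) pairs into `data`.
    """
    total = len(data)

    if len(ranges) == 1:
        # Single range: the body is the raw bytes, described by Content-Range.
        start, end = ranges[0]
        headers = {
            "Content-Type": content_type,
            "Content-Range": "bytes %d-%d/%d" % (start, end, total),
        }
        return headers, data[start:end + 1]

    # Multiple ranges: each part carries its own Content-Type and
    # Content-Range, and parts are separated by the boundary string.
    body = b""
    for start, end in ranges:
        body += ("--%s\r\n" % boundary).encode("ascii")
        body += ("Content-Type: %s\r\n" % content_type).encode("ascii")
        body += ("Content-Range: bytes %d-%d/%d\r\n\r\n"
                 % (start, end, total)).encode("ascii")
        body += data[start:end + 1] + b"\r\n"
    body += ("--%s--\r\n" % boundary).encode("ascii")

    headers = {"Content-Type": "multipart/byteranges; boundary=%s" % boundary}
    return headers, body


if __name__ == "__main__":
    sample = bytes(range(256)) * 4  # 1024 bytes of stand-in data
    headers, body = build_range_response(sample, [(10, 19), (100, 109)])
    print(headers["Content-Type"])
    print(len(body))
```

In the broken exchange above, the server took the single-range branch even though two ranges were requested.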

The problem appears to be a combination of 1) how the hydra-head gem parses range requests, and 2) that IE’s native PDF reader uses multipart range requests to begin with. We haven’t observed the problem in other browsers’ PDF readers; they likely all issue only single byte range requests.
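
When going through captured traffic, a small check like the following can flag the mismatched exchanges. It is only a heuristic sketch: `response_matches_request` is a hypothetical helper, and a server may legitimately answer a multi-range request in other ways (for example by coalescing adjacent ranges or sending the full document), which this check would report as a mismatch.

```python
def response_matches_request(range_header, response_headers):
    """Heuristically check a 206 response against the Range request header.

    Returns True when a single-range request got a Content-Range header,
    or a multi-range request got a multipart/byteranges Content-Type.
    """
    spec = range_header.partition("=")[2]
    requested = [part for part in spec.split(",") if part.strip()]
    content_type = response_headers.get("Content-Type", "").lower()

    if len(requested) <= 1:
        return "Content-Range" in response_headers
    return content_type.startswith("multipart/byteranges")


if __name__ == "__main__":
    # Filesystem case from this issue: multi-range request, multipart response.
    print(response_matches_request(
        "bytes=846336-846847, 846848-847359",
        {"Content-Type": "multipart/byteranges; boundary=5304cce3ce69a22"},
    ))
    # Repository case from this issue: multi-range request, single-range response.
    print(response_matches_request(
        "bytes=2693632-2694204, 1579520-1589247",
        {"Content-Type": "application/pdf",
         "Content-Range": "bytes 2693632-2694204/2694205"},
    ))
```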

duke-libraries / ddr-public

PDF viewing errors in IE (with byte range requests) #348
