duke-libraries / ddr-public

Public interface for Duke Digital Repository
https://repository.duke.edu

PDF viewing errors in IE (with byte range requests) #348

Open seanaery opened 8 years ago

seanaery commented 8 years ago

In Internet Explorer (confirmed in IE10 & IE11), when clicking Download on a PDF item in the DDR and then scrolling the PDF before it has fully loaded in the browser, the document stops loading and displays an error.

[Screenshot of the error dialog, 2016-04-12 10:46 AM]
There was an error processing a page. There was a problem reading this document (109).

Ctrl-clicking on "OK" displays: Object label badly formatted.

With the same actions, I have also received these errors:

There was an error processing a page. There was a problem reading this document (135).
There was an error processing a page. There was a problem reading this document (14).

dchandekstark commented 8 years ago

I believe the problem is most likely caused by projecthydra/hydra-head#335.

A workaround that seems to be effective is to fix up the response headers in Apache when the client is IE 10 or 11, so that the server advertises that it does not accept range requests:

BrowserMatch "MSIE 1[01]" ie_10_or_11
Header always edit Accept-Ranges bytes none env=ie_10_or_11
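
As a quick sanity check on the BrowserMatch pattern above, the following Python sketch applies the same regular expression to a few sample User-Agent strings. The sample strings are assumptions based on the commonly documented default User-Agent formats for these browsers, not values captured from the DDR logs. IE 11 in its default mode is generally reported to omit the "MSIE" token, so it may be worth confirming against real access logs that the rule matches the IE 11 clients in question.

```python
import re

# Same pattern as the BrowserMatch directive (unanchored, case-sensitive search).
WORKAROUND_PATTERN = re.compile(r"MSIE 1[01]")

# Hypothetical sample User-Agent strings (assumed, not taken from server logs).
SAMPLE_USER_AGENTS = {
    "ie9": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "ie10": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)",
    "ie11_default": "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko",
}


def matches_workaround(user_agent: str) -> bool:
    """Return True if the User-Agent would set the ie_10_or_11 variable."""
    return WORKAROUND_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    for name, user_agent in SAMPLE_USER_AGENTS.items():
        print(name, matches_workaround(user_agent))
```

If the IE 11 sample above reflects the clients actually hitting the server, an additional match on the "Trident/7.0" token might be needed; that is a suggestion to verify, not something tested in this thread.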

seanaery commented 8 years ago

Documenting some additional troubleshooting on the issue...

IE10 SUCCESSFUL PDF STREAM

When we bypass the repository and serve the PDF directly from the filesystem, IE fetches it piecemeal (in byte ranges) successfully. There are two different kinds of byte range requests (and corresponding responses) at play. Most resemble this, a single byte range:

**REQUEST**
Request: GET /pdf/dcrst003604.pdf HTTP/1.1
Range: bytes=1766912-1767689

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 1766912-1767689/1767690
Content-Type: application/pdf

But occasionally (e.g., in 13 of 80 requests), the interaction resembles the following: instead of a single byte range being requested and returned, one request asks for multiple ranges and the response returns them together as a multipart body:

**REQUEST**
Request: GET /pdf/dcrst003604.pdf HTTP/1.1
Range: bytes=846336-846847, 846848-847359, 847360-847871, 847872-848383, 848384-848895, 848896-849407, 849408-849919, 849920-850431, 850432-850943, 850944-851455, 851456-851967, 851968-852479, 852480-852991, 852992-853503, 853504-854015, 854016-854527, 854528-855039, 855040-855551, 855552-856063, 856064-856575, 856576-857087, 857088-857599, 857600-858111, 858112-858623, 858624-859135, 859136-859647, 859648-860159, 860160-860671, 860672-861183, 861184-861695, 861696-862207, 862208-862719, 862720-863231, 863232-863743, 863744-864255, 864256-864767, 864768-865279, 865280-865791, 865792-866303, 866304-866815, 866816-867327, 867328-867839, 867840-868351

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Type: multipart/byteranges; boundary=5304cce3ce69a22
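
To make the two request shapes concrete, here is a minimal Python sketch of how a server might split a Range header value into individual byte ranges. The function name `parse_range_header` is hypothetical, and this is not the code used by Apache or hydra-head; it only illustrates that a comma-separated value describes several ranges, each of which needs its own part in the response.

```python
def parse_range_header(value: str, total_size: int) -> list[tuple[int, int]]:
    """Parse a value such as 'bytes=0-99, 200-299' into inclusive (start, end) pairs.

    Hypothetical helper for illustration; raises ValueError on input it cannot satisfy.
    """
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes" or not spec.strip():
        raise ValueError("unsupported or empty Range header")

    ranges = []
    for part in spec.split(","):
        first, dash, last = part.strip().partition("-")
        if not dash:
            raise ValueError(f"malformed range: {part!r}")
        if first == "":
            # Suffix form '-N': the last N bytes of the file.
            start = max(total_size - int(last), 0)
            end = total_size - 1
        else:
            start = int(first)
            # Open-ended form 'N-' runs to the end of the file.
            end = min(int(last), total_size - 1) if last else total_size - 1
        if start > end:
            raise ValueError(f"unsatisfiable range: {part!r}")
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Single range, as in most of the observed requests.
    print(parse_range_header("bytes=1766912-1767689", 1767690))
    # First two ranges of the multi-range request shown above.
    print(parse_range_header("bytes=846336-846847, 846848-847359", 1767690))
```

A response to the second kind of request is expected to carry one body part per parsed range, which is what the multipart/byteranges content type above signals.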

IE10 BROKEN PDF STREAM VIA HYDRA

When the file is loaded through the repository, IE issues the same two kinds of byte range requests (some with a single range, some with multiple). The responses to the multi-range requests are incorrect, so fetching the PDF piecemeal fails. Again, most exchanges resemble this, a single byte range:

**REQUEST**
Request: GET /download/duke:316943 HTTP/1.1
Range: bytes=2689536-2693631

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 2689536-2693631/2694205
Content-Type: application/pdf

But here’s where it’s problematic: when multiple byte ranges are requested in the same HTTP request:

**REQUEST**
Request: GET /download/duke:316943 HTTP/1.1
**Range: bytes=2693632-2694204, 1579520-1589247**

**RESPONSE**
Response: HTTP/1.0 206 Partial Content
Accept-Ranges: bytes
**Content-Range: bytes 2693632-2694204/2694205**
**Content-Type: application/pdf**

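The response is incorrect: only the first requested range (2693632-2694204) is returned, as a plain application/pdf body with a single Content-Range header, rather than as a multipart/byteranges response containing both ranges.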
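
For comparison, here is a minimal Python sketch of what a correct multi-range response body could look like, following the multipart/byteranges layout described in the HTTP range requests specification. The function name `build_multipart_byteranges` and the boundary value are hypothetical, and this is not hydra-head's implementation; it only shows that each requested range becomes its own part with its own Content-Range header.

```python
def build_multipart_byteranges(
    data: bytes,
    ranges: list[tuple[int, int]],
    content_type: str,
    boundary: str,
) -> tuple[dict[str, str], bytes]:
    """Build headers and body for a 206 response covering several inclusive ranges.

    Hypothetical helper for illustration only.
    """
    total = len(data)
    body = bytearray()
    for start, end in ranges:
        # Each part repeats the media type and states which bytes it carries.
        body += f"--{boundary}\r\n".encode("ascii")
        body += f"Content-Type: {content_type}\r\n".encode("ascii")
        body += f"Content-Range: bytes {start}-{end}/{total}\r\n\r\n".encode("ascii")
        body += data[start:end + 1]
        body += b"\r\n"
    # Closing delimiter marks the end of the multipart body.
    body += f"--{boundary}--\r\n".encode("ascii")

    headers = {
        "Content-Type": f"multipart/byteranges; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, bytes(body)


if __name__ == "__main__":
    sample = bytes(range(100))  # stand-in for the PDF bytes
    headers, body = build_multipart_byteranges(
        sample, [(90, 99), (10, 19)], "application/pdf", "EXAMPLE_BOUNDARY"
    )
    print(headers)
    print(body)
```

In the failing exchange above, a response built this way would contain two parts, one for 2693632-2694204 and one for 1579520-1589247, instead of the single range that was actually returned.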

The problem appears to be a combination of 1) how the hydra-head gem parses range requests and 2) the fact that IE’s native PDF reader issues multi-range requests in the first place. We haven’t observed the problem in other browsers’ PDF readers; they likely issue only single byte range requests.