drewzboto / grunt-connect-proxy

Grunt Connect support for proxying API calls during development
MIT License

Unable to download pdf files through the proxy #73

Closed: reda-alaoui closed this issue 9 years ago

reda-alaoui commented 10 years ago

Hi everyone,

I am unable to download PDF files through the proxy. The files are served by an ordinary servlet at /myapp/pdfservlet/ on my Tomcat server. Without the proxy, the PDF files download correctly.

Here is my proxy configuration:

{
    context: '/', // the context of the data service
    host: 'localhost', // wherever the data service is running
    port: 8080, // the port that the data service is running on
    ws: true, // proxy websockets
    rewrite: { '^/myapp/realtime/fallback': '/myapp/realtime' },
    excludedFileTypes: ['pdf', 'jar', 'gz']
}

Here are the request and response as shown by Chrome:

Remote Address:127.0.0.1:9000
Request URL:http://localhost:9000/myapp/pdfservlet
Request Method:GET
Status Code:200 OK

Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4,de;q=0.2,es;q=0.2
Connection:keep-alive
Host:localhost:9000
Referer:http://localhost:9000/myapp/jsp/foo.jsp
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36

Response Headers
connection:close
content-disposition:filename="foo.pdf"
content-language:fr-FR
content-type:application/pdf;charset=UTF-8
date:Wed, 10 Sep 2014 17:36:13 GMT
server:Apache-Coyote/1.1
transfer-encoding:chunked

It also happens with .gz files. The downloaded PDF is just blank, so it seems that the connection is closed before the end of the file.

Am I missing something?

mcchin commented 10 years ago

I am facing a similar issue, but instead of a connection problem, the PDF I download through the proxy is automatically converted to UTF-8. This is a problem for me because the contents of the PDF aren't UTF-8 encoded: the original PDF is 500KB, but after being served through the proxy it becomes 1MB.
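
One plausible way a binary file grows like this is if some layer decodes the body as UTF-8 text and then re-encodes it. The sketch below only illustrates that effect in plain Node.js; it is an assumption, not a claim about what grunt-connect-proxy or grunt-contrib-connect actually does internally. It also shows why such corruption could depend on the file's contents: bytes that happen to be pure ASCII survive the round trip unchanged.

```javascript
// Illustration only (assumption): bytes passed through a UTF-8 decode/encode
// round trip, as a layer doing chunk.toString() and back would do.

// Stand-in for binary PDF content: every byte value 0x00-0xFF, repeated.
const binary = Buffer.alloc(4096);
for (let i = 0; i < binary.length; i++) binary[i] = i % 256;

// Stand-in for content that happens to be pure ASCII.
const ascii = Buffer.from('%PDF-1.4\n1 0 obj << /Type /Catalog >> endobj\n');

function utf8RoundTrip(buf) {
  // Invalid UTF-8 sequences are replaced by U+FFFD (3 bytes once re-encoded).
  return Buffer.from(buf.toString('utf8'), 'utf8');
}

const binaryOut = utf8RoundTrip(binary);
const asciiOut = utf8RoundTrip(ascii);

console.log('binary:', binary.length, '->', binaryOut.length, 'bytes, unchanged:', binary.equals(binaryOut));
console.log('ascii:', ascii.length, '->', asciiOut.length, 'bytes, unchanged:', ascii.equals(asciiOut));
```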

Instead of a servlet on Tomcat, my file is served by PHP on Apache (running on Ubuntu precise32).

It's also worth mentioning that the file is retrieved via a PHP script like the one below:

<?php
$file = 'doc.pdf';

if (file_exists($file)) {
    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename='.basename($file));
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    header('Content-Length: ' . filesize($file));
    readfile($file);
    exit;
}
?>

The HTTP request looks like this:

GET /internal/api/download/13/6 HTTP/1.1
Host: localhost:9001
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4,zh-TW;q=0.2,fr;q=0.2,de;q=0.2
Cookie: xxx=data

The HTTP response I get back (through connect-proxy) looks like this:

HTTP/1.1 200 OK
date: Thu, 09 Oct 2014 05:22:35 GMT
server: Apache/2.2.22 (Ubuntu)
content-description: File Transfer
content-disposition: attachment; filename="doc.pdf"
expires: 0
cache-control: must-revalidate, post-check=0, pre-check=0
pragma: public
keep-alive: timeout=5, max=100
connection: Keep-Alive
content-type: application/pdf
Transfer-Encoding: chunked

miminno commented 10 years ago

+1

misja-alma commented 10 years ago

I'm facing the same problem. My PDF's character encoding is converted to UTF-8, which corrupts it so that it shows up as a blank page. It doesn't always happen; it seems to depend somehow on the contents of the PDF: sometimes the encoding is converted, sometimes not. I don't know what triggers it, but I'm certain the cause is grunt-connect-proxy, because when I download my PDF directly the encoding is never affected.

miminno commented 10 years ago

I think the problem is with grunt-contrib-connect, not with the proxy. I tried serving PDFs directly with connect and got back garbage. Can anyone else confirm?

shangxiao commented 10 years ago

I also get blank PDFs, corrupt PDFs and corrupt PNGs, yet MP3s and JPGs download fine :)

pocketmax commented 9 years ago

I've got the same problem...

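const http = require('http');
const httpProxy = require('http-proxy');

// One shared proxy instance, used here as a forward proxy.
const proxy = httpProxy.createProxyServer({});

// Without an error handler, a failed upstream request crashes the process.
proxy.on('error', function (err, req, res) {
    if (!res.headersSent) res.writeHead(502, { 'Content-Type': 'text/plain' });
    res.end('Proxy error: ' + err.message);
});

http.createServer(function (req, res) {
    // When a client (e.g. wget) uses this server as its HTTP proxy,
    // req.url is the absolute URL of the requested resource.
    proxy.web(req, res, { target: req.url });
}).listen(8005, function () {
    console.log('proxy listening on port 8005');
});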

Downloading directly works:

wget http://security.ubuntu.com/ubuntu/pool/main/p/php5/libapache2-mod-php5_5.3.10-1ubuntu3.15_amd64.deb

Downloading through the proxy does not:

wget -e http_proxy=127.0.0.1:8005 http://security.ubuntu.com/ubuntu/pool/main/p/php5/libapache2-mod-php5_5.3.10-1ubuntu3.15_amd64.deb

webskin commented 9 years ago

Same problem for me. Tomcat serves my PDF with 'Content-Length: xxx', but the proxy returns it with 'Transfer-Encoding: chunked'. Surprisingly, when I change the URL's extension to .png but leave the Content-Type as application/pdf, it works. So it may be a content negotiation problem.

jaymes-bearden commented 9 years ago

Exact same problem with me. Does anyone have a suitable workaround for the time being?

reda-alaoui commented 9 years ago

A workaround would be great, or at least knowing which layer of the library needs to be modified...

ghost commented 9 years ago

A workaround would be sweet!

michi88 commented 9 years ago

+1 same problem here

AssiaAzzouzi commented 9 years ago

I have the same problem. Did anyone find a solution?

juristr commented 9 years ago

Unfortunately not yet. I haven't had the time to look into it, but I'm very interested in a solution.

reda-alaoui commented 9 years ago

I just found the solution for my case! For the record, I was using grunt-connect-proxy#0.1.11. My project was generated by Yeoman, which automatically generated a package.json containing:

"devDependencies": {
    ...
    "grunt-connect-proxy": "0.1.11",
    "grunt-contrib-connect": "0.7.1",
    ...
}

In package.json, I changed the grunt-contrib-connect version to 0.5.0, which is the version pulled in by grunt-connect-proxy#0.1.11.

That solved the PDF and .gz corruption! Don't forget to disable your browser cache when testing PDF downloads.

k7n4n5t3w4rt commented 9 years ago

+1 for reda-alaoui's solution.

Changing the grunt-contrib-connect version to 0.5.0 in the main package.json file and running npm update fixed the problem.

sweco-seprst commented 9 years ago

+1 for reda-alaoui's solution.

Epimetheus89 commented 9 years ago

+1 for reda-alaoui's solution.

I had problems with .jpeg and .xls files, and that solution fixed them. Even upgrading to the newest version, 0.10.1, did not help.

Edorka commented 9 years ago

+1 for reda-alaoui's solution.

jrno commented 8 years ago

+1 for reda-alaoui's solution.

klopfdreh commented 8 years ago

+1 for the solution mentioned by @reda-alaoui