ariya / phantomjs

Scriptable Headless Browser
http://phantomjs.org
BSD 3-Clause "New" or "Revised" License

File download #10052

Closed ariya closed 4 years ago

ariya commented 13 years ago

alexsa...@gmail.com commented:

It would be good to accept (and save) 'Content-Disposition: attachment; filename=' content.
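For reference, a minimal sketch of what "accepting" such a header involves on the script side: pulling the suggested file name out of the header value. The helper name is hypothetical and not part of PhantomJS, and the parsing is deliberately simplified (it ignores the extended `filename*=` form).

```javascript
// Hypothetical helper (not a PhantomJS API): extract the suggested file
// name from a Content-Disposition header value.
function filenameFromContentDisposition(headerValue) {
    if (typeof headerValue !== 'string') {
        return null;
    }
    // Only "attachment" dispositions ask the client to save the body.
    if (!/^\s*attachment\b/i.test(headerValue)) {
        return null;
    }
    // Accept quoted and unquoted forms: filename="a.zip" or filename=a.zip
    var match = /;\s*filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(headerValue);
    if (!match) {
        return null;
    }
    return match[1] !== undefined ? match[1] : match[2];
}
```

A value such as `attachment; filename="master.zip"` yields `master.zip`, while an `inline` disposition yields `null`.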

Disclaimer: This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #52. 40 people had starred this issue at the time of migration.

seanpoulter commented 10 years ago

@Vitallium, if you need help getting this into the next release please do let me know. This is the biggest barrier to my testing so far. Thanks.

emschorsch commented 10 years ago

This issue is still marked as FutureRelease. Is there any possibility it will be included in 2.0?

rouskejfan commented 10 years ago

Has anyone already tried the @mrampersad solution?

agr commented 10 years ago

Did a quick test. Seems to work for me.

agr commented 10 years ago

Well, one issue: the onDownloadFinished callback is not called, but the file is saved fine. Never mind. I was exiting phantom in onLoadFinished(fail), so the callback did not have a chance to be called.

whymarrh commented 10 years ago

@agr would you be able to share a working example?

agr commented 10 years ago

var page = require('webpage').create();

// Called by the patched build; the return value is used as the file name to save to.
page.onFilePicker = function(oldFile)
{
    console.log('onFilePicker(' + oldFile + ') called');
    return 'master.zip';
};

// Called when the download has completed. Exit here, not in onLoadFinished,
// otherwise this callback never gets a chance to run.
page.onDownloadFinished = function(status)
{
    console.log('onDownloadFinished(' + status + ')');
    phantom.exit(1);
};

page.onLoadFinished = function(status)
{
    console.log('onLoadFinished(' + status + ')');
};

page.open('https://github.com/facebook/php-webdriver/archive/master.zip');

realtebo commented 10 years ago

Has the bug with large downloads been fixed?

pilavdzic commented 10 years ago

Files over 2 GB have never worked for me, and still don't. If anyone got this to work, please let me know how.

MarsVard commented 9 years ago

Is this still being worked on? Did it ever get merged into master?

bedantaguru commented 9 years ago

I'm working on Windows. @agr @mrampersad, is it possible to share the binary with download support? I was not able to build it successfully. I have been struggling with this a lot, please help.

agr commented 9 years ago

If you are not afraid of running random executables from the Internet, here you go: https://github.com/agr/phantomjs/releases/tag/pj-download

bedantaguru commented 9 years ago

Thanks a lot @agr

MarsVard commented 9 years ago

@agr do you have the binaries for OS X and Linux too?

agr commented 9 years ago

nope

ankitgr8 commented 9 years ago

@agr do you have the binary for 32-bit Windows?

agborkowski commented 9 years ago

ping

ankitgr8 commented 9 years ago

@agr the shared PhantomJS binary with file download support does not work on Windows XP. It says invalid 32 bit application. Is this expected behaviour? PhantomJS 1.9.8 works fine.

ankitgr8 commented 9 years ago

Does PhantomJS 2 have support for file download?

bedantaguru commented 9 years ago

@ankitgr8 It worked for me on Windows 8 and Windows 7, both 64-bit. Maybe the binary is 64-bit.

agr commented 9 years ago

Binary is 32 bit:

$ file phantomjs.exe
phantomjs.exe: PE32 executable (console) Intel 80386, for MS Windows

It might be built with the minimum Windows version set to something higher than XP, but I am not sure how to check that.
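One way to check is to read the subsystem version fields from the PE optional header of the executable. The sketch below does this in plain JavaScript over the file's bytes; the function name is hypothetical, and it assumes a well-formed PE file. A result of 5.1 corresponds to Windows XP, and 6.0 to Windows Vista.

```javascript
// Read the minimum subsystem version from the bytes of a PE executable.
// `bytes` is a Uint8Array holding the file contents.
function peSubsystemVersion(bytes) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    // DOS header: "MZ" magic, with e_lfanew (offset of the PE header) at 0x3C.
    if (view.getUint16(0, true) !== 0x5A4D) {
        throw new Error('not an MZ executable');
    }
    var peOffset = view.getUint32(0x3C, true);
    // The PE signature is the four bytes "PE\0\0".
    if (view.getUint32(peOffset, true) !== 0x00004550) {
        throw new Error('PE signature not found');
    }
    // The optional header follows the 4-byte signature and 20-byte COFF header.
    var optionalHeader = peOffset + 4 + 20;
    return {
        major: view.getUint16(optionalHeader + 48, true), // MajorSubsystemVersion
        minor: view.getUint16(optionalHeader + 50, true)  // MinorSubsystemVersion
    };
}
```

Under Node.js, for example, the result of `require('fs').readFileSync('phantomjs.exe')` can be passed in directly, since a Buffer is a Uint8Array.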

ankitgr8 commented 9 years ago

Building should be independent of the Windows version. It depends on the compiler used (VC++), unless we used some Windows API that is not available in XP.

But since PhantomJS 1.9.8 works fine on Windows XP, I wonder which new API used in this PhantomJS branch is not compatible with Windows XP.

Do we have the steps to create the binary from this branch?

agr commented 9 years ago

Clone repository to local disk, run build.cmd? That's what I did.

ankitgr8 commented 9 years ago

The branch we are talking about is https://github.com/Vitallium/phantomjs/tree/download-support

It does not have build.cmd. Can you please share the branch to clone?

kalloc commented 9 years ago

Hi. I tried to use PhantomJS with Selenium as a replacement for Firefox, and I have a question: how do I use this functionality in WebDriver?

vic-cw commented 9 years ago

@momogentoo thanks for the suggestion to scan the cache.

One question: cache files seem to be compressed. Did you find out how to uncompress them from the command line?

ankitgr8 commented 9 years ago

I was able to build PhantomJS 2.0 on Windows and also merged the download capability into the PhantomJS 2.0 branch. If anyone needs the exe, it can be downloaded from https://github.com/ankitgr8/phantomjs2.0 and I have also uploaded a readme there on how to use the feature. Thanks to Vitallium and ariya for this feature and for the easy build environment for PhantomJS 2.0. Thanks, everyone, for all the help.

agborkowski commented 9 years ago

@ankitgr8 :beers:

jmiller76 commented 9 years ago

PR #11557 looked like it was the furthest along on features. The last comment from @Vitallium looked like some tests were potentially holding up the merge.

There were also some notes about the need to have better tracking of a download's completion. In my local copy I had created a public property to expose m_downloadingFiles.count(); so I could see if files were actively downloading. Following the normal conventions though, this should be implemented as a Callback function when the private downloadFinished completes. I normally look at .NET so it isn't clear to me if this is accessible As-Is. It doesn't look like there is a clear method to tie the onFileDownload to a download ID that can be matched in onResourceReceived which I think is the closest to an OnDownloadFinished that exists to cover this need in the current implementation.
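To make the completion-tracking idea concrete, here is a minimal sketch of the bookkeeping on the script side: count active downloads by ID and fire a callback when the last one finishes. The `DownloadTracker` name and its methods are hypothetical; nothing here is an existing PhantomJS API, and how a download ID would be obtained from the patch remains the open question.

```javascript
// Hypothetical tracker, not an existing PhantomJS API: it counts active
// downloads and invokes a callback once the last one has finished.
function DownloadTracker(onAllFinished) {
    this.active = {};                    // download id -> target file name
    this.onAllFinished = onAllFinished;
}

DownloadTracker.prototype.start = function (id, fileName) {
    this.active[id] = fileName;
};

DownloadTracker.prototype.finish = function (id) {
    if (!Object.prototype.hasOwnProperty.call(this.active, id)) {
        return false;                    // unknown or already finished
    }
    delete this.active[id];
    if (this.count() === 0 && typeof this.onAllFinished === 'function') {
        this.onAllFinished();
    }
    return true;
};

DownloadTracker.prototype.count = function () {
    return Object.keys(this.active).length;
};
```

A script could then call `phantom.exit()` from the `onAllFinished` callback instead of polling a counter.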

Is there something in Qt5 that may suggest a different approach to this? And require more tweaks?

Is there general acceptance that the API used in this version with page.onFileDownload and page.onFileDownloadError is where we are going? I would be more comfortable applying the patch locally and doing a Local build to know that this is how it will look once it is officially merged.

I read somewhere that @Vitallium had made a comment about wanting to provide some better download progress status, so I don't know if this bar is needed to get this accepted.

So do we just need to provide some tests and a version of the PR for the current Master (still getting my head around Git to know if this is needed) to move this forward or is more needed?

Sorry if this is too verbose, and Thank you to everyone that has been active on these threads, it has been very helpful to understand where things are.

Brade commented 9 years ago

🙏

MarsVard commented 9 years ago

Oh lawd! praise white baby jesus!

Brade commented 9 years ago

Simple page to use as a test case for this: http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=steameru (click the "Export Data" link). Hard to believe PhantomJS can't handle this basic scenario, but good luck to those working on a solution.

Brade commented 9 years ago

FYI casperjs can handle this fine, as explained here (the answer by julianjm): http://stackoverflow.com/questions/16144252/downloading-a-file-that-comes-as-an-attachment-in-a-post-request-response-in-pha

My own code had even fewer lines, and was more like:

var casper = require('casper').create({
    pageSettings: {
        webSecurityEnabled: false
    }
});

casper.start();

casper.thenOpen('http://www.fangraphs.com/projections.aspx?pos=all&stats=pit&type=steameru', function() {
    var postbody = this.page.evaluate(function() {
        $('#__EVENTTARGET').val('ProjectionBoard1$cmdCSV');
        return $('#form1').serialize();
    });
    casper.download('http://www.fangraphs.com/projections.aspx?pos=all&stats=pit&type=steameru', 'fg_pitchers.csv', 'POST', postbody);
});

casper.run();

Call the script from command line like so:

casperjs --ssl-protocol=any --cookies-file=cookies.txt myscript.js
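For pages without jQuery, the POST body can be built by hand. This is a rough sketch of the same form encoding that `.serialize()` produces; the function names are hypothetical, and the field list would have to be collected from the form yourself.

```javascript
// Serialize form fields as application/x-www-form-urlencoded.
// `fields` is an array of [name, value] pairs.
function encodeForm(fields) {
    return fields.map(function (pair) {
        return encodeComponent(pair[0]) + '=' + encodeComponent(pair[1]);
    }).join('&');
}

function encodeComponent(text) {
    // Form encoding writes spaces as "+" rather than "%20".
    return encodeURIComponent(String(text)).replace(/%20/g, '+');
}
```

For the form above, `encodeForm([['__EVENTTARGET', 'ProjectionBoard1$cmdCSV']])` gives `__EVENTTARGET=ProjectionBoard1%24cmdCSV`; a real ASP.NET page would normally need its other hidden fields included as well.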

All hail the "download" function 🙌

tommunro commented 9 years ago

This is the most frustrating issue with Phantom!

I need to be able to download pdf files for testing. While not able to do this directly via Phantom, I can do it using sync xmlhttprequest or async (to set the responseType to blob or arrayBuffer).

But either way, I end up with garbage.

The blob only outputs [object object] in the file and the arrayBuffer outputs a pdf that is garbled. The text portions are fine, but the encoded portions are garbage.

I have tried binary writes, conversion of the arrayBuffer as mentioned earlier, and all types of charsets (which should not be used on the binary format) but all result in corrupted pdfs.

Anyone have a working solution???????

Phantom really needs a solid download solution. I have tried some of the linked "solutions" but they result in no file. The file I am downloading is typed as an application.pdf but named xxxx.sap
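One approach that may avoid the corruption is to never let the bytes be treated as text: convert them to a "binary string" (one character per byte) inside the page, pass that string out, and write it in binary mode. The sketch below shows only the conversion helpers, whose names are hypothetical. It assumes that `fs.write(path, data, 'wb')` writes such a string byte for byte, which should be verified against the PhantomJS build in use.

```javascript
// Convert raw bytes to a string with one character (code 0-255) per byte.
function bytesToBinaryString(bytes) {
    var chunks = [];
    var chunkSize = 0x8000; // keep the argument list passed to apply() small
    for (var i = 0; i < bytes.length; i += chunkSize) {
        var slice = Array.prototype.slice.call(bytes, i, i + chunkSize);
        chunks.push(String.fromCharCode.apply(null, slice));
    }
    return chunks.join('');
}

// Inverse: recover the bytes from a binary string.
function binaryStringToBytes(text) {
    var bytes = new Uint8Array(text.length);
    for (var i = 0; i < text.length; i++) {
        bytes[i] = text.charCodeAt(i) & 0xff;
    }
    return bytes;
}
```

In the page, the result of `bytesToBinaryString(new Uint8Array(xhr.response))` is a plain string, so it can be handed over without the ArrayBuffer itself having to cross the page boundary.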

ankitgr8 commented 9 years ago

Hi tommunro

I am not sure which OS you are looking for, but if you are working on Windows, there is a PhantomJS 2.0 build with download functionality built in. You can download the exe from https://github.com/ankitgr8/phantomjs2.0 for testing. I have also included sample JS code there.

tommunro commented 9 years ago

Thank you for the fast response.

I have tried a couple different download builds for windows, but neither seemed to fire the ondownload event.

The link is a “getpdf.sap?...” link.

I can receive the file using both sync and async xmlhttprequest calls under page.evaluate such as:

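var xhr = new XMLHttpRequest();
xhr.open('GET', tempLink, true);
// responseType values are lowercase; the spelling 'arrayBuffer' is not a valid value
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
    // hand the response to the PhantomJS side once the request completes
    window.callPhantom(xhr.response);
};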

The above works when I include a wait in the evaluate for the async to complete.

With async I can get a blob or an arrayBuffer, but I have not been able to save the blob to a file.

With sync I can return the response directly rather than through a callback.

Both methods however result in a pdf file that looks ok in an editor, but the encoded (binary) portions are encoded incorrectly so the text does not appear.

I save the file using fs.write('test.pdf', data, 'w'); (tried wb also – just gives different encoding)

I should get:

%PDF-1.3

zG_ÕùßJ¤·°#s6­¦dR L„s­

1 0 obj

<<

But instead I get:

%PDF-1.3

zG_���J���#s6­�dR L�s­

1 0 obj

<<
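The replacement characters in the second dump are what a lossy text decode produces, so the bytes were most likely decoded as text somewhere between the response and the file. A small sketch that reproduces the effect, assuming an environment with `TextDecoder` and `TextEncoder` (Node.js or a current browser, not PhantomJS itself); the function names are hypothetical:

```javascript
// Decode raw bytes as UTF-8 text, the way a text-mode code path would.
function decodeAsUtf8(bytes) {
    return new TextDecoder('utf-8').decode(bytes);
}

// Decode and re-encode, to see what ends up in the file afterwards.
function roundTripAsUtf8(bytes) {
    return new TextEncoder().encode(decodeAsUtf8(bytes));
}

// "%PDF" followed by three bytes that are not valid UTF-8.
var pdfSample = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0xD5, 0xF9, 0xDF]);
var pdfDecoded = decodeAsUtf8(pdfSample);
console.log(pdfDecoded.indexOf('\uFFFD') !== -1); // invalid bytes became U+FFFD
console.log(roundTripAsUtf8(pdfSample).length !== pdfSample.length);
```

The ASCII parts survive, which is why the text portions of the PDF look fine while the binary streams are destroyed. Once a byte has been replaced, the original value cannot be recovered from the text.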

With your download build, I “click” on the pdf link and use

page.onFileDownload = function(status) {
    console.log('onFileDownload(' + status + ')');
    return 'test.pdf';
};

page.onFileDownloadError = function(status) {
    console.log('onFileDownloadError(' + status + ')');
    //phantom.exit(1);
};

But these never seem to get called.

Tom

ankitgr8 commented 9 years ago

Try adding some debug output to your script, and check what request is being sent and what response is being received. Also check the content type of the response: if it is HTML, then onFileDownload will not be called.

Try the script below to show what was sent and received:

page.onResourceReceived = function(status) {
    console.log('onResourceReceived(' + status.contentType + ')');
    if (status.stage === 'end') {
        phantom.exit(0);
    }
};
page.onResourceRequested = function(requestData, networkRequest) {
    console.log('onResourceRequested(' + JSON.stringify(requestData) + ')');
};
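The content-type check can also be automated. Below is a rough sketch of a classifier for the object passed to onResourceReceived, assuming it carries `contentType` and a `headers` array of name/value pairs. The function name and the three labels are hypothetical, and whether the patched build actually triggers onFileDownload for a given label is not something this code can confirm.

```javascript
// Hypothetical helper: label a response as 'attachment', 'html' or 'other'.
function classifyResponse(response) {
    var headers = response.headers || [];
    for (var i = 0; i < headers.length; i++) {
        var name = String(headers[i].name).toLowerCase();
        var value = String(headers[i].value);
        // An attachment disposition asks the client to save the body.
        if (name === 'content-disposition' && /^\s*attachment\b/i.test(value)) {
            return 'attachment';
        }
    }
    var type = String(response.contentType || '').toLowerCase();
    // HTML is rendered as a page rather than offered as a file.
    if (type.indexOf('text/html') === 0) {
        return 'html';
    }
    return 'other';
}
```

Logging `classifyResponse(status)` next to the content type makes it easier to spot a link that returns an HTML page where a file was expected.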

tommunro commented 9 years ago

ok, cool.

After much tweaking, it turns out that the funky onclick event for this particular link was not generating a resource request like the others. I changed the page to "open" the link directly and that did fire the onDownload, while failing to load the page as expected. The pdf came out intact!

So your download build works as described! Thank you!

Is there a build for Linux? or build instructions?

ankitgr8 commented 9 years ago

Linux build instructions are on the PhantomJS 2.0 site.

skornev commented 9 years ago

The latest source code did not work for me. I built PhantomJS, but it did not download the file. Even the onFilePicker event did not fire. So I used the previous source code provided by agr.

For anyone interested, here is a binary with download support (it's not static) for Debian 7:
https://github.com/skornev/phantomjsbinary

irongaze commented 9 years ago

+1, in the vain hope this helps move this issue to the top of the queue...

radnov commented 9 years ago

++, lets make that vain hope double

tforgione commented 9 years ago

++

Audace commented 9 years ago

++

rijulraju commented 9 years ago

++

ankitgr8 commented 9 years ago

Can you please share some sample code to download using CasperJS?

pwaldhauer commented 9 years ago

@ballesdbc Please share sample code, would be very appreciated.

pwaldhauer commented 9 years ago

Thanks, this works!

lanzorg commented 9 years ago

@pwaldhauer @ballesdbc Where is the code sample? Does it work with large files too?

tommunro commented 9 years ago

this works great for me:

// these two functions answer to ankitgr8 download patch
page.onFileDownload = function(status) {
    messageLog('==> Download complete');
    var name = rawPdf + '.pdf';
    return name;
};
page.onFileDownloadError = function(status) {
    messageLog('onFileDownloadError(' + status + ')');
};

// in my case, the page link will not work by "clicking" it - I need to formally open it
// the page open triggers the above function
page.open(tempLink, function(status) {
    if (status !== 'success') {
        //console.log('FAIL to load pdf');
    }
});

hope this helps everyone.