Closed ariya closed 4 years ago
ariya.hi...@gmail.com commented:
This is again related to issue 41.
roejame...@gmail.com commented:
Issue 92 has been merged into this issue.
brian.th...@gmail.com commented:
I'm trying to implement this functionality and not making much progress. Using the attached patch, I run:
$ bin/phantomjs examples/download.js
and get this output:
WebPage instantiated WebPage instantiated Download complete - fail
I added cout of "WebPage instantiated" (to verify my debug messages work as expected). I also added a cout in my downloadRequested slot. That one did not get displayed. Can someone spot what I'm doing wrong or let me know if I'm on the completely wrong track?
Here is where I found out about the downloadRequested signal: http://doc.qt.nokia.com/latest/qwebpage.html#downloadRequested
brian.th...@gmail.com commented:
Whoops, here is the patch file attachment without the ANSI color codes
nperria...@gmail.com commented:
Any progress on this issue?
ariya.hi...@gmail.com commented:
No progress as of now.
nperria...@gmail.com commented:
A friend of mine (http://svay.com/) just told me a nice trick for working around this issue: use XHR within the page environment and base64 encoding to retrieve the file contents. It works rather well. For the record, you can find an example here: http://jsfiddle.net/3kUXy/
gopiredd...@gmail.com commented:
The URL to the file is not always known so XHR is not a general solution. For instance, if you are downloading a utility/bank/cc statement, you may have to click a link which will possibly execute some JS code and trigger another page load with a frame embedding the PDF. Or the statement comes in as an attachment.
What will it take to support the file download feature?
Requirement: Download files that come in embedded in the page/frame or as attachments. The URLs may or may not be known. Allow saving the files to the file system or "upload" them to a web server (so the server can save the files in a DB for instance).
ja...@recovend.com commented:
I've got an early but functional version of this at
https://github.com/woodwardjd/phantomjs/tree/add_download_capabilities
Example:
var page = require('webpage').create();
page.onUnsupportedContentReceived = function(data) {
    console.log('Got a download at url: ' + data.url);
    page.saveUnsupportedContent('some.file.path', data.id);
    phantom.exit();
};
page.open('http://some.pdf.url.com/some.pdf');
I call this "early but functional" because it works where I've tested it (Linux, PDF downloads), but it likely has a small memory leak, and I'm not 100% convinced the callback mechanism I used is ideal.
Comments desired.
rotava...@gmail.com commented:
I've downloaded and built the git for above, but I can't seem to get the onUnsupportedContentReceived event to fire and calling saveUnsupportedContent throws an undefined error. Are there special build steps required to enable it?
Thanks, Robert
ja...@recovend.com commented:
No special build steps required, as far as I know. If saveUnsupportedContent is undefined, maybe you haven't built the version in the add_download_capabilities branch (git checkout add_download_capabilities after the git clone)? Just speculating.
audi...@gmail.com commented:
I second the XHR+base64 method. It takes another 50+ lines of code to send to page.evaluate(), and I have to de-base64 the content afterward, and that's basically how CasperJS does it (as far as I can tell from their code—they do a lot of weird (unnecessary, in my book) binding with window.utils in the page context).
I used this one (first answer): http://stackoverflow.com/questions/7370943/retrieving-binary-file-content-using-javascript-base64-encode-it-and-reverse-de
It works great. Just be sure to try-catch the call to base64ArrayBuffer(), because Uint8Array(arrayBuffer) may throw an error, and check xhr.getResponseHeader('content-type') == 'application/pdf' if you're doing PDF downloads like I was.
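For readers who want to see the shape of the XHR+base64 approach, here is a minimal sketch. It is not the code from the linked answer. The helper names (bytesToBase64, base64ToBinaryString, isPdfContentType) are made up for illustration. In a real script the encoder would have to be defined inside the function passed to page.evaluate(), since that function runs in the page context, and the decoder would run on the PhantomJS side before writing the file.

```javascript
var B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode a Uint8Array (or plain array of byte values) as base64 text.
function bytesToBase64(bytes) {
    var out = '';
    for (var i = 0; i < bytes.length; i += 3) {
        var b0 = bytes[i];
        var b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
        var b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
        out += B64.charAt(b0 >> 2);
        out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4));
        // Pad with '=' when the last group has fewer than 3 bytes.
        out += i + 1 < bytes.length ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : '=';
        out += i + 2 < bytes.length ? B64.charAt(b2 & 63) : '=';
    }
    return out;
}

// Decode base64 text into a binary string (one character per byte).
function base64ToBinaryString(text) {
    var clean = text.replace(/=+$/, '');
    var out = '';
    var acc = 0;
    var bits = 0;
    for (var i = 0; i < clean.length; i++) {
        var v = B64.indexOf(clean.charAt(i));
        if (v < 0) { throw new Error('Invalid base64 character at index ' + i); }
        acc = (acc << 6) | v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += String.fromCharCode((acc >> bits) & 255);
            acc &= (1 << bits) - 1;
        }
    }
    return out;
}

// True when a Content-Type header value names a PDF, ignoring parameters.
function isPdfContentType(value) {
    return typeof value === 'string' &&
        value.toLowerCase().indexOf('application/pdf') === 0;
}

// Hypothetical usage inside page.evaluate(), with the helpers defined there:
//   var xhr = new XMLHttpRequest();
//   xhr.open('GET', url, false);
//   xhr.responseType = 'arraybuffer';
//   xhr.send(null);
//   if (isPdfContentType(xhr.getResponseHeader('content-type'))) {
//       try { return bytesToBase64(new Uint8Array(xhr.response)); }
//       catch (e) { return null; }
//   }
```

The base64 text survives the trip out of page.evaluate() as an ordinary string, which is the main reason for encoding at all.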
subel...@gmail.com commented:
I need this as well. Can't use the XHR method because the inline attachments I need to scrape don't come with a URL I can hit.
audi...@gmail.com commented:
Wouldn't inline attachments be even more easily downloaded? For an image: var content = page.evaluate(function() { return $('img#whatever').attr('src'); }); fs.write(yer_path, content, 'w');
Ariya, can you give some estimate of how long this feature (downloading a url) would take to implement? I'd love to get involved in PhantomJS development, but maybe this issue is a lot trickier than it sounds?
subel...@gmail.com commented:
Sorry, I didn't mean to write "inline". The file I need is not an image and is not part of the DOM. It gets sent as a result of a POST with the Content-Disposition header 'attachment;filename="report.csv"'
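As a small illustration of that header: if a script can get hold of the Content-Disposition value, the suggested file name can be pulled out with a few lines. This is a simplified sketch, not part of any PhantomJS API. attachmentFilename is a made-up helper name, and it ignores the extended filename*= form.

```javascript
// Return the file name from a Content-Disposition header value,
// or null when the value is not an attachment or has no filename.
function attachmentFilename(header) {
    if (typeof header !== 'string') { return null; }
    if (!/^\s*attachment\b/i.test(header)) { return null; }
    // Accept both quoted and unquoted forms of filename=...
    var m = /;\s*filename\s*=\s*("([^"]*)"|[^;\s]+)/i.exec(header);
    if (!m) { return null; }
    return m[2] !== undefined ? m[2] : m[1];
}

console.log(attachmentFilename('attachment;filename="report.csv"'));
```

A value such as 'inline' yields null, so the caller can tell attachments apart from content meant to be rendered in the page.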
bogusan...@gmail.com commented:
Hi there. I think the base64-encoding solution can only be a stop-gap solution.
- Downloading big files will probably exhaust memory and base64 encoding and -decoding it will use up resources that would have better been spent elsewhere - therefore we want to have the option to redirect a downloaded stream to file
- We may have pages where we cannot control the loading of a file that is not supported (e.g. PDF)
- We may want to save resources that have already been loaded as part of the page (e.g. images)
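To put a rough number on the first point: base64 emits 4 characters for every 3 input bytes, so the encoded text is about a third larger than the file itself, and encoding and decoding typically hold additional copies in memory on top of that. A quick sketch (base64Length is a made-up helper name, and the file size is an arbitrary example):

```javascript
// Number of base64 characters (including '=' padding) for a given byte count.
function base64Length(byteCount) {
    return 4 * Math.ceil(byteCount / 3);
}

var fileBytes = 25 * 1024 * 1024; // example size, chosen for illustration
var encodedChars = base64Length(fileBytes);

console.log(encodedChars + ' base64 characters for ' + fileBytes + ' bytes');
console.log('overhead factor: ' + (encodedChars / fileBytes).toFixed(3));
```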
I think the optimal solution would be to add functionality to the onResourceReceived hook to allow setting up a "redirection" handler, and if such a handler is set, unsupported file formats should silently be downloaded. This handler could then have another onDownloadFinished hook to resume operation once the download is done.
james.m....@gmail.com commented:
I'm interested in committing some of my company's resources to adding this feature. Is anyone already working on it? If so, could my company sponsor your work? If not, we can assign it to one of our own people. I just want to avoid duplicating anyone else's work.
I'm also interested in helping with this feature. We're trying to capture an Acrobat file that is sent as a result of a POST with the Content-Disposition header 'attachment;filename="file.pdf"' Is anyone working on this? I don't want to duplicate effort. Ideally we want to access the functionality from CasperJS as well.
any progress on this?
I'd love to see this fixed too. I saw @Vitallium has a fork with download support, as well as a few other fixes.
https://github.com/Vitallium/phantomjs/tree/download-support
Anyone else able/available to help merge the new code? I wouldn't be doing anyone a favor if I messed with the C codebase. I wouldn't mind donating towards a bounty for this.
This feature is under development. When it's ready, it'll be merged into the master tree. I can't say when this feature will be ready.
I'm also interested in this issue. Will we be able to render the pdf content as png / jpeg? Or is that altogether a different problem?
@FergusNelson that's a different problem, but much more easily solved using ghostscript, X11, ImageMagick, etc.
Looks like @Vitallium is pretty far along with an awesome solution in his download-support branch, described here: https://groups.google.com/forum/#!msg/phantomjs/JChUakj--24/epby47h3ZGAJ
I see that there are at least two attempts to address this issue on GitHub. @woodwardjd's add_download_capabilities branch, and @Vitallium's download-support branch. Is one of those a more promising path forward than the other? What work is outstanding before it would be ready to merge upstream?
@Vitallium how close is this to being merged with the master?
I rebased @Vitallium's download-support branch on a recent master HEAD.
I've been exercising it with a happy path test case, and it seems to be working fine.
@ariya and @Vitallium,
I'd like to continue the work that @Vitallium started if there's more to do.
What do you think blocks merging this upstream?
I actually want to rework the 'download-support' branch. I want to make it similar to real browsers. But I haven't posted my ideas to the mailing list yet (https://groups.google.com/forum/#!topic/phantomjs/JChUakj--24). So, I want to:
DownloadManager (or smth similar)
Hi, we are having trouble downloading files too. We tried the download-support branch, but the onFileDownload() callback does not seem to be called. We assume that is because the web page does not return a "content-disposition" header, only an "application/octet-stream" content type. (As the target page is not our code, we can't change anything on the server side.)
It seems that PhantomJS stops executing when the "download" button is clicked, so we are not sure whether onFileDownload is simply not called or the whole process is suspended somewhere. However, we still think the cause is the "application/octet-stream" content-type header.
I'm not sure if I'm making myself clear, but we want to know 1) whether our understanding about the missing Content-Disposition header is correct, 2) whether Vitallium's DownloadManager will solve this problem, and 3) if yes, whether it will be available sometime soon (say, within a month).
Thank you, minami
UPDATE: it seems this one works in our case: https://github.com/ariya/phantomjs/pull/11484
thank you
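One way to investigate the first question above from a script is to log how each response identifies itself. A minimal sketch, assuming the response objects passed to onResourceReceived carry a contentType string and a headers array of name/value pairs. headerValue and describeResponse are made-up helper names. This only reports what the server sent; it does not trigger or save a download, and it says nothing about how the download-support branch decides what to handle.

```javascript
// Find a header value by name (case-insensitive) in an array of
// { name: ..., value: ... } objects. Returns null when absent.
function headerValue(headers, name) {
    var list = headers || [];
    var wanted = name.toLowerCase();
    for (var i = 0; i < list.length; i++) {
        if (String(list[i].name).toLowerCase() === wanted) {
            return list[i].value;
        }
    }
    return null;
}

// Classify a response by the two headers discussed in this thread.
function describeResponse(response) {
    var disposition = headerValue(response.headers, 'Content-Disposition');
    var type = response.contentType ||
        headerValue(response.headers, 'Content-Type') || '';
    if (disposition && /^\s*attachment\b/i.test(disposition)) {
        return 'attachment';
    }
    if (type.toLowerCase().indexOf('application/octet-stream') === 0) {
        return 'octet-stream without attachment header';
    }
    return 'other';
}

// Hypothetical usage in a PhantomJS script:
//   page.onResourceReceived = function (response) {
//       if (response.stage === 'end') {
//           console.log(response.url + ' -> ' + describeResponse(response));
//       }
//   };
```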
May I ask what is the progress for this function?
:+1:
:+1:
For some cases, one workaround is enabling phantomjs cache and scanning cache directory to retrieve that downloaded attachment.
This feature will be in the next version. So, stay tuned!
On Apr 2, 2014 7:52 PM, "momogentoo" wrote:
For some cases, one workaround is enabling phantomjs cache and scanning cache directory to retrieve that downloaded attachment.
up!
Need this ASAP :-)
+1
@Vitallium do you have any details about when that will be?
For those who need file download ability now, from what I understand casperjs solves this.
Correction. I tried out casperjs and downloading large files does not work, they are 0 bytes. CasperJS folks say this relates to another bug in phantomjs, inability to set a larger timeout value. Please fix these bugs, downloading large files is very important for automation and testing!
push!
Happy to beta test anything here.
I'm trying to download an xlsx file and get access to the content.
+1 for fix large timeout bug
I need to download a 25 MB Excel file every day at the same time, after login, search, and so on.
So CasperJS was my friend... or could have been my friend, because with this bug I cannot download the file... sgrunt!
@realtebo, did you try using CasperJS with SlimerJS? Because of PhantomJS bugs I use SlimerJS and it works very well.
I need this too ASAP
My current workaround is to use an XMLHttpRequest to GET the file as an 'arraybuffer' inside page.evaluate(), so we keep the page context with cookies and all, then use the 'fs' module to write the binary data.
var fs = require('fs');

var results = page.evaluate(function () {
    // Downloads have to run in the context of the web page
    // so that the page's cookies and session are sent.
    function downloadReport(id, name) {
        console.log('downloading: ' + name);
        var result = {};
        try {
            var xhr = new XMLHttpRequest();
            // Synchronous request (third argument is false).
            xhr.open("GET", "http://host/api/v1/reports/" + id, false);
            xhr.responseType = 'arraybuffer';
            xhr.send(null);
            var bin = xhr.response;
            // Convert the bytes to a binary string, one character per byte.
            var u8 = new Uint8Array(bin), ic = u8.length, bs = [];
            while (ic--) { bs[ic] = String.fromCharCode(u8[ic]); }
            result.data = bs.join('');
            result.name = name;
        } catch (e) {
            result.error = JSON.stringify(e);
        }
        return result;
    }
    var result = [];
    result.push(downloadReport(123, 'report.pdf'));
    return result;
}, token);

results.forEach(function (item) {
    if (item.data != null)
        fs.write(item.name, item.data, { mode: 'wb' });
    else
        console.log(item.error);
});
+1
+1
I came up with another workaround. From within page.evaluate I click on the link I need to download, then listen for onResourceReceived.
page.set('onResourceReceived', function (resource) {
    if (resource.contentType && resource.stage === 'end' && resource.contentType.indexOf('application/pdf') > -1) {
        console.log(resource);
        // Here you can download the file from resource.url by using http(s) request (e.g. https://gist.github.com/ialpert/3136595)
    }
});
alexsa...@gmail.com commented:
Disclaimer: This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #52. :star2: 40 people had starred this issue at the time of migration.