Closed ariya closed 9 years ago
ctvo...@gmail.com commented:
One more example of this issue is: http://snowplay.com which redirects to http://snowplay.com/cms/
ryandewh...@gmail.com commented:
Hi,
I'm having the same issue with version 1.6.1.
Thanks, Ryan
jfons...@ontech.com.au commented:
I am having the same issue with v 1.6.1.
I'm trying http://osc3.ezimobile.biz which redirects to http://osc3.ezimobile.biz/catalog
Here is the index.html file from that server:
Does anyone have any workarounds for this?
pe...@spotfront.com commented:
All of the examples provided here contain redirects in HTML (something like a meta refresh tag) or in JavaScript (window.self.location.replace('index.php' );). (I've yet to see PhantomJS fail to follow an HTTP redirect via the Location response header.) As much as I find this frustrating in my own projects, I think that PhantomJS is working as it ought. That said, having some sort of optional timeout that waits a configurable amount of time for any location changes before firing the page.open callback would save a lot of repetitive userland code.
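The "wait a configurable amount of time for location changes" idea can be approximated in userland. The following is only a sketch under assumptions: createSettleTimer is a hypothetical helper name (not a PhantomJS API), and the quiet period is an arbitrary value you would tune. It calls a callback once no load or navigation activity has been reported for quietMs milliseconds.

```javascript
// Hypothetical helper, not part of PhantomJS: fires onSettled once
// touch() has not been called for quietMs milliseconds.
// The optional timers argument exists only so the timing can be faked in tests.
function createSettleTimer(onSettled, quietMs, timers) {
    timers = timers || {
        set: function (fn, ms) { return setTimeout(fn, ms); },
        clear: function (id) { clearTimeout(id); }
    };
    var handle = null;
    var fired = false;

    function touch() {
        if (fired) {
            return; // already reported as settled; ignore late activity
        }
        if (handle !== null) {
            timers.clear(handle); // restart the quiet period
        }
        handle = timers.set(function () {
            fired = true;
            onSettled();
        }, quietMs);
    }

    return { touch: touch };
}

// Possible wiring inside a PhantomJS script (assumed usage, shown as comments):
//   var settle = createSettleTimer(function () {
//       page.render('out.png');
//       phantom.exit(0);
//   }, 2000);
//   page.onNavigationRequested = function (url, type, willNavigate, main) {
//       if (main) { settle.touch(); }
//   };
//   page.onLoadFinished = function () { settle.touch(); };
```

Every main-frame navigation or finished load restarts the quiet period, so a late JavaScript redirect simply postpones rendering instead of racing it.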
ankitjai...@gmail.com commented:
I am facing a similar issue where the redirect is done using location.replace. Strangely, it generates the image with PhantomJS 1.5.1 but not with v1.8.1.
theobe...@gmail.com commented:
I am facing a similar issue, lots of sites redirect (301) but phantomjs fails to notice.
6 - why is this an intended feature? It seems this significantly limits the use of PhantomJS as a GUI-less browser.
theobe...@gmail.com commented:
Just a small addendum: running "pjs2 netsniff.js http://www.forbes.com" outputs "FAIL to load the address".
This is not an expected output, of course.
theobe...@gmail.com commented:
Perhaps a fix would be to detect whether the first accessed document results with a redirection (302/301) and then just assume this is the main document we wanted to access from the start and proceed?
Simple solution for handling redirects
function renderPage(url) {
    var page = require('webpage').create();
    var redirectURL = null;
    page.onResourceReceived = function(resource) {
        if (url == resource.url && resource.redirectURL) {
            redirectURL = resource.redirectURL;
        }
    };
    page.open(url, function(status) {
        if (redirectURL) {
            renderPage(redirectURL);
        } else if (status == 'success') {
            // ...
        } else {
            // ...
        }
    });
}
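The recursive approach above follows one hop at a time and has no guard against redirect loops. As a sketch only, the chain can also be resolved from records collected in onResourceReceived. resolveRedirectChain and maxHops are hypothetical names, and the records are assumed to carry the url and redirectURL fields that PhantomJS reports on received resources.

```javascript
// Hypothetical helper: given the URL originally requested and a list of
// records shaped like onResourceReceived arguments ({ url, redirectURL }),
// follow the redirect chain and return the final URL.
// Returns null if the chain is longer than maxHops (for example, a loop).
function resolveRedirectChain(startUrl, responses, maxHops) {
    var redirects = {};
    var i;
    for (i = 0; i < responses.length; i++) {
        if (responses[i].redirectURL) {
            redirects[responses[i].url] = responses[i].redirectURL;
        }
    }

    var current = startUrl;
    var hops = 0;
    while (Object.prototype.hasOwnProperty.call(redirects, current)) {
        if (hops >= maxHops) {
            return null; // give up instead of looping forever
        }
        current = redirects[current];
        hops++;
    }
    return current;
}

// Assumed usage inside a PhantomJS script (shown as comments):
//   var seen = [];
//   page.onResourceReceived = function (resource) { seen.push(resource); };
//   page.open(url, function () {
//       var finalUrl = resolveRedirectChain(url, seen, 10);
//   });
```

Note that a relative redirectURL would need to be resolved against the current URL first; this sketch assumes absolute URLs.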
Note for devs: I found a page on Nokia's wiki discussing how to setup the QNAM to follow redirects: http://www.developer.nokia.com/Community/Wiki/Handling_an_HTTP_redirect_with_QNetworkAccessManager
This is probably what we should be doing.
We are facing the same issues. We would donate if anybody fixes it :)
Adding a +1 for the fix, we're facing this too. Thanks!
My issue was that the target page has a third-party service installed (Optimizely) that runs in the head and, a few milliseconds later, assigns location.href.
Using the approach from here http://newspaint.wordpress.com/2013/04/25/getting-to-the-bottom-of-why-a-phantomjs-page-load-fails/
I noticed that this redirect registers as a navigation request from the page's main frame - at that point processing stops in phantom.
Here's what seems to work for me:
When I notice such a navigation request, I close the current page, and re-run the process on the new URL. I now can't seem NOT to get the screen shot of the final page.
var page;
var myurl = "your.targeturl.com";
var renderPage = function (url) {
    page = require('webpage').create();
    page.onNavigationRequested = function(url, type, willNavigate, main) {
        if (main && url != myurl) {
            myurl = url;
            console.log("redirect caught");
            page.close();
            renderPage(url);
        }
    };
    page.open(url, function(status) {
        if (status === "success") {
            console.log("success");
            page.render('yourscreenshot.png');
            phantom.exit(0);
        } else {
            console.log("failed");
            phantom.exit(1);
        }
    });
};
renderPage(myurl);
Same issue here, but using PhantomJS through Selenium.
When the web page handles a 301 redirect, the final response reports a 301 status code, not 200 as it should, and the other headers come from the first request (the one that issued the redirect).
+1 for fixing this bug, can't use PhantomJS because of this.
+1 for me as well, this is problematic for my selenium testing
+1 need fix asap
@icezzzz: "need fix ASAP" == Fix it yourself or hire someone. The links I gave earlier in this thread should provide enough info (or close to it) to fix this.
Hey James et al.
I'd like to help but sadly never programmed anything like this. Are there other todos? testing etc?
and - is development still active? I know all of you have day jobs and real lives to support :)
Cheers - sven
Development switched gears from fixing bugs and adding features to upgrading the underlying WebKit engine and Qt framework versions themselves as we believe this will actually solve 50% or more of the existing bugs. That effort is close to Technical Preview stage (working on Windows, anyway) and will likely be pushed back into this primary repo again in the near future.
When that happens, we're definitely going to need some folks to give it some much needed testing... Both to ensure it still meets your personal needs as well as before and to go through many of the open bugs to check if they are fixed by this upgrade.
Ah, cool, thanks for the shout. Looking forward to the next rev!
var page;
var myurl = "http://osc3.ezimobile.biz";
var renderPage = function (url) {
    page = require('webpage').create();
    page.onNavigationRequested = function(url, type, willNavigate, main) {
        if (main && url != myurl) {
            myurl = url;
            console.log("redirect caught");
            page.close();
            renderPage(url);
        }
    };
    page.open(url, function(status) {
        if (status === "success") {
            console.log(myurl);
            console.log("success");
            page.render('yourscreenshot.png');
            phantom.exit(0);
        } else {
            console.log("failed");
            phantom.exit(1);
        }
    });
};
renderPage(myurl); // start the first load
OUTPUTS -
http://osc3.ezimobile.biz/
http://osc3.ezimobile.biz/catalog
We now only need to find a way to request that redirected URL. Looking forward.
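The first URL in that output differs from the configured myurl only by a trailing slash, so the url != myurl check appears to treat the very first navigation as a redirect. A minimal sketch of a more forgiving comparison follows. normalizeUrl and isRedirect are hypothetical helper names, not part of PhantomJS, and the normalization is deliberately simplistic: it lowercases the scheme and host, drops the fragment, and adds a root path when none is given.

```javascript
// Hypothetical helper: reduce trivially different spellings of a URL
// to one form so they compare as equal.
function normalizeUrl(url) {
    var noFragment = url.split('#')[0];
    // Capture "scheme://", the host part, and everything after the host.
    var match = /^([a-z][a-z0-9+.-]*:\/\/)([^\/?]+)(.*)$/i.exec(noFragment);
    if (!match) {
        return noFragment; // not an absolute URL; leave it alone
    }
    var rest = match[3] === '' ? '/' : match[3];
    return match[1].toLowerCase() + match[2].toLowerCase() + rest;
}

// Hypothetical helper: true only when the navigation target really
// differs from the URL that was requested.
function isRedirect(requestedUrl, navigatedUrl) {
    return normalizeUrl(requestedUrl) !== normalizeUrl(navigatedUrl);
}

// Assumed usage in the handler above (shown as a comment):
//   if (main && isRedirect(myurl, url)) { ... }
```

With this check, the initial load of http://osc3.ezimobile.biz/ would no longer count as a redirect, while the later navigation to /catalog still would.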
I've just met this issue (no output at all / js redirections in target). I tried setting the timer to 2000ms instead of 200ms and it worked.
(looks like the generated page with multiple redirections didn't have time to render).
@driket, my experience with this bug also makes me think it is related to time to render. Would you please give an example of how to set the timer to 2000 ms?
@driket, I tried that, but it didn't work for http://osc3.ezimobile.biz/
@aevernon, you can do this by changing the window.setTimeout parameter (in the js file).
@yogeshunavane, I've tested version 1.9.2 on OS X Mavericks and Ubuntu 12 LTS, and it seems to work (the output file is a full HTML file with relevant data).
Here is the output file I got for http://osc3.ezimobile.biz/ + the script I'm using : https://gist.github.com/driket/8348520
For those seeking a workaround to get the final redirected URL, you might be interested in Watir WebDriver plus Headless, although this solution uses Ruby instead of JavaScript.
sudo apt-get install rubygems xvfb firefox # Use iceweasel instead of firefox on Debian.
sudo gem install headless watir-webdriver
#!/usr/bin/ruby
require 'headless'
require 'watir-webdriver'
Headless.ly do
  browser = Watir::Browser.new
  browser.goto 'http://osc3.ezimobile.biz/'
  puts browser.url
  # Output is http://osc3.ezimobile.biz/catalog/
  browser.goto 'https://silkflowers.affiliatetechnology.com/redirect.php?nt_id=1&URL=http://www.silkflowers.com'
  puts browser.url
  # Output is http://www.silkflowers.com/?utm_source=affiliatetraction&utm_medium=CommissionJunction
  browser.goto 'http://snowplay.com/'
  puts browser.url
  # Output is http://snowplay.com/cms/
end
It seems that this issue is obsolete. I have just tested different redirects (http location header, html meta refresh tag, js location redirects) with recent phantomjs version (from npm on Linux). Everything works.
The only thing that should be mentioned: when you deal with an unknown redirect type, you shouldn't rely only on the onLoadFinished callback, because a redirect may be initiated after page load. However, this problem can be solved using setTimeout, the waitFor function (https://github.com/ariya/phantomjs/blob/master/examples/waitfor.js) and/or the onNavigationRequested callback.
@lexqt I still have that issue with the latest PhantomJS. For example, the following script fails: https://gist.github.com/n1k0/3046142.
Just run it with phantomjs location.js <original url> <redirect url>.
Actually nevermind this, I made an uber-simple example for myself, and it worked. Maybe I was just missing something in the above script.
I'm running 1.9.7, which still seems to have trouble following redirects. I'm loading certain images that are being 301'd to other locations, but all those images seem to 404 in Phantom. It doesn't seem to be a timing thing, as setting a 60s delay before rendering didn't help.
@wesleylancel same here. I was using CasperJS with PhantomJS 1.9.7 and got a 404 error on a redirect to HTTPS that worked in a regular browser. It did work with a self-compiled PhantomJS from the Github repo, though.
(compiled on Ubuntu 12.04.04 x86_64 with libqt4 4.8.1-0ubuntu4.8)
Wondering if anyone has come across a redirect when submitting a form not working. From observation of the events I see something along the lines of (See below):
The gist of it seems to be - the page posted to .../auth/UI/Login returns a 302 which causes the 5/Operation Cancelled error. However, the HTTP status and Location header aren't populated? On FF or Chrome this all works just fine Status/Location is populated and browser redirects appropriately.
My best guess would be some type of timing or race condition in the phantomjs binary (hope I'm wrong)
Has anyone seen anything like this and/or have any ideas about workarounds? Nothing I can try or think of works: I can't get hold of the status or location anywhere in the PhantomJS JavaScript context.
*** onNavigationRequested ****
Trying to navigate to: http://XXXXXXXX:8080/auth/UI/Login
Caused by: FormSubmitted
Will actually navigate: true
Sent from the page's main frame: true
*** onResourceRequested ****
Request (#40): {"headers":[{"name":"Origin","value":"http://XXXXXXXX:8080"},{"name":"User-Agent","value":"Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.7 Safari/534.34"},{"name":"Content-Type","value":"application/x-www-form-urlencoded"},{"name":"Accept","value":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},{"name":"Referer","value":"http://XXXXXXXXXX:8080/auth/UI/Login"},{"name":"Content-Length","value":"741"}],"id":40,"method":"POST","time":"2014-07-29T17:23:34.480Z","url":"http://XXXXXXXXXX:8080/auth/UI/Login"}
Request (#40): {"objectName":""}
*** onResourceError ****
Unable to load resource (#40 URL:http://XXXXXXXXXXX:8080/auth/UI/Login)
Error code: 5. Description: Operation canceled
*** onResourceReceived ****
Response (#40, stage "end"): {"contentType":null,"headers":[],"id":40,"redirectURL":null,"stage":"end","status":null,"statusText":null,"time":"2014-07-29T17:23:34.625Z","url":"http://XXXXXXXXXXXX:8080/auth/UI/Login"}
any news regarding this? I also face this problem. it is a https link redirecting with 302 and when I get the getPageSource() it is:
<html>
<head></head>
<body></body>
</html>
Strange...
In Chrome or Firefox the redirect works without problems. I am using PhantomJS 1.9.7 together with Selenium WebDriver (https://github.com/detro/ghostdriver).
Same problem with 1.9.8 when used with Selenium.
I was able to sort out the redirect issue, but now I have a problem with Page.evaluate not working.
I have the following JavaScript code saved in a file named ph_test.js.
var page;
var args = require('system').args;
var url_str = 'http://'+args[1];
var renderPage = function(){
    page = require('webpage').create();
    var myArgs = Array.prototype.slice.call(arguments),
        url_str = myArgs[0];
    // Set the viewport size
    page.viewportSize = {
        width: 320,
        height: 480
    };
    // Sets the User Agent
    page.settings.userAgent = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7';
    /**
     * From PhantomJS documentation:
     * This callback is invoked when there is a JavaScript console. The callback may accept up to three arguments:
     * the string for the message, the line number, and the source identifier.
     */
    page.onConsoleMessage = function (msg, line, source) {
        console.log('console> ' + msg);
    };
    /**
     * From PhantomJS documentation:
     * This callback is invoked when there is a JavaScript alert. The only argument passed to the callback is the string for the message.
     */
    page.onAlert = function (msg) {
        console.log('alert!!> ' + msg);
    };
    /**
     * Handle Redirection
     */
    page.onNavigationRequested = function(url_sub_str, type, willNavigate, main) {
        if (main && url_sub_str != url_str)
        {
            url_str = url_sub_str;
            console.log("redirect caught");
            page.close();
            renderPage(url_str);
        }
    };
    /**
     * Open the web page and run RRunner
     */
    page.open(url_str, function(status) {
        if (status === 'success') {
            page.injectJs('https://code.jquery.com/jquery-1.11.2.min.js');
            // Our "event loop"
            if(!phantom.state)
            {
                phFunction(url_str);
            }
            else {
                phantom.state();
            }
        }
        else
        {
            console.log('failed');
        }
        page.close();
        setTimeout(function(){
            phantom.exit();
        }, 1000);
        function phFunction()
        {
            var myArgs = Array.prototype.slice.call(arguments),
                url_str = myArgs[0]
            ;
            page.evaluate(function (url_str) {
                console.log('evaluate');
            }, url_str);
            page.render('screenshots/screenshot_full.png');
        }
    });
};
renderPage(url_str);
When I run the following command [pointing to my website]:
phantomjs ph_test.js www.restive.io
Everything works ok with no issues. However, when I run it for another website with mobile redirection [in this case taobao]
phantomjs ph_test.js www.taobao.com
Page.evaluate doesn't run, as I do not see the message 'evaluate' in my console.
I'd appreciate some help in resolving this.
I'm posting a solution here that has worked for me. What I've done is basically intercept the request in the onResourceRequested function and extract the correct URL. I was trying to render a site that has a Location header redirection to another site, which happens after a previous redirection from HTTP to HTTPS. In this case, in the onResourceRequested event, the last URL (after all the redirections were caught) looks like http://site.com/news.php,%20news.php. The problem is that when this URL is fetched, the second portion (,%20news.php) causes a 404, even though the redirection is being followed correctly. So my solution was to use onResourceRequested to remove the additional URL portion causing the problem. The code looks like:
page.onResourceRequested = function(requestData, networkRequest) {
    var reqUrl = requestData.url;
    var newUrl = requestData.url.split(',%20')[0];
    if (newUrl != reqUrl) {
        networkRequest.changeUrl(newUrl);
    }
};
page.open(url, function(status) {
    if (status == 'success') {
        page.render(path);
        response.write('Success: Screenshot saved to ' + path + "\n");
    } else {
        response.write('Error: Url returned status ' + status + "\n");
    }
    page.release();
});
This feels a little "hacky", but it has solved my problem. I haven't tested it with other redirection techniques, but I hope these 2 cents help someone. For more information, a request to the site without HTTPS looked like this (info taken using curl --head):
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Tue, 17 Feb 2015 21:37:15 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://site.com/
Strict-Transport-Security: max-age=15768000
The same to the HTTPS site looked like:
HTTP/1.1 302 Moved Temporarily
Server: nginx
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.3.3
Set-Cookie: phpv2Q389C_visited=yes; expires=Tue, 17-Feb-2015 21:58:29 GMT; path=/
Set-Cookie: phpv2Q389C_lastvisit=1424205509; expires=Tue, 17-Feb-2015 22:38:29 GMT; path=/; domain=site.com; httponly
Location: news.php
Location: news.php
From what I've seen, the Location: news.php header was causing the 404 that I was experiencing. The above code seems to work, although more testing is required :).
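The HTTPS response above lists Location: news.php twice, which would be consistent with the two values being joined into "news.php, news.php" before the request is made. Splitting on ',%20' unconditionally could mangle a legitimate URL that happens to contain that sequence, so a slightly more cautious variant is sketched here. stripRepeatedLocation is a hypothetical helper name, and the rule it applies (strip only when the suffix repeats the end of the URL) is an assumption about this particular failure, not documented PhantomJS behavior.

```javascript
// Hypothetical helper: if a URL looks like "<head>,%20<tail>" and <head>
// already ends with <tail> (a doubled Location value), return <head>.
// Any other URL is returned unchanged.
function stripRepeatedLocation(url) {
    var marker = ',%20';
    var index = url.indexOf(marker);
    if (index === -1) {
        return url;
    }
    var head = url.slice(0, index);
    var tail = url.slice(index + marker.length);
    var repeated = tail.length > 0 && head.slice(-tail.length) === tail;
    return repeated ? head : url;
}

// Assumed usage in the handler above (shown as comments):
//   page.onResourceRequested = function (requestData, networkRequest) {
//       var fixed = stripRepeatedLocation(requestData.url);
//       if (fixed !== requestData.url) { networkRequest.changeUrl(fixed); }
//   };
```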
:+1:
:+1:
Ok so a small change in the script makes it load the websites fine.
var page;
var myurl = "your.targeturl.com";
var renderPage = function (url) {
    page = require('webpage').create();
    page.onNavigationRequested = function(url, type, willNavigate, main) {
        if (main && url != myurl) {
            myurl = url;
            console.log("redirect caught");
            page.close();
            setTimeout(function () { renderPage(myurl); }, 1); // Note the setTimeout here
        }
    };
    page.open(url, function(status) {
        if (status === "success") {
            console.log("success");
            page.render('yourscreenshot.png');
            phantom.exit(0);
        } else {
            console.log("failed");
            phantom.exit(1);
        }
    });
};
renderPage(myurl);
Hello. Ubuntu 14.04 x86 phantomjs 2.0.0
I also have a problem with redirects.
Task: I need to handle URLs like:
http://click.ticketswap.nl/track/click/30039336/www.ticketswap.nl?p=eyJzIjoiY0x6N3NXYThpZ0VGTGVsNVJzRC16R2hGVGFBIiwidiI6MSwicCI6IntcInVcIjozMDAzOTMzNixcInZcIjoxLFwidXJsXCI6XCJodHRwczpcXFwvXFxcL3d3dy50aWNrZXRzd2FwLm5sXFxcL2Rvd25sb2FkXFxcLzM2MTUyOFxcXC9jMTA5YmJjOWI4OGYzYTEyNTBjZDk3MTQyMmE2YWVkYVxcXC83NjQyNzFcIixcImlkXCI6XCIxNmE4NWI4Yzc5NmE0Y2UwOTk0Njc0M2RmM2MzODZkZlwiLFwidXJsX2lkc1wiOltcImQ4M2U3YmJmOTU3MTFkNDcyM2U4NjJlNTA1MWNjMWVhNTU5MDZlZjlcIl19In0
And:
1) Log in to Facebook.
2) Handle redirects (3 redirects).
3) Get the URL of the last page with a 200 status code to download the file. Something like:
https://ticketswap.s3.amazonaws.com/pdf-services/201508/74df1712-6cd3-4ce1-87bd-28a928762087/93ebfa4c-46ba-4b8d-9077-b5ecc34f4af0.page.pdf?response-content-disposition=attachment%3B%20filename%3Dticketswap-breakfast-club-mini-fest-ticket-764271.pdf&response-content-type=application%2Fpdf&AWSAccessKeyId=AKIAJA2AW7EYEF5JWHGQ&Expires=1447875603&Signature=hqbIX3GnTgLQnqqVyue4xGcBlF0%3D#_=_
I can't use PhantomJS for this task, because current_url returns something like:
https://www.facebook.com/login.php?skip_api_login=1&api_key=384197868327751&signed_next=1&next=https%3A%2F%2Fwww.facebook.com%2Fv2.0%2Fdialog%2Foauth%3Fredirect_uri%3Dhttps%253A%252F%252Fwww.ticketswap.nl%252Flogin%252Fcheck-facebook%26display%3Dpopup%26scope%3Demail%26response_type%3Dcode%26client_id%3D384197868327751%26ret%3Dlogin&cancel_url=https%3A%2F%2Fwww.ticketswap.nl%2Flogin%2Fcheck-facebook%3Ferror%3Daccess_denied%26error_code%3D200%26error_description%3DPermissions%2Berror%26error_reason%3Duser_denied%23_%3D_&display=popup
If I use Firefox selenium webdriver, I can control the flow and save files (https://github.com/stdex/web_crawlers/blob/master/ticketswap/ticketswap.py)
How do I handle redirects correctly?
Just leaving my +1 to keep updated about this issue.
I'm using version 1.9.8. When running on an Ubuntu64 box, the redirects are handled as expected. But I'm having problems when running on a RedHat64 box.
We know this bug has been hanging around for a very long time, and we apologize. Here are some concrete things that you can do:
- A bug causes page.url not to accurately reflect redirects. If this is your problem, please try my encode-all-the-urls branch; it replaces that bug with a different bug, in which page.url is confused by <base href=>. (We're still looking for a way to fix both bugs properly. Concrete, self-contained, minimal test cases in which either stock 2.0 gets page.url wrong, or my branch gets this wrong, would be very helpful.)
- The onLoadFinished callback fires at approximately the same time the onload event fires in the page. (They are not guaranteed to occur in any particular order relative to each other.) This is by design. If your problem is that JavaScript adjustments to window.location, <meta refresh> tags, or other such things do not get a chance to happen before the onLoadFinished callback fires, you need to wait a bit after that callback (using setTimeout or equivalent) before declaring the page "done." Unfortunately, we have no good way of determining that a page is done running all of its JavaScript and stuff (indeed, it might never be done).
Because this bug is very old and lumps together a number of related issues with different causes, it is not useful as a bug report, and therefore I am going to close it. If you still have a problem, and my advice above does not fix it, please submit a new bug report, providing a concrete, minimal, and self-contained test case. If you don't know how to write a concrete, minimal, self-contained test case, or you need help following my advice, please ask for help on the phantomjs-users mailing list.
ctvo...@gmail.com commented:
Disclaimer: This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #389. :star2: 13 people had starred this issue at the time of migration.