What benefit does it have?
It may not be a critical enhancement but it would likely improve the flexibility of the tool.
I have been fiddling around with PhantomJS and CDNs and found that some services, such as Incapsula, may look at the order of HTTP request header fields, in addition to their values, to determine the type of browser.
Here are the images of two GET requests, the first made by Firefox 26 and the second by phantomjs using customHeaders to mimic Firefox.
Firefox:
Phantomjs with customHeaders:
Below is the code I used to set the headers. Some of the field values may not be compatible; however, my goal was to get two identical HTTP responses from the server.
page.customHeaders = {
"User-Agent" : "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate",
"Connection" : "keep-alive"
};
Given the same header values, I would expect identical responses, but for some reason this doesn't happen. The order of the fields may make a difference.
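One way to check whether field order really is the difference is to record the order in which a test server receives the fields from each client. The sketch below assumes a Node.js test server, which this post does not use; Node exposes the received fields, in arrival order, as the flat `rawHeaders` array on an incoming request. The helper names `headerOrder` and `firstOrderDifference` are made up for this illustration, and the two sample inputs are invented, not captured traffic.

```javascript
// Node.js represents received headers as a flat array:
// [name1, value1, name2, value2, ...], in the order they arrived.
function headerOrder(rawHeaders) {
  const names = [];
  for (let i = 0; i < rawHeaders.length; i += 2) {
    names.push(rawHeaders[i].toLowerCase());
  }
  return names;
}

// Returns the first index at which two name lists differ, or -1 if identical.
function firstOrderDifference(a, b) {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return -1;
}

// Illustrative inputs, shaped like req.rawHeaders in an http.createServer handler.
const fromBrowser = ["Host", "example.test", "User-Agent", "UA", "Accept", "*/*"];
const fromScript = ["User-Agent", "UA", "Accept", "*/*", "Host", "example.test"];

console.log(headerOrder(fromBrowser));
console.log(firstOrderDifference(headerOrder(fromBrowser), headerOrder(fromScript)));
```

If the two clients send the same names and values but the reported index is not -1, the field order is the remaining difference between the requests.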
Thoughts?
I also find this feature useful. Any plans to add it soon?
+1
+1 Without the ability to modify the order of HTTP headers, it is impossible to fetch sites protected by Incapsula.
+1
+1
I have successfully made PhantomJS fetch a page from an Incapsula-protected site (www.enjin.com), with modification of two files:
gist (including example script): https://gist.github.com/GunsAkimbo/aa6ac81bd55dd1802637
It's not pretty, just a proof of concept of the changes needed in order not to be stopped by Incapsula.
So in principle we are interested in making changes like these. However:
[...] __phantom as well as, or instead of, phantom. The right change there would be to enable the controller script to remove the phantom intrusion entirely, if it isn't needed, and/or rename it as it sees fit.
I'm using PhantomJS for a website screenshotting service, and I'm unable to do much on Incapsula-protected sites. This is a clear issue for multiple people; I don't fully understand why it's not being addressed.
@yegors Well, PJS uses an external library (Qt) for network communication, and I think the problem lies partly there, which means we cannot fix that in this repo. The other part is to hide the global object with the fixed name "_phantom", or to make it possible to rename it at runtime before loading a page.
You could try to compile a build yourself, with the changes mentioned in the gist I linked to a few posts back. Those changes made it possible to pass through the protection, but I'm not sure if both changes were necessary.
As an alternative, you could look into https://phantomjscloud.com/site/index.html. I have had good results using this service for Incapsula-protected sites.
@GunsAkimbo It's pretty unfortunate that Qt refuses to fix it on their end. A custom compile seems like the best (only) option at this point, as a third-party service is out of the question for our applications. Will try your patch and see what happens.
Whew, thanks guys for documenting this! I would have wasted hours trying to support Incapsula. Will move my scripts to Firefox/Selenium now.
@GunsAkimbo I'm trying to implement your fix. Where is qhttpnetworkrequest.cpp located? I can't find it in this repo.
@opahopa That file belongs to the Qt repo. It used to be referenced in the .gitmodules file; I guess that has been changed, according to the history.
@GunsAkimbo any idea how to change the headers order now?
Since this problem is marked as out-of-scope by Qt, I believe we can close it too, because future versions of PhantomJS will use the system-installed (or original) version of Qt.
Also, the RFC states that the order of HTTP headers doesn't matter.
Thanks!
The order does actually matter. Some anti-bot systems use it to identify PhantomJS and block requests with a particular order.
On Sep 9, 2016 8:02 AM, "Vitaly Slobodin" notifications@github.com wrote:
Since this problem marked as out-of-scope by Qt I believe we can close it too. Because future versions of PhantomJS will use the system-installed (or original version) of Qt.
Also, RFC describes that the order of HTTP headers doesn't matter.
Thanks!
Yes, I know that. But the problem is that implementing this feature requires a custom version of Qt. We want to move away from the custom (patched) version to the original version.
@maximilianh You can put PhantomJS behind a proxy which reorders headers as you want. Such a proxy could be implemented in any language without much trouble, or maybe there is an existing solution.
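The core of such a proxy could look roughly like the sketch below. It is only an illustration under several assumptions: plain HTTP/1.1, a request head that has already been read in full (up to the blank line), and no obsolete folded header lines. The function name `reorderRequestHead` and the host `example.test` are hypothetical; a real proxy would also have to forward the request body and the response unchanged.

```javascript
// Reorders the header lines of a raw HTTP/1.1 request head (plain HTTP only).
// `order` lists field names in the desired sequence; fields that are not
// listed keep their relative order and are placed after the listed ones.
function reorderRequestHead(head, order) {
  const lines = head.split("\r\n").filter((line) => line.length > 0);
  const requestLine = lines[0];
  const rank = new Map(order.map((name, i) => [name.toLowerCase(), i]));

  const fields = lines.slice(1).map((line, index) => {
    const name = line.slice(0, line.indexOf(":")).trim().toLowerCase();
    // Unlisted fields share the lowest priority (order.length).
    return { line, index, rank: rank.has(name) ? rank.get(name) : order.length };
  });

  // Sort by requested rank, falling back to the original position for ties.
  fields.sort((a, b) => (a.rank - b.rank) || (a.index - b.index));

  return [requestLine, ...fields.map((f) => f.line)].join("\r\n") + "\r\n\r\n";
}

const head =
  "GET / HTTP/1.1\r\n" +
  "User-Agent: UA\r\n" +
  "Accept: */*\r\n" +
  "Host: example.test\r\n" +
  "Connection: keep-alive\r\n\r\n";

console.log(reorderRequestHead(head, ["Host", "User-Agent", "Accept"]));
```

Because the function works on the raw text rather than on a parsed header object, it does not depend on how a particular HTTP library orders fields when it serializes them.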
@annulen Do you know of any such proxy service providers or proxy libraries in Python/Ruby/NodeJS which can reorder the headers? I have tried many libraries which can modify the headers but they cannot reorder them. Any help is appreciated. Thanks.
Is header reordering required for HTTPS, or is plain HTTP enough?
@annulen: Both would be better, if possible. Otherwise plain HTTP is also OK.
For HTTPS it would require "bumping" SSL connections, which would significantly complicate the proxy code, even without considering things like client certificates or client-side validation of the server certificate. If HTTPS is needed, it is indeed much easier to fix the order on the client side, i.e. to patch Qt.
If you are only concerned with the position of the Host header, it would be better to write a patch for https://bugreports.qt.io/browse/QTBUG-51557; it will be accepted.
Do we have any solution other than a proxy?
Fix the code, seriously.
There are 2 independent issues here:
[...] Proxy-Connection/Connection, Accept-Encoding, Accept-Language, User-Agent, Host
[...] Accept-Encoding means that PhantomJS or QtWebKit needs to decompress content encoding with zlib instead of relying on QNAM automagic.
[...] Host manually, so solving QTBUG-51557 won't hurt.
[...] customHeaders property, so their order is not preserved and is replaced with lexicographic order. This can be worked around by using a container that preserves order.
Update: the patch for QTBUG-51557 will be included in Qt 5.10.1, see https://codereview.qt-project.org/#/c/216980/.
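The effect of a lexicographically ordered container on customHeaders can be illustrated outside of PhantomJS. The sketch below is a plain JavaScript analogy, not PhantomJS or Qt code: sorting the field names stands in for a container that iterates in lexicographic order, and a `Map` built from name/value pairs stands in for a container that preserves insertion order. The names and values are the ones from the first post in this thread.

```javascript
// Headers in the order the script author intended (taken from the first post).
const pairs = [
  ["User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"],
  ["Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"],
  ["Accept-Language", "en-US,en;q=0.5"],
  ["Accept-Encoding", "gzip, deflate"],
  ["Connection", "keep-alive"],
];

const insertionOrder = pairs.map(([name]) => name);

// Stand-in for a key-sorted container: iteration order ignores insertion order.
const lexicographicOrder = [...insertionOrder].sort();

// A JavaScript Map iterates in insertion order, so the intended order survives.
const mapOrder = [...new Map(pairs).keys()];

console.log(lexicographicOrder);
console.log(mapOrder);
```

With the sorted stand-in, User-Agent moves from first to last and Accept-Encoding moves ahead of Accept-Language, which is the kind of reordering described above.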
Hello, I implemented @GunsAkimbo's concept to bypass Incapsula. You can download PhantomJS at https://drive.google.com/drive/folders/1Y0XqQ89hQUhDj9_EPW-kja8V1vX4Catf?usp=sharing
There are 2 files: one for Windows and one for Linux.
According to RFC2616 (http://www.w3.org/Protocols/rfc2616/rfc2616.html), the order in which header fields with differing field names are received is not significant; however, some implementations may pay particular attention to the order of these fields.
Are there any plans to support custom orderings?
E.g. have "Connection: Keep-Alive" come before "Accept-Encoding: gzip".
A webpage property similar to this would probably work:
page.headerFieldsOrder = ["Accept", "Accept-Language", "Host", "Connection"...];
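To make the proposal concrete, here is a rough sketch of how such an ordering list could be combined with customHeaders. It is purely hypothetical: `headerFieldsOrder` does not exist in PhantomJS, and `orderHeaders` is a name invented for this example. It only shows the intended semantics, namely that listed fields come first in the listed order and any remaining fields follow in their original order.

```javascript
// Hypothetical helper: neither headerFieldsOrder nor orderHeaders is part of PhantomJS.
// Returns [name, value] pairs in the order in which they should be sent.
function orderHeaders(customHeaders, headerFieldsOrder) {
  // Map lower-cased names to the names as written, in insertion order.
  const remaining = new Map(
    Object.keys(customHeaders).map((name) => [name.toLowerCase(), name])
  );
  const ordered = [];

  for (const wanted of headerFieldsOrder) {
    const key = wanted.toLowerCase();
    if (remaining.has(key)) {
      const name = remaining.get(key);
      ordered.push([name, customHeaders[name]]);
      remaining.delete(key);
    }
  }

  // Fields without an explicit position follow, in their original order.
  for (const name of remaining.values()) {
    ordered.push([name, customHeaders[name]]);
  }
  return ordered;
}

const ordered = orderHeaders(
  { "Accept-Encoding": "gzip", "Connection": "Keep-Alive", "Accept": "*/*" },
  ["Accept", "Connection", "Accept-Encoding", "Host"]
);
console.log(ordered.map(([name, value]) => name + ": " + value).join("\n"));
```

Names in the ordering list that are not set in customHeaders (Host in this example) are simply skipped, so one list could be shared between pages that set different fields.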
Thanks