gingeleski / headbro

Headless browser rendering service for HTTP responses.
GNU General Public License v3.0

Interceptors aren't working for loads from request strings #27

Open gingeleski opened 5 years ago

gingeleski commented 5 years ago

(See what was written in #25 comments then "test cases" provided in #24 comments)

gingeleski commented 5 years ago

Alright, changed the code around to read the actual response object coming back. Recall that we're driving BrowserMob through a Python library, which just uses the requests library to talk to it.

Got this back, so it's a JavaScript error:

<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /proxy/8091/filter/request. Reason:
<pre>    Exception [JavascriptCompilationException - "Unable to compile javascript. Script in error:
if (messageInfo.getUrl().includes("Qwr14Twn")) { request.setMethod("GET"); request.setUri("http://twitter.com/i/js_inst?c_name=ui_metrics"); request
getMethod().removeHeaders("Host"); request.getMethod().addHeader("Host", "twitter.com");request.getMethod().removeHeaders("Connection"); request.get
ethod().addHeader("Connection", "close");request.getMethod().removeHeaders("User-Agent"); request.getMethod().addHeader("User-Agent", "Mozilla/5.0 (
indows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36");request.getMethod().removeHeaders("Accept");
request.getMethod().addHeader("Accept", "*/*");request.getMethod().removeHeaders("Referer"); request.getMethod().addHeader("Referer", "https://twitt
r.com/");request.getMethod().removeHeaders("Accept-Encoding"); request.getMethod().addHeader("Accept-Encoding", "gzip, deflate");request.getMethod()
removeHeaders("Accept-Language"); request.getMethod().addHeader("Accept-Language", "en-US,en;q=0.9");request.getMethod().removeHeaders("Cookie"); re
uest.getMethod().addHeader("Cookie", "_twitter_sess=BAh7CSIKZmxhc2hJQzonQWN0aW9uQ29udHJvbGxlcjo6Rmxhc2g6OkZsYXNo%250ASGFzaHsABjoKQHVzZWR7ADoPY3JlYXR
ZF9hdGwrCCwEHXBnAToMY3NyZl9p%250AZCIlZTVlYjQ4YjBiMzNlZjkxYTNjNDk1MjJiOWMyNzc1NDg6B2lkIiUzZjBh%250AZWMzODQyNzA2YzNiNTk4ZTBmNzEyOWU5YjI3Yw%253D%253D--
30d99ec83024c78af96cab8e27c93f2fa9ad533; personalization_id="v1_uMFXwPN0Vl1s72/E3SuAMQ=="; guest_id=v1%3A154377420906496841; ct0=56b8e8eee00322f4b80
c60d8bf2bad3");request.getMethod().removeHeaders("DNT"); request.getMethod().addHeader("DNT", "1"); };"] thrown by event method [public com.google.s
tebricks.headless.Reply net.lightbody.bmp.proxy.bricks.ProxyResource.addRequestFilter(int,com.google.sitebricks.headless.Request) throws java.io.IOE
ception,javax.script.ScriptException]

at net.lightbody.bmp.filters.JavascriptRequestResponseFilter.setRequestFilterScript(JavascriptRequestResponseFilter.java:34)
(See below for entire trace.)
</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>

</body>
</html>

Going back to basics with the JavaScript until it's clearer what the issue is.
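
A minimal sketch of what reading the response could look like, assuming the caller already has the status code and body text from the POST to `/proxy/<port>/filter/request`. `check_filter_response` and `InterceptorRejected` are hypothetical names, not part of BrowserMob or its Python client, and the `<pre>` parsing is based only on the Jetty error page shown above.

```python
import re


class InterceptorRejected(Exception):
    """Raised when the proxy does not accept the filter script (hypothetical)."""


def check_filter_response(status_code, body_text):
    """Raise InterceptorRejected unless the filter POST came back with HTTP 200."""
    if status_code == 200:
        return
    # The Jetty error page seen in this thread puts the cause inside <pre>;
    # keep only the first line of it so the message stays readable.
    match = re.search(r"<pre>\s*(.*?)\s*(?:</pre>|$)", body_text, re.DOTALL)
    lines = match.group(1).splitlines() if match else []
    reason = lines[0] if lines else body_text[:200]
    raise InterceptorRejected("HTTP %d: %s" % (status_code, reason))
```

With a `requests` response object `resp`, this would be called as `check_filter_response(resp.status_code, resp.text)`.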

gingeleski commented 5 years ago

Still erroring out with this...

    interceptor_js = ''
    ##interceptor_js += 'if (messageInfo.getUrl().includes("' + this_canary_string + '")) { '
    ##interceptor_js += 'request.setMethod("' + method + '");'
    ##interceptor_js += ' '
    interceptor_js += 'request.setUri("' + url + '");'
    interceptor_js += ' '
    # cycle through headers and set
    for h_name, h_value in headers.items():
        interceptor_js += 'request.getMethod().removeHeaders("' + h_name + '");'
        interceptor_js += ' '
        interceptor_js += 'request.getMethod().addHeader("' + h_name + '", "' + h_value + '");'
        interceptor_js += ' '
    if body is not None:
        interceptor_js += ' '
        # TODO consider making sure the body is safely encoded, or at least escape " chars
        interceptor_js += 'contents.setTextContents("' + body + '");'
    ##interceptor_js += ' };'
    do_browsermob_interceptor(interceptor_js)
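
One thing the 500 response earlier in this thread hints at: the Cookie value contains literal double quotes (`personalization_id="v1_..."`), which would end the JavaScript string early when pasted in unescaped. Below is a minimal sketch of escaping every value with `json.dumps` before concatenating. `js_str` and `build_interceptor_js` are hypothetical helper names, and the `request.getMethod()` calls are kept exactly as in the snippet above, with no claim that they are the right API.

```python
import json


def js_str(value):
    """Return value as a double-quoted JavaScript string literal."""
    # json.dumps escapes quotes, backslashes and control characters.
    return json.dumps(str(value))


def build_interceptor_js(url, headers, body=None):
    """Build the request interceptor script with every literal escaped."""
    parts = ['request.setUri(%s);' % js_str(url)]
    for h_name, h_value in headers.items():
        # Same calls as the original snippet; only the quoting changes.
        parts.append('request.getMethod().removeHeaders(%s);' % js_str(h_name))
        parts.append('request.getMethod().addHeader(%s, %s);'
                     % (js_str(h_name), js_str(h_value)))
    if body is not None:
        parts.append('contents.setTextContents(%s);' % js_str(body))
    return ' '.join(parts)
```

The result would be passed to `do_browsermob_interceptor(...)` in place of the hand-concatenated string.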

gingeleski commented 5 years ago

It's OK with just this... time to pinpoint which of the differences is the culprit.

    interceptor_js = ''
    interceptor_js += 'request.setUri("' + url + '");'
    do_browsermob_interceptor(interceptor_js)

gingeleski commented 5 years ago

Added the conditional to check if the canary string is in the URL - that worked. But I noticed the request to the canary URL still wasn't being intercepted.

It dawned on me that sometimes the canary string comes out with uppercase letters, but it gets all lowercased on the real request. So I added a simple .lower() to the canary string generation.
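
A minimal sketch of generating the canary in lowercase from the start, assuming it ends up in the hostname of the request (browsers normalise hostnames to lowercase, which would explain the mismatch). `make_canary_host` and its parameters are hypothetical and not taken from the headbro code.

```python
import random
import string


def make_canary_host(length=9, tld="com"):
    """Return a random, all-lowercase hostname to use as the canary."""
    # Drawing only from ascii_lowercase means no .lower() is needed later.
    label = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return "%s.%s" % (label, tld)
```

The interceptor condition can then compare against the same string the browser will actually send.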

For good measure, also added a couple of seconds of delay after the POST that sets the interceptor itself. It seems plausible that the request we want swapped is firing before the interceptor has been processed.

Still get this :sad:

{"status_code": 0, "headers": [], "alerts": [], "confirms": [], "prompts": [], "errors": [{"level": "SEVERE", "message": "http://afapfimjc.com/ - Failed to load resource: the server responded with a status of 502 (Bad Gateway)", "source": "network", "timestamp": 1544316814571}], "messages": [], "body": "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head></head><body>Bad Gateway: http://afapfimjc.com/</body></html>"}

gingeleski commented 5 years ago

Looked at source for request and response interceptors - https://github.com/lightbody/browsermob-proxy/blob/679d90d3059087d985d62e48124183dd50f2724e/browsermob-rest/src/main/java/net/lightbody/bmp/filters/JavascriptRequestResponseFilter.java

Request interceptors have these bindings available:

    request -> io.netty.handler.codec.http.HttpRequest
    contents -> net.lightbody.bmp.util.HttpMessageContents
    messageInfo -> net.lightbody.bmp.util.HttpMessageInfo
    log -> org.slf4j.Logger

Response interceptors have these bindings available:

    response -> io.netty.handler.codec.http.HttpResponse
    contents -> net.lightbody.bmp.util.HttpMessageContents
    messageInfo -> net.lightbody.bmp.util.HttpMessageInfo
    log -> org.slf4j.Logger

Should go over the interceptors I've tried with a fine-toothed comb to see if there are any discrepancies here.
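
One discrepancy that stands out from the bindings: on Netty's `HttpRequest`, `getMethod()` returns the HTTP verb, and headers are reached through `request.headers()`. Below is a minimal sketch of an interceptor written against that API. It assumes the script runs under Nashorn (for `Java.type`) and uses `indexOf` rather than `includes`, which may not exist in that engine. `build_netty_request_filter` is a hypothetical helper, and none of this has been run against BrowserMob.

```python
import json


def build_netty_request_filter(canary, method, url, headers):
    """Build a request filter script using Netty's HttpRequest/HttpHeaders calls."""
    q = json.dumps  # quote each value as a JavaScript string literal
    lines = [
        'if (messageInfo.getUrl().indexOf(%s) >= 0) {' % q(canary),
        # setMethod expects an HttpMethod object, not a plain string.
        '  request.setMethod(Java.type('
        '"io.netty.handler.codec.http.HttpMethod").valueOf(%s));' % q(method),
        '  request.setUri(%s);' % q(url),
    ]
    for name, value in headers.items():
        # headers().set replaces any existing values for this header name.
        lines.append('  request.headers().set(%s, %s);' % (q(name), q(value)))
    lines.append('}')
    return '\n'.join(lines)
```

The returned string would be POSTed to the proxy the same way as the earlier attempts.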