fennb / phirehose

PHP interface to Twitter Streaming API

PROBLEM! HTTPS Phirehose via Proxy #11

Closed: marklochrie50265 closed this issue 8 years ago

marklochrie50265 commented 13 years ago

Since the recent updates to the Streaming API (SSL), we have been struggling to get the latest version of Phirehose to work behind our proxy.

After making minor changes to the Phirehose script so that it works with our proxy, this is the error log we are getting:

Error:
[Tue Oct 18 13:59:37 2011] [error] [client 194.80.32.10] Phirehose: Connecting to twitter stream: https://stream.twitter.com/1/statuses/filter.json with params: array ( 'delimited' => 'length', 'track' => '#x$
[Tue Oct 18 13:59:37 2011] [error] [client 194.80.32.10] Phirehose: Resolved host stream.twitter.com to 199.59.148.138
[Tue Oct 18 13:59:37 2011] [error] [client 194.80.32.10] Phirehose: Connecting to 199.59.148.138
[Tue Oct 18 13:59:37 2011] [error] [client 194.80.32.10] Phirehose: Full URL: ssl://199.59.148.138:443
[Tue Oct 18 13:59:37 2011] [error] [client 194.80.32.10] Phirehose: TCP failure 20 of 20 connecting to stream: No route to host (113). Sleeping for 16 seconds.

Changes:

$opts = array('http' => array('proxy' => 'tcp://wwwcache.lancs.ac.uk:8080', 'request_fulluri' => true));
$context = stream_context_create($opts);

//@$this->conn = fsockopen($scheme . $streamIP, $port, $errNo, $errStr, $this->connectTimeout,$context);
@$this->conn = stream_socket_client($scheme . $streamIP . ":" .$port, $errNo, $errStr, $this->connectTimeout, STREAM_CLIENT_CONNECT, $context);

We now think it could be to do with opening an insecure connection to the proxy, which then attempts to forward the plain-text request to Twitter. Perhaps it could be re-coded to use cURL instead of fsockopen()? But we aren't sure about this.

Thanks
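One likely reason the change above had no effect: `proxy` and `request_fulluri` are options of PHP's http(s) stream *wrapper* (used by `fopen()` and `file_get_contents()`), so a raw socket opened with `stream_socket_client()` ignores them and still connects straight to Twitter's IP, which matches the "No route to host" log. A minimal sketch of letting the wrapper handle the proxy instead, assuming PHP 7.1+ with OpenSSL; the helper names are hypothetical and this is not Phirehose's own code. For `https://` URLs the wrapper tunnels through the proxy with CONNECT, so `request_fulluri` is left off here (worth testing against the real proxy).

```php
<?php
// Sketch: route the request through the proxy via PHP's https:// stream wrapper.
// The 'proxy' option below is an HTTP *wrapper* option; it has no effect on sockets
// opened directly with stream_socket_client() or fsockopen().

function proxiedStreamContextOptions(string $proxy, array $postParams, array $headers): array
{
    return [
        'http' => [
            'method'          => 'POST',
            'proxy'           => $proxy, // must be "tcp://host:port"
            // For https:// URLs the wrapper opens a CONNECT tunnel, so the origin
            // server sees the request; the path-only request line is kept.
            'request_fulluri' => false,
            'header'          => implode("\r\n", array_merge(
                ['Content-Type: application/x-www-form-urlencoded'],
                $headers
            )),
            'content'         => http_build_query($postParams),
        ],
    ];
}

// Opens the streaming connection; returns a readable stream resource or false.
function openProxiedStream(string $url, array $options)
{
    $context = stream_context_create($options);
    return fopen($url, 'r', false, $context);
}
```

Usage: `$fp = openProxiedStream('https://stream.twitter.com/1/statuses/filter.json', proxiedStreamContextOptions('tcp://wwwcache.lancs.ac.uk:8080', ['delimited' => 'length', 'track' => 'gcse'], ['Authorization: Basic <base64 credentials>']));`, then read it with `fgets($fp)` in a loop.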

fennb commented 13 years ago

Hey there,

Looking back, there was a change from using stream_context_create() to raw fsockopen(), though for the life of me I can't remember why. The commit with the change is 3ada7f63090999a43c1acdf401b3e42c1978aea1.

That said, PHP's cURL functionality, particularly curl_multi_select(), seems like a perfect fit and should work fine.

It would definitely make a lot of sense to migrate the main network I/O loop to it, which would then make it much easier to support things like proxies.
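The cURL route could look roughly like this. It is a minimal sketch, not Phirehose code: `curlStreamOptions()` and `runCurlStream()` are hypothetical names, it assumes the cURL extension and PHP 7.1+, and it uses a blocking `curl_exec()` with a write callback rather than the `curl_multi_select()` loop mentioned above. For an `https://` URL, cURL tunnels through an HTTP proxy with CONNECT; `CURLOPT_HTTPPROXYTUNNEL` makes that explicit.

```php
<?php
// Sketch: streaming POST through an HTTP proxy with PHP's cURL extension.

function curlStreamOptions(string $url, string $proxy, array $postParams, callable $onChunk): array
{
    return [
        CURLOPT_URL             => $url,
        CURLOPT_PROXY           => $proxy, // e.g. 'wwwcache.lancs.ac.uk:8080'
        CURLOPT_HTTPPROXYTUNNEL => true,   // CONNECT to the proxy before TLS
        CURLOPT_POST            => true,
        CURLOPT_POSTFIELDS      => http_build_query($postParams),
        CURLOPT_CONNECTTIMEOUT  => 10,
        // Called for each block of body data as it arrives; must return the
        // number of bytes handled, otherwise cURL aborts the transfer.
        CURLOPT_WRITEFUNCTION   => function ($ch, string $data) use ($onChunk): int {
            $onChunk($data);
            return strlen($data);
        },
    ];
}

function runCurlStream(string $url, string $proxy, array $postParams, callable $onChunk): bool
{
    $ch = curl_init();
    curl_setopt_array($ch, curlStreamOptions($url, $proxy, $postParams, $onChunk));
    $ok = curl_exec($ch); // blocks until the stream closes or fails
    if ($ok === false) {
        error_log('cURL error: ' . curl_error($ch));
    }
    return $ok !== false; // the handle is released when $ch goes out of scope
}
```

Usage: `runCurlStream('https://stream.twitter.com/1/statuses/filter.json', 'wwwcache.lancs.ac.uk:8080', ['delimited' => 'length', 'track' => 'gcse'], function ($chunk) { echo $chunk; });`, adding authentication headers via `CURLOPT_HTTPHEADER` as needed. Chunks arrive at arbitrary boundaries, so a real consumer would buffer them and split on the `delimited=length` prefixes.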

DarrenCook commented 13 years ago

cURL never seems to be installed by default on any of the hosting services or distros I use, so not depending on cURL is a feature of Phirehose, though admittedly a minor one.

fennb commented 13 years ago

Hmmm, interesting. We could always make it use cURL if available and fall back otherwise, but that starts making the code very messy.

DarrenCook commented 13 years ago

Maybe something useful here: http://www.phpclasses.org/discuss/package/3/thread/34/ (I think it is saying to call stream_socket_enable_crypto() after establishing the connection to the proxy).
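That ordering (plain TCP to the proxy, CONNECT, check the reply, and only then `stream_socket_enable_crypto()` on the same socket) can be sketched as follows. The helper names are hypothetical and it assumes PHP 7.1+ with OpenSSL. The `ssl` context option `peer_name` is set to the origin host because the socket itself was opened to the proxy, so certificate verification would otherwise be attempted against the proxy's name.

```php
<?php
// Sketch: open a CONNECT tunnel through an HTTP proxy, then start TLS inside it.

function buildConnectRequest(string $host, int $port): string
{
    return "CONNECT {$host}:{$port} HTTP/1.1\r\n"
         . "Host: {$host}:{$port}\r\n"
         . "\r\n";
}

// Returns the status code from a line like "HTTP/1.0 200 Connection established", or null.
function parseStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $statusLine, $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}

function openTunnel(string $proxyHost, int $proxyPort, string $host, int $port, float $timeout = 10.0)
{
    // peer_name makes certificate verification (and SNI) use the origin host, not the proxy.
    $ctx  = stream_context_create(['ssl' => ['peer_name' => $host]]);
    $conn = stream_socket_client("tcp://{$proxyHost}:{$proxyPort}", $errNo, $errStr, $timeout, STREAM_CLIENT_CONNECT, $ctx);
    if ($conn === false) {
        throw new RuntimeException("Proxy connect failed: {$errStr} ({$errNo})");
    }
    fwrite($conn, buildConnectRequest($host, $port));
    $status = parseStatusCode((string) fgets($conn, 8192));
    // Discard the rest of the proxy's response headers, up to the blank line.
    while (($line = fgets($conn, 8192)) !== false && trim($line) !== '') {
        // skip
    }
    if ($status !== 200) {
        fclose($conn);
        throw new RuntimeException('Proxy refused CONNECT, status: ' . var_export($status, true));
    }
    if (stream_socket_enable_crypto($conn, true, STREAM_CRYPTO_METHOD_TLS_CLIENT) !== true) {
        fclose($conn);
        throw new RuntimeException('TLS handshake through the tunnel failed');
    }
    return $conn; // now an encrypted stream to $host:$port
}
```

Usage: `$conn = openTunnel('wwwcache.lancs.ac.uk', 8080, 'stream.twitter.com', 443);`, then write the normal HTTP request to `$conn`.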

marklochrie50265 commented 13 years ago

Thanks for all the input. I tried stream_socket_enable_crypto() but couldn't really work out where to use it in the code.

If something like this could be ported over to use cURL, that would be amazing, fennb.

Damn my university for the darn proxy!

marklochrie50265 commented 13 years ago

Hey guys, thanks again for all the input. This is the log from nohup.out when I try to run get_tweets:

PHP Warning: fopen(process_id.txt): failed to open stream: Permission denied in /var/www/141dev/db/gcse/get_tweets.php on line 17
PHP Warning: fwrite() expects parameter 1 to be resource, boolean given in /var/www/141dev/db/gcse/get_tweets.php on line 18
PHP Warning: fclose() expects parameter 1 to be resource, boolean given in /var/www/141dev/db/gcse/get_tweets.php on line 19
Phirehose: Connecting to twitter stream: https://stream.twitter.com/1/statuses/filter.json with params: array ( 'delimited' => 'length', 'track' => 'gcse',)
Phirehose: Resolved host stream.twitter.com to 199.59.148.138
Phirehose: Connecting to 199.59.148.138
PHP Warning: stream_socket_enable_crypto(): SSL operation failed with code 1. OpenSSL Error messages: error:140770FC:SSL routines:func(119):reason(252) in /var/www/141dev/libraries/phirehose/phirehose.php on line 579
Phirehose: Full URL: ssl://199.59.148.138:443
Phirehose: Connection established to 199.59.148.138
Phirehose: Path: ssl:///1/statuses/filter.jsonHost: stream.twitter.com
Phirehose: POST /1/statuses/filter.json HTTP/1.1
Phirehose: Host: stream.twitter.com:443
Phirehose: Content-type: application/x-www-form-urlencoded
Phirehose: Content-length: 27
Phirehose: Accept: */*
Phirehose: Authorization: Basic: bWFya2xvY2hyaWU6dGhlbzE1MTE=
Phirehose: User-Agent: Phirehose/0.2.gitmaster +https://github.com/fennb/phirehose
Phirehose:
Phirehose: delimited=length&track=gcse
Phirehose:
Phirehose: HTTP failure 1 of 20 connecting to stream: HTTP ERROR 400: Bad Request (<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">ERROR: The requested URL could not be retrieved

ERROR
The requested URL could not be retrieved

While trying to process the request:
&#22;&#3;&#1;

The following error was encountered:
). Sleeping for 10 seconds.
Phirehose: Connecting to twitter stream: https://stream.twitter.com/1/statuses/filter.json with params: array ( 'delimited' => 'length', 'track' => 'gcse',)
Phirehose: Resolved host stream.twitter.com to 199.59.148.138
Phirehose: Connecting to 199.59.148.138
PHP Warning: stream_socket_enable_crypto(): SSL operation failed with code 1. OpenSSL Error messages: error:140770FC:SSL routines:func(119):reason(252) in /var/www/141dev/libraries/phirehose/phirehose.php on line 579
Phirehose: Full URL: ssl://199.59.148.138:443
Phirehose: Connection established to 199.59.148.138
Phirehose: Path: ssl:///1/statuses/filter.jsonHost: stream.twitter.com
Phirehose: POST /1/statuses/filter.json HTTP/1.1
Phirehose: Host: stream.twitter.com:443
Phirehose: Content-type: application/x-www-form-urlencoded
Phirehose: Content-length: 27
Phirehose: Accept: */*
Phirehose: Authorization: Basic: bWFya2xvY2hyaWU6dGhlbzE1MTE=
Phirehose: User-Agent: Phirehose/0.2.gitmaster +https://github.com/fennb/phirehose
Phirehose:
Phirehose: delimited=length&track=gcse
Phirehose:
Phirehose: HTTP failure 2 of 20 connecting to stream: HTTP ERROR 400: Bad Request (<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">ERROR: The requested URL could not be retrieved

ERROR
The requested URL could not be retrieved

While trying to process the request:
&#22;&#3;&#1;

The following error was encountered:
). Sleeping for 20 seconds.
Phirehose: Connecting to twitter stream: https://stream.twitter.com/1/statuses/filter.json with params: array ( 'delimited' => 'length', 'track' => 'gcse',)
Phirehose: Resolved host stream.twitter.com to 199.59.148.138
Phirehose: Connecting to 199.59.148.138
PHP Warning: stream_socket_enable_crypto(): SSL operation failed with code 1. OpenSSL Error messages: error:140770FC:SSL routines:func(119):reason(252) in /var/www/141dev/libraries/phirehose/phirehose.php on line 579
Phirehose: Full URL: ssl://199.59.148.138:443
Phirehose: Connection established to 199.59.148.138
Phirehose: Path: ssl:///1/statuses/filter.jsonHost: stream.twitter.com
Phirehose: POST /1/statuses/filter.json HTTP/1.1
Phirehose: Host: stream.twitter.com:443
Phirehose: Content-type: application/x-www-form-urlencoded
Phirehose: Content-length: 27
Phirehose: Accept: */*
Phirehose: Authorization: Basic: bWFya2xvY2hyaWU6dGhlbzE1MTE=
Phirehose: User-Agent: Phirehose/0.2.gitmaster +https://github.com/fennb/phirehose
Phirehose:
Phirehose: delimited=length&track=gcse
Phirehose:
Phirehose: HTTP failure 3 of 20 connecting to stream: HTTP ERROR 400: Bad Request (<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">ERROR: The requested URL could not be retrieved

ERROR
The requested URL could not be retrieved

While trying to retrieve the URL:0&să‚å

The following error was encountered:
). Sleeping for 40 seconds.
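Two details in this log point the same way. OpenSSL error 140770FC is "unknown protocol" during the client's wait for the server hello, and the `&#22;&#3;&#1;` on the proxy's error page (it looks like Squid's) are the bytes 0x16 0x03 0x01, which is how a TLS handshake record begins. Together they suggest the TLS ClientHello was sent straight to the proxy's plain-HTTP port with no CONNECT first, and the proxy answered with an HTML 400 page. A small sketch for checking such bytes; both helper names are hypothetical and it assumes PHP 7.1+.

```php
<?php
// Sketch: recognise the first bytes of a TLS handshake record.
// Byte 0 is the record type (0x16 = handshake); bytes 1-2 are the version, major 0x03.

function looksLikeTlsHandshake(string $bytes): bool
{
    return strlen($bytes) >= 3
        && ord($bytes[0]) === 0x16
        && ord($bytes[1]) === 0x03;
}

// Turn numeric character references such as "&#22;&#3;&#1;" back into raw bytes.
// \s* also tolerates the "&# 22;" spacing seen in copied logs.
function decodeNumericEntities(string $s): string
{
    return preg_replace_callback('/&#\s*(\d+);/', function (array $m): string {
        return chr((int) $m[1]);
    }, $s);
}
```

For example, `looksLikeTlsHandshake(decodeNumericEntities('&#22;&#3;&#1;'))` is true, whereas a plain `POST /1/statuses/filter.json ...` request line is not.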

DarrenCook commented 13 years ago

I've worked through this and have all the http/https and proxy/no-proxy combinations working. However, this is for another server/client, so I've not tested it in Phirehose or with Twitter; I've also only tested with Apache's mod_proxy on 127.0.0.1, and only with a self-signed SSL certificate. As the changes are quite intrusive, I'm not going to try a quick hack and will instead just document them.

First, if using a proxy, you need a separate block of code for it. Second, for port 80, send a GET with the full URL; for port 443, send CONNECT, wait for the proxy's response, call stream_socket_enable_crypto(), and only then send the actual request headers. Third, only STREAM_CRYPTO_METHOD_SSLv3_CLIENT worked for me; STREAM_CRYPTO_METHOD_SSLv23_CLIENT caused Apache to send a strange 400 error.

So the block will look something like this:

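if($using_proxy){
  $this->conn=fsockopen("tcp://{$proxy_ip}",$proxy_port,$errno,$errstr,/*timeout=*/10);
  if(!$this->conn){
    throw new Exception("Cannot connect to proxy {$proxy_ip}:{$proxy_port}: {$errstr} ({$errno})");
  }
  //The query string is optional, so don't assume it exists
  $path=$urlParts['path'].(isset($urlParts['query'])?'?'.$urlParts['query']:'');
  if($port==443){
    $out="CONNECT {$urlParts['host']}:443 HTTP/1.1\r\n";
    $out.="Host: {$urlParts['host']}:443\r\n";
    $out.="Proxy-Connection: Keep-Alive\r\n";
    $out.="\r\n";
    fwrite($this->conn, $out);
    //Check the proxy's status line, e.g. "HTTP/1.0 200 Connection established"
    $status=(string)fgets($this->conn,8192);
    if(!preg_match('#^HTTP/\d\.\d\s+200\b#',$status)){
      throw new Exception("Proxy refused CONNECT: ".trim($status));
    }
    //Skip the rest of the proxy's response headers, up to the blank line
    while(($s=fgets($this->conn,8192))!==false && trim($s)!=''){}
    //Verify the certificate against the Twitter host name, not the proxy's
    stream_context_set_option($this->conn,'ssl','peer_name',$urlParts['host']);
    //Only SSLv3_CLIENT worked in the Apache test above, but SSLv3 is obsolete and
    //disabled in most current OpenSSL builds, so ask for TLS instead
    if(!stream_socket_enable_crypto($this->conn,true,STREAM_CRYPTO_METHOD_TLS_CLIENT)){
      throw new Exception("TLS handshake through the proxy failed");
    }
    $out="GET {$path} HTTP/1.1\r\n";
  }
  else{ //$port==80: a plain HTTP proxy expects the full URL
    $out="GET {$url} HTTP/1.1\r\n";
  }
  $out.="Host: {$urlParts['host']}\r\n";
  $out.="\r\n";
  //Send the request over the (possibly encrypted) proxy connection
  fwrite($this->conn, $out);
}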

That code is for GET, not POST, but POST should just involve sending a few more HTTP headers and the request body (in the second stage, not in the CONNECT stage).
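The POST variant of that second stage might look like this. `buildTunneledPost()` is a hypothetical helper, not part of Phirehose, and assumes PHP 7.0+. It uses the path-only request line because the request now travels inside the tunnel to the origin server, and it computes Content-Length from the encoded body (27 bytes for the `delimited=length&track=gcse` body in the log above).

```php
<?php
// Sketch: the request written *after* stream_socket_enable_crypto() succeeds.

function buildTunneledPost(string $url, array $params, array $extraHeaders = []): string
{
    $parts = parse_url($url);
    // Inside the tunnel the origin server gets a path (plus optional query), not the full URL.
    $path  = ($parts['path'] ?? '/') . (isset($parts['query']) ? '?' . $parts['query'] : '');
    $body  = http_build_query($params);

    $headers = array_merge([
        "POST {$path} HTTP/1.1",
        "Host: {$parts['host']}",
        'Content-Type: application/x-www-form-urlencoded',
        'Content-Length: ' . strlen($body),
    ], $extraHeaders);

    // Header block, blank line, then the form-encoded body.
    return implode("\r\n", $headers) . "\r\n\r\n" . $body;
}
```

Usage: after the tunnel is encrypted, `fwrite($conn, buildTunneledPost($url, ['delimited' => 'length', 'track' => 'gcse'], ['Authorization: ...']));`.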

cromanelli commented 12 years ago

Although this is not exactly the issue discussed here, it is somewhat related. I would like to be able to choose the TCP port that Phirehose uses to establish its connection to Twitter. Have any of you been able to do this?

Thanks,

Chris

DarrenCook commented 12 years ago

If Twitter actually offers alternative ports, you could connect to them by editing this line in Phirehose.php: $port = ($urlParts['scheme'] == 'https') ? 443 : 80;
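A slightly more general form of that line would honour an explicit port in the URL and only fall back to the scheme default. `streamPort()` is a hypothetical helper, and the sketch does not assume Twitter served the stream on any port other than 443.

```php
<?php
// Sketch: use the URL's explicit port if present, else the scheme's default port.

function streamPort(string $url): int
{
    $parts = parse_url($url);
    if (isset($parts['port'])) {
        return (int) $parts['port'];
    }
    return (($parts['scheme'] ?? 'http') === 'https') ? 443 : 80;
}
```

For example, `streamPort('https://stream.twitter.com/1/statuses/filter.json')` gives 443, while `streamPort('https://example.com:8443/x')` gives 8443.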

cromanelli commented 12 years ago

Thanks Darren, I don't think I asked the right question. I have an app behind a Linux server and I was updating my iptables rules, but after leaving ports 443 and 80 open and closing everything else, it does not receive any stream. According to tcpdump, Phirehose randomly opens other ports, so I can't close everything else. If you have any idea how to control or set those ports, that would be great... Thanks

DarrenCook commented 12 years ago

@cromanelli It sounds like you've blocked source ports instead of (or as well as) destination ports. (Or the other way round :-) If so you'll find trying to do anything with the internet fails; read up on iptables for how to fix it. A good troubleshooting test is to try using wget to connect to somewhere (e.g. http://twitter.com and https://twitter.com). If that works but Phirehose does not then it may be a Phirehose issue (or bad auth credentials, etc.); if they both fail it is a server or network issue.
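The "random" ports in tcpdump are most likely ordinary ephemeral source ports: on an outgoing connection only the destination port is fixed (443 for stream.twitter.com), while the operating system picks a temporary local port for each connection. A small sketch that shows this over loopback only; the helper names are hypothetical and it assumes PHP 7.0+.

```php
<?php
// Sketch: the destination port is fixed, the source port is chosen by the OS.

function demoEphemeralPort(): array
{
    // Port 0 asks the OS for any free port to listen on.
    $server = stream_socket_server('tcp://127.0.0.1:0', $errNo, $errStr);
    if ($server === false) {
        throw new RuntimeException("listen failed: {$errStr} ({$errNo})");
    }
    $serverAddr = stream_socket_get_name($server, false); // e.g. "127.0.0.1:54321"
    $client = stream_socket_client("tcp://{$serverAddr}", $errNo, $errStr, 5);
    if ($client === false) {
        fclose($server);
        throw new RuntimeException("connect failed: {$errStr} ({$errNo})");
    }
    $local  = stream_socket_get_name($client, false); // client's own address, ephemeral port
    $remote = stream_socket_get_name($client, true);  // address it connected to
    fclose($client);
    fclose($server);
    return ['local' => $local, 'remote' => $remote];
}

function portOf(string $addr): int
{
    return (int) substr($addr, strrpos($addr, ':') + 1);
}
```

In firewall terms this usually means allowing outbound TCP to ports 443 (and 80) plus a connection-tracking rule for the replies, such as `iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT`, rather than trying to restrict local ports.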