Papanikles / rutorrent

Automatically exported from code.google.com/p/rutorrent
0 stars 0 forks source link

[PATCH] The GeoIP plugin should resolve hosts in parallel #767

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
gethostbyaddr() is terrible - it blocks for 20 seconds on IPs with 
misconfigured DNS.

This one is a bit hackish as it makes assumptions about packet format, but 
works with most resolvers. Add new conf.php option dnsResolver = '8.8.8.8'; 
Maybe make socket timeout configurable, too.

Index: lookup.php
===================================================================
--- lookup.php  (revision 2185)
+++ lookup.php  (working copy)
@@ -15,6 +15,10 @@
        $HTTP_RAW_POST_DATA = file_get_contents("php://input");
    if(isset($HTTP_RAW_POST_DATA))
    {
+       if ($dnsResolver) {
+           $dns = fsockopen("udp://".$dnsResolver, 53);
+           socket_set_timeout($dns, 1, 0);
+       }
        $vars = explode('&', $HTTP_RAW_POST_DATA);
        foreach($vars as $var)
        {
@@ -68,9 +72,19 @@
                    $host = $value;
                                         if($retrieveHost)
                                         {
-                                                $host = gethostbyaddr($value);
-                                                if(empty($host) || 
(strlen($host)<2))
-                                                        $host = $value;
+                       if ($dnsResolver) {
+                           $ipmap[$value] = count($ret);
+                           $pkt = pack("n", rand(0,32767)) . "\1\0\0\1\0\0\0\0\0\0";
+                           foreach (array_reverse(explode(".", $value)) as $part)
+                               $pkt .= pack("C", strlen($part)) . $part;
+                           $pkt .= "\7in-addr\4arpa\0\0\x0C\0\1";
+                           fwrite($dns, $pkt);
+                           $host = $value;
+                       } else {
+                           $host = gethostbyaddr($value);
+                           if(empty($host) || (strlen($host)<2))
+                               $host = $value;
+                       }
                                         }
                                         $comment = '';
                                         if($retrieveComments)
@@ -83,5 +97,29 @@
                }
            }
        }
+       if ($dnsResolver) {
+           $todo = count($ret);
+           while ($todo-- && ($buf=@fread($dns, 512))) {
+               $pos = 12;
+               $ip = array();
+               while (($count_a = unpack("C", $buf[$pos++])) && ($count = $count_a[1])) {
+                   if (count($ip) < 4)
+                       array_unshift($ip, substr($buf, $pos, $count));
+                   $pos += $count;
+               }
+               if (substr($buf, $pos, 10) != "\0\x0C\0\1\xC0\x0C\0\x0C\x00\x01")
+                   continue;
+               $pos += 16;
+               $host = array();
+               while (($count_a = unpack("C", $buf[$pos++])) && ($count = $count_a[1])) {
+                   array_push($host, substr($buf, $pos, $count));
+                   $pos += $count;
+               }
+               $host=implode(".", $host);
+               $ip=implode(".", $ip);
+               $ret[$ipmap[$ip]]["info"]["host"] = $host;
+           }
+       }
+       fclose($dns);
    }
    cachedEcho(json_encode($ret),"application/json");

Original issue reported on code.google.com by karel.t...@gmail.com on 27 Sep 2012 at 4:16

GoogleCodeExporter commented 8 years ago
This doesn't work for me. In result i have only one record. I think, this is 
because you must read answer immediately after sending one packet.

Original comment by novik65 on 28 Sep 2012 at 9:40

GoogleCodeExporter commented 8 years ago
Connected udp sockets should be buffered, at least on linux (my setup is 
debian6, 2.6.32). I think it simply expires on higher-latency connections 
before replies arrive. Try to increase the timeout or receive buffer size:

socket_set_timeout($dns, 5, 0);
socket_set_option($dns,SOL_SOCKET,SO_RCVBUF,65536); 

Another possibility is your DNS server sends unexpected replies, or their DNS 
has some obscure setup (in particular, CNAME->PTR are not handled).

Original comment by ezzz...@gmail.com on 28 Sep 2012 at 8:54

GoogleCodeExporter commented 8 years ago
>Try to increase the timeout

BTW in your code you doesn't check - if a timeout occur. But i increase it.

>socket_set_option($dns,SOL_SOCKET,SO_RCVBUF,65536);

PHP Fatal error:  Call to undefined function socket_set_option. It disabled in 
my system. As result - it may be disabled in users system too.

Yet once. 
1) If i send packets for 2 peers - then i got host for one of its only. 
2) If i send packets for 5 peers - then i got hosts for 4 of its only. 
3) If i send packets for 10 peers - then i got hosts for 9 of its only. 
Code with gethostbyaddr give all hosts (2, 5 and 10) in all cases.

Default recv buffer size is 4096. Response size (in the first case) - 90. As 
result - buffer size hasn't relation to this.

>Another possibility is your DNS server sends

I test you code for 2 different systems (CentOS 5.4 and OnenSuse 11.2) with 2 
different DNS servers. Results are the some.

Original comment by novik65 on 29 Sep 2012 at 8:38

GoogleCodeExporter commented 8 years ago
> BTW in your code you doesn't check - if a timeout occur. But i increase it.

Once the socket timeouts, fread() returns empty or incomplete buffer.

> Code with gethostbyaddr give all hosts (2, 5 and 10) in all cases.

I'm observing the same thing - there is off-by-one error somewhere, I'll 
investigate. Thanks for testing!

Original comment by karel.t...@gmail.com on 29 Sep 2012 at 11:40

GoogleCodeExporter commented 8 years ago
Ok, here goes cleaned up version. Results seem to be the same as for ghbn even 
for large (700 peers) torrents.

Added support for compressed names in the result set, it seems to be used quite 
a bit.

conf.php:
<?php
    // configuration parameters

    $retrieveCountry = true;
    $retrieveHost = true;

    // use faster internal resolver instead of gethostbyaddr().
    $dnsResolver = '8.8.8.8';

    // give up resolving host names after this number of seconds
    $dnsResolverTimeout = 3;

    $retrieveComments = false;

?>

--- lookup.php  (revision 2185)
+++ lookup.php  (working copy)
@@ -15,6 +15,12 @@
        $HTTP_RAW_POST_DATA = file_get_contents("php://input");
    if(isset($HTTP_RAW_POST_DATA))
    {
+       if ($dnsResolver) {
+           $dns = fsockopen("udp://$dnsResolver", 53);
+           socket_set_timeout($dns, $dnsResolverTimeout, 0);
+           $randbase = rand(0, 32000);
+           $idx = 0;
+       }
        $vars = explode('&', $HTTP_RAW_POST_DATA);
        foreach($vars as $var)
        {
@@ -68,9 +74,20 @@
                    $host = $value;
                                         if($retrieveHost)
                                         {
-                                                $host = gethostbyaddr($value);
-                                                if(empty($host) || 
(strlen($host)<2))
-                                                        $host = $value;
+                       if ($dnsResolver) {
+                           $pkt = pack("n", $randbase + $idx) . "\1\0\0\1\0\0\0\0\0\0";
+                           $ipmap[$value] = $idx++;
+                           foreach (array_reverse(explode(".", $value)) as $part)
+                               $pkt .= chr(strlen($part)) . $part;
+                           $pkt .= "\7in-addr\4arpa\0\0\x0C\0\1";
+                           fwrite($dns, $pkt);
+                           fflush($dns);
+                           $host = $value;
+                       } else {
+                           $host = gethostbyaddr($value);
+                           if(empty($host) || (strlen($host)<2))
+                               $host = $value;
+                       }
                                         }
                                         $comment = '';
                                         if($retrieveComments)
@@ -83,5 +100,40 @@
                }
            }
        }
+       if ($dnsResolver) {
+           while ($idx-- && ($buf=@fread($dns, 512))) {
+               $pos = 12;
+               $ip = array();
+               while ($count = ord($buf[$pos++])) {
+                   if (count($ip) < 4)
+                       array_unshift($ip, substr($buf, $pos, $count));
+                   $pos += $count;
+               }
+               $ip = implode(".", $ip);
+               if (substr($buf, $pos, 10) != "\0\x0C\0\1\xC0\x0C\0\x0C\x00\x01")
+                   continue;
+               $pos += 16;
+               $host = array();
+               while ($count = ord($buf[$pos++])) {
+                   if ($count >= 0xc0) {
+                       $count = (($count&0x3f) << 8) | ord($buf[$pos]);
+                       if ($count < $pos-1) {
+                           $pos = $count;
+                           continue;
+                       } else {
+                           $host = false;
+                           break;
+                       }
+                   }
+                   array_push($host, substr($buf, $pos, $count));
+                   $pos += $count;
+               }
+               if ($host) {
+                   $host = implode(".", $host);
+                   $ret[$ipmap[$ip]]["info"]["host"] = $host;
+               }
+           }
+       }
+       fclose($dns);
    }
    cachedEcho(json_encode($ret),"application/json");

Original comment by karel.t...@gmail.com on 30 Sep 2012 at 12:59

GoogleCodeExporter commented 8 years ago
>Once the socket timeouts, fread() returns empty or incomplete buffer.

Yes, it can return *incomplete* buffer. And you code will continue work with 
incorrect data. This is a not good thing, IMHO.

Anyway, thanks for your work. And sorry, i can test your code (and apply patch) 
~ at the next week only.

Original comment by novik65 on 30 Sep 2012 at 2:24

GoogleCodeExporter commented 8 years ago
> This is a not good thing, IMHO.

Waiting for *all* the data defeats the purpose of rapid lookups. If you're not 
content with fast resolver which occassionaly misses a reply due to timeout, 
simply don't set 'dnsResolver' in config - slow (but correct) resolver will be 
used instead :)

Btw - man 7 udp. It states the packets are either as a whole, or nothing. There 
is no such thing as receiving "half" of udp packet, even in presence of 
timeouts.

Original comment by karel.t...@gmail.com on 30 Sep 2012 at 4:32

GoogleCodeExporter commented 8 years ago
1) Can't see any changes in the results. 10 peers, 9 hosts.
2) Some results may be strange. See line with 222.254.51.182.

[08.10.12 12:50:49] Peers count ($idx): 10
[08.10.12 12:50:49] 188.230.117.173 - ip-188-230-117-173.airbites.net.ua
[08.10.12 12:50:49] 71.227.162.132 - c-71-227-162-132.hsd1.wa.comcast.net
[08.10.12 12:50:49] 121.84.45.184 - 121-84-45-184f1.hyg2.eonet.ne.jp
[08.10.12 12:50:49] 184.15.31.162 - 184-15-31-162.dr01.chtn.wv.frontiernet.net
[08.10.12 12:50:49] 67.61.228.177 - 67-61-228-177.cpe.cableone.net
[08.10.12 12:50:49] 82.228.0.251 - maa78-1-82-228-0-251.fbx.proxad.net
[08.10.12 12:50:49] 84.25.220.96 - 5419DC60.cm-5-2d.dynamic.ziggo.nl
[08.10.12 12:50:49] 222.254.51.182 - localhost
[08.10.12 12:50:49] 85.229.8.72 - 
c-4808e555.024-135-73746f36.cust.bredbandsbolaget.se

Original comment by novik65 on 8 Oct 2012 at 8:49

GoogleCodeExporter commented 8 years ago
But if i comment line 

if (substr($buf, $pos, 10) != "\0\x0C\0\1\xC0\x0C\0\x0C\x00\x01")
    continue;

then all OK. In the 'continue' case buffer contain 
"\0\x0C\0\1\xC0\x0E\0\x06\x00\x01".

Original comment by novik65 on 8 Oct 2012 at 9:07

GoogleCodeExporter commented 8 years ago
ad 1)

I've used breakdown in http://www.netfor2.com/dns.htm for reference.

\0\x0C\0\1\xC0\x0C\0\x0C\x00\x01 - start at offset 12, rrtype PTR, of class 1 
(ipv4?)
\0\x0C\0\1\xC0\x0E\0\x06\x00\x01 - means at offset 14, rrtype *CNAME*, of class 
1

Please confirm on a linux box that IP is indeed a CNAME host via:

$ dig -x <ip>

As stated in the first post, CNAME is not implemented. Because the protocol 
becomes complex at that point (ptr reply may, or may not be in the same packet 
- in which case next query/response cycle recursively invoked until we get PTR).

Removing that continue statement is extremely dangerous, because of high 
possibility of passing junk to clients. It's better to not resolve the ip at 
all, than return outright garbage.

Probably it should be noted in conf.php that the resolver is somewhat limited.

As for 2):
$ host 222.254.51.182
182.51.254.222.in-addr.arpa domain name pointer localhost.

Nothing wrong at all. gethostbyaddr() will not bother verifying the results 
either, nor i think there is need to.

Original comment by karel.t...@gmail.com on 12 Oct 2012 at 5:55

GoogleCodeExporter commented 8 years ago
Also you can try this variant (decreasing idx only if the continue condition 
passes). I suspect it might work, but slow things down because there will be 
case of waiting for a packet which is never going to arrive.

+       if ($dnsResolver) {
+           while ($idx && ($buf=@fread($dns, 512))) {
+               $pos = 12;
+               $ip = array();
+               while ($count = ord($buf[$pos++])) {
+                   if (count($ip) < 4)
+                       array_unshift($ip, substr($buf, $pos, $count));
+                   $pos += $count;
+               }
+               $ip = implode(".", $ip);
+               if (substr($buf, $pos, 10) != "\0\x0C\0\1\xC0\x0C\0\x0C\x00\x01")
+                   continue;
+               $idx--;
+               $pos += 16;
+               $host = array();
+               while ($count = ord($buf[$pos++])) {
+                   if ($count >= 0xc0) {
+                       $count = (($count&0x3f) << 8) | ord($buf[$pos]);
+                       if ($count < $pos-1) {
+                           $pos = $count;
+                           continue;
+                       } else {
+                           $host = false;
+                           break;
+                       }
+                   }
+                   array_push($host, substr($buf, $pos, $count));
+                   $pos += $count;
+               }
+               if ($host) {
+                   $host = implode(".", $host);
+                   $ret[$ipmap[$ip]]["info"]["host"] = $host;
+               }
+           }

Original comment by karel.t...@gmail.com on 12 Oct 2012 at 6:08

GoogleCodeExporter commented 8 years ago
>Also you can try this variant (decreasing idx only if the continue condition 
passes)

This variant works correctly.

Original comment by novik65 on 15 Oct 2012 at 7:38