arangodb / arangodb-php

PHP ODM for ArangoDB
https://www.arangodb.com
Apache License 2.0
183 stars 44 forks source link

Strange issue with localhost vs 127.0.0.1 #286

Open Geolim4 opened 2 years ago

Geolim4 commented 2 years ago

Hello,

I'm the Phpfastcache maintainer and I'm currently implementing support of Arangodb for the next major release (v9).

But I noticed that using an endpoint with "localhost" is VERY slow (look for Test duration:):

[Begin Test: Arangodb driver]
[PHPFASTCACHE: CORE v9.0.0-alpha3#287beb8 | API v4.0.0]
[PHP v8.0.10 with: apcu, bcmath, calendar, Core, couchbase, ctype, curl, date, dom, fileinfo, filter, gd, gettext, hash, iconv, json, libxml, mbstring, mongodb, mysqlnd, openssl, pcre, PDO, pdo_mysql, pdo_sqlite, Phar, readline, redis, Reflection, session, SimpleXML, SPL, standard, tokenizer, xml, xmlreader, xmlwriter, zip, zlib]
---
[INFO] Running CRUD tests on the following backend: Phpfastcache\Drivers\Arangodb\Driver
[DEBUG] Clearing backend before running test...
[INFO] Using cache key: cache_key_6d7964d5eff0aebf5f343936
[PASS] The pool successfully saved an item.
[INFO] Re-fetching item by its tags and an unknown tag (tag strategy "ALL")...
[PASS] The pool expectedly failed to retrieve the cache item.
[INFO] Re-fetching item by its tags (tag strategy "ALL")...
[PASS] The pool successfully retrieved the cache item.
[INFO] Re-fetching item by its tags and an unknown tag (tag strategy "ONLY")...
[PASS] The pool expectedly failed to retrieve the cache item.
[INFO] Re-fetching item by one of its tags (tag strategy "ONLY")...
[PASS] The pool successfully retrieved the cache item.
[INFO] Re-fetching item by one of its tags and an unknown tag (tag strategy "ONE")...
[PASS] The pool successfully retrieved the cache item.
[PASS] The pool successfully retrieved the expected value.
[INFO] Updating the cache item by appending some chars...
[INFO] Deferred item is being committed...
[PASS] The pool successfully committed deferred cache item.
[PASS] The pool successfully retrieved the expected new value.
[PASS] The pool successfully deleted the cache item.
[PASS] The pool successfully cleared.
[PASS] The cache item does no longer exists in pool.
[INFO] I/O stats: 26 HIT, 7 MISS, 7 WRITE
[INFO] Driver info: No readable human data provided by ArangoDB :(
Test results:  0 assertions failed, 0 assertions skipped and 12 assertions passed out of a total of 12 assertions.
Test duration: 94.373s

Using "127.0.0.1" is drastically more efficient (look for Test duration:):

[Begin Test: Arangodb driver]
[PHPFASTCACHE: CORE v9.0.0-alpha3#287beb8 | API v4.0.0]
[PHP v8.0.10 with: apcu, bcmath, calendar, Core, couchbase, ctype, curl, date, dom, fileinfo, filter, gd, gettext, hash, iconv, json, libxml, mbstring, mongodb, mysqlnd, openssl, pcre, PDO, pdo_mysql, pdo_sqlite, Phar, readline, redis, Reflection, session, SimpleXML, SPL, standard, tokenizer, xml, xmlreader, xmlwriter, zip, zlib]
---
[INFO] Running CRUD tests on the following backend: Phpfastcache\Drivers\Arangodb\Driver
[DEBUG] Clearing backend before running test...
[INFO] Using cache key: cache_key_db75017e1a22a1575f313634
[PASS] The pool successfully saved an item.
[INFO] Re-fetching item by its tags and an unknown tag (tag strategy "ALL")...
[PASS] The pool expectedly failed to retrieve the cache item.
[INFO] Re-fetching item by its tags (tag strategy "ALL")...
[PASS] The pool successfully retrieved the cache item.
[INFO] Re-fetching item by its tags and an unknown tag (tag strategy "ONLY")...
[PASS] The pool expectedly failed to retrieve the cache item.
[INFO] Re-fetching item by one of its tags (tag strategy "ONLY")...
[PASS] The pool successfully retrieved the cache item.
[INFO] Re-fetching item by one of its tags and an unknown tag (tag strategy "ONE")...
[PASS] The pool successfully retrieved the cache item.
[PASS] The pool successfully retrieved the expected value.
[INFO] Updating the cache item by appending some chars...
[INFO] Deferred item is being committed...
[PASS] The pool successfully committed deferred cache item.
[PASS] The pool successfully retrieved the expected new value.
[PASS] The pool successfully deleted the cache item.
[PASS] The pool successfully cleared.
[PASS] The cache item does no longer exists in pool.
[INFO] I/O stats: 26 HIT, 7 MISS, 7 WRITE
[INFO] Driver info: No readable human data provided by ArangoDB :(
Test results:  0 assertions failed, 0 assertions skipped and 12 assertions passed out of a total of 12 assertions.
Test duration: 0.341s

Process finished with exit code 0

At this moment I'm really unable to tell you if it's arangodb-php-related, arangodb-related or simply php-related but I think you now have a lead to investigate on this issue and maybe warn your users in the WIKI/Docs.

Here's is the configuration used for the tests made above:

dothebart commented 2 years ago

hm, if you use Wireshark to capture the traffic, do you see any difference? Will both of them use 127.0.0.1 or maybe one of them ipv6? Will it do DNS lookups for localhost? Does it use a persistent connection? Whats in the HTTP-headers? Will they be the same? What will be the count of IP-Packets? (so if they are not the same - one connection will use a different package size?)

Geolim4 commented 2 years ago

I took the time to answer you the most detailed as possible:

hm, if you use Wireshark to capture the traffic, do you see any difference?

Yes a lot, I will post them below, but I'm not very familiiar with this tool.

Will both of them use 127.0.0.1 or maybe one of them ipv6?

Look like using localhost actually uses ipv6 indeed.

Whats in the HTTP-headers? Will they be the same?

Absolutely

Will it do DNS lookups for localhost? Does it use a persistent connection? What will be the count of IP-Packets? (so if they are not the same - one connection will use a different package size?)

See by yourself on the Wireshark trace below.

Wireshark trace with 127.0.0.1 (Keep-Alive connection):

127-0-0-1_KeepAlive.txt

Wireshark trace with localhost (Keep-Alive connection):

localhost_KeepAlive.txt

Wireshark trace with 127.0.0.1 (Close connection):

127-0-0-1_Close.txt

Wireshark trace with localhost (Close connection):

localhost_Close.txt

dothebart commented 2 years ago

hm, can you save the pcap files instead please? (in the file save menu) if you right-click one http-header, and choose "follow tcp-stream" or "follow http stream" you should see either several (if not all) requests of the trace in one go if connection keepalive is working.

Geolim4 commented 2 years ago

Here's what you asked:

The same 4 dumbs created with the exact same script. tcp_dumps.zip

dothebart commented 2 years ago

hm, starting with the filesizes:

-rw-r--r-- 1 willi willi  84828 Sep  6 18:11 127-0-0-1_Close.pcapng
-rw-r--r-- 1 willi willi  53122 Sep  6 18:10 127-0-0-1_KeepAlive.pcap
-rw-r--r-- 1 willi willi 236964 Sep  6 18:15 localhost_Close.pcapng
-rw-r--r-- 1 willi willi  61916 Sep  6 18:12 localhost_KeepAlive.pcapng

you can already see, that the localhost collection-close is ~3 times as large as the 127.0.0.1 one.

With connection keepalive, its still 1/6 the size which the localhost one is larger. Watching at the two Keepalives, right clicking "follow tcpstream" and once its popping open, click "filter out this stream" in the lower right, you will see that the number of PSH / ACK packages seems to be way larger, I'd guess roughly one ping-pong per HTTP request:

screenshot

this is probably responsible for the additional time required. But I'm no expert in the Microsoft TCP stack implementation to tell you why its doing this.

One can also see that tcp-retransmission pagkages are ipv6.

You also can see, that the the localhost ones have \Device\NFP_Loopback, which the other don't.

If you sniff port 50 tcp & udp, do you maybe see it also doing DNS-lookups? that could even make it slower.

Geolim4 commented 2 years ago

To be honest i don't have much more time to investigate on this as I finished Arangodb implementation in Phpfastcache but I can tell you this:

Many concurrent database I implemented (Couchdb/Mongodb/Couchbase/etc) don't have this issue even if I used "localhost" instead of "127.0.0.1".

At first I though it could be related to a misconfiguration of my hosts file that contain a lot of entries (for privacy tracking purposes) but none of them related to localhost.

And you're right, using a "Keep-Alive" connection is way much more efficient since my script send a lot of queries to the Arangodb API. The script is located here runCRUDTests() method. This is not surprising at all and its even logical, but its exponentially increasing the running time when using localhost endpoint.

dothebart commented 2 years ago

does it reproduce on linux? You see without keepalive it seems even 3 times the amount of packages is sent, else the file wouldn't differ so much in size.

jsteemann commented 2 years ago

@Geolim4 : from what I can tell is that the arangodb-php driver replaces localhost with [::1]:

    /**
     * Replaces "localhost" in hostname with "[::1]" in order to make these values the same
     * for later comparisons
     *
     * @param string $hostname - hostname 
     *
     * @return string - normalized hostname
     */
    public static function normalizeHostname($hostname) {
        // replace "localhost" with [::1] as arangod does
        return preg_replace("/^(tcp|ssl|https?):\/\/(localhost|127\.0\.0\.1):/", "\\1://[::1]:",  $hostname);
    }

If for whatever reason this is a problem in some environments, we could make this replacing optional. WDYT?

jsteemann commented 2 years ago

Oops, just realized that the above function will only be called if active failover is used and there is more than a single endpoint. If you are testing against a single server arangod, the specified endpoint/host name should be used as-is.

Geolim4 commented 2 years ago

does it reproduce on linux? You see without keepalive it seems even 3 times the amount of packages is sent, else the file wouldn't differ so much in size.

I tried it on my work computer (Ubuntu 18.04): No issue at all.

Oops, just realized that the above function will only be called if active failover is used and there is more than a single endpoint. If you are testing against a single server arangod, the specified endpoint/host name should be used as-is.

I'm using as single endpoint so I don't think this piece of code is called, indeed.

jsteemann commented 2 years ago

Ok, thanks! Gonna try to find a Windows machine to see if I can reproduce the issue locally.

Geolim4 commented 2 years ago

I would like to write a big troll joke about Windows users, but I'm myself developing on Windows at home for some personal reasons and I see it as an opportunity to catch some bugs on an "unusual" platform that you support :P

If you want to tests it easily with the scripts I made follow the steps:

Geolim4 commented 2 years ago

Also you uncomment this line before running tests to see the trace function printing debug:

https://github.com/PHPSocialNetwork/phpfastcache/blob/a2e587dc98d6fec673854779d124ebf79f3d0cce/tests/Arangodb.test.php#L34

dothebart commented 2 years ago

yes, definitely - Ports are good for stability and performance. Windows and Mac are much less reluctant against errors that linux may mitigate unheard - we have seen this on our CI more than once.