arangodb / arangodb-php

PHP ODM for ArangoDB
https://www.arangodb.com
Apache License 2.0

Poor performance from ODM vs Curl #184

Closed: petitchevalroux closed this issue 7 years ago

petitchevalroux commented 8 years ago

Hello,

I am testing different NoSQL datastores, and during my tests I found a major performance issue when using the PHP ODM.

Here are the two PHP files used for the test: https://gist.github.com/petitchevalroux/9c83991972d0efd5083e

And the benchmark results (set size: 10000, concurrency level: 10):

- ArangodbCurl:insert queries per second: 2277.4895435295
- ArangodbCurl:read queries per second: 2337.0227794619
- Arangodb:insert queries per second: 1296.1542879677
- Arangodb:read queries per second: 1603.3962501986

The curl version performs +45% on reads and +75% on inserts.
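A quick check of the quoted percentages, sketched in Python (it is not part of the original benchmark scripts): for each operation it divides the curl throughput by the ODM throughput reported above.

```python
# Throughput figures (queries per second) taken from the benchmark results.
results = {
    "insert": {"curl": 2277.4895435295, "odm": 1296.1542879677},
    "read": {"curl": 2337.0227794619, "odm": 1603.3962501986},
}


def speedup_percent(fast_qps: float, slow_qps: float) -> float:
    """Return how much higher fast_qps is than slow_qps, in percent."""
    return (fast_qps / slow_qps - 1.0) * 100.0


for op, qps in results.items():
    print(f"{op}: curl is {speedup_percent(qps['curl'], qps['odm']):.1f} % faster")
```

Run as-is, it reports about 75.7 % for inserts and 45.8 % for reads, in line with the +75% and +45% quoted above.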

Am I doing something wrong with the ODM? Is there an explanation for this gap?

jsteemann commented 8 years ago

The PHP driver does a lot more than send and receive HTTP requests and responses. For example, it converts all returned documents into Document objects, and it may run some validation on the attributes. The curl variant does none of this; it is reduced to pure HTTP transport and (de)serialization. I am not saying this is good or bad, but I guess part of the overhead can be attributed to it. Additionally, can you try setting the connection option ConnectionOptions::OPTION_CHECK_UTF8_CONFORM to false when you connect? This may also improve performance, and the extra checks can be skipped if you are sure all your data is valid UTF-8.
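To make the cost of such a check concrete, here is a minimal Python sketch of what a client-side UTF-8 conformance pass amounts to. It is an illustration, not the PHP driver's code: `is_utf8_conform` and `prepare_document` are hypothetical names, and Python byte strings stand in for PHP strings. The point is that the check walks every key and value of every document, and that per-document work is what turning the option off avoids.

```python
def is_utf8_conform(value) -> bool:
    """Hypothetical check: True if every byte string in a (possibly nested)
    document decodes as UTF-8."""
    if isinstance(value, bytes):
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True
    if isinstance(value, dict):
        # Attribute names are checked as well as values.
        return all(is_utf8_conform(k) and is_utf8_conform(v) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return all(is_utf8_conform(v) for v in value)
    # Numbers, booleans and None carry no encoding to check.
    return True


def prepare_document(doc: dict, check_utf8: bool = True) -> dict:
    """Hypothetical pre-send hook with an on/off UTF-8 switch."""
    if check_utf8 and not is_utf8_conform(doc):
        raise ValueError("document contains invalid UTF-8")
    # With check_utf8=False the document goes out unchecked and the
    # server is left to reject anything malformed.
    return doc


doc = {b"name": b"caf\xc3\xa9", b"tags": [b"a", b"b"]}
prepare_document(doc)                                   # passes the check
prepare_document({b"name": b"\xff"}, check_utf8=False)  # check skipped, no error
```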

petitchevalroux commented 8 years ago

I tried disabling the UTF-8 check. It improves performance, but not significantly:

- Arangodb:insert queries per second: 1434.5445558296
- Arangodb:read queries per second: 1583.8782517455
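Measured against the first ODM run, the change works out as follows. This is a small Python sketch that uses only the figures reported in this thread.

```python
# ODM queries per second before and after disabling the UTF-8 check.
before = {"insert": 1296.1542879677, "read": 1603.3962501986}
after = {"insert": 1434.5445558296, "read": 1583.8782517455}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative = slower)."""
    return (new / old - 1.0) * 100.0


for op in before:
    print(f"{op}: {percent_change(before[op], after[op]):+.1f} %")
```

The insert rate rises by roughly 10.7 %, while the read rate comes out about 1.2 % lower, which may simply be run-to-run variation.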

Document validation corresponds to attribute type checking, right? Like in https://github.com/arangodb/arangodb-php/blob/master/lib/triagens/ArangoDb/ValueValidator.php

One last question: I set up a TCP endpoint in the config (using version 2.5). Does the driver still talk to the ArangoDB HTTP API?

BTW, thank you for your work and your quick answer ;)

jsteemann commented 8 years ago

The PHP driver may do several sorts of validation on outgoing data (the data it sends to the ArangoDB server) and also validation/conversion of incoming data. These operations add some overhead, and I am wondering whether some of the checks could be turned off, e.g. in a production environment. If you have any specific suggestions for this, please let me know.

Re your question: the PHP driver will always talk to the ArangoDB HTTP API. If you configured a TCP endpoint for the PHP driver, it will talk to that address via plain, unencrypted HTTP over TCP/IP. E.g. if your endpoint is tcp://127.0.0.1:8529, the driver will connect to 127.0.0.1 on port 8529 and send its HTTP requests there. Of course, the ArangoDB server must be listening on that same address and port.
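A small Python sketch of the mapping described above: a `tcp://host:port` endpoint just names the host and port that plain HTTP requests are sent to. `endpoint_to_http_base` is a hypothetical helper, not part of the driver, and this sketch handles only `tcp://` endpoints.

```python
from urllib.parse import urlsplit


def endpoint_to_http_base(endpoint: str) -> str:
    """Map a tcp:// endpoint to the unencrypted HTTP base URL it implies."""
    parts = urlsplit(endpoint)
    if parts.scheme != "tcp":
        raise ValueError(f"only tcp:// endpoints are handled here, got {endpoint!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"endpoint needs both host and port: {endpoint!r}")
    # tcp:// means plain HTTP on that address; no TLS is involved.
    return f"http://{parts.hostname}:{parts.port}"


print(endpoint_to_http_base("tcp://127.0.0.1:8529"))  # http://127.0.0.1:8529
```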

petitchevalroux commented 8 years ago

Thanks for clarifying point 2 (HTTP vs TCP).

For my use case, I may consider using json_encode and json_decode as the only validators for document reads/inserts (maybe a very bad idea :D).

For the AQL part, I don't know whether JSON is the format used between the ODM and the ArangoDB HTTP API, but if so I will use the same validators :D (maybe a very, very bad idea).
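A minimal Python sketch of the "encoder/decoder as the only validator" idea, with `json.dumps` and `json.loads` standing in for PHP's `json_encode` and `json_decode`. The AQL request body assumes the cursor API's `query` and `bindVars` fields; the `users` collection and the `age` bind parameter are hypothetical.

```python
import json


def encode_document(doc: dict) -> str:
    # allow_nan=False makes NaN/Infinity raise ValueError instead of emitting
    # tokens that are not valid JSON; unsupported types (sets, arbitrary
    # objects) raise TypeError. That is the only "validation" performed.
    return json.dumps(doc, allow_nan=False)


def decode_document(body: str) -> dict:
    # Malformed input raises json.JSONDecodeError (a ValueError subclass).
    return json.loads(body)


# An AQL query is sent as a JSON object too (hypothetical query and bind value).
aql_body = encode_document({
    "query": "FOR u IN users FILTER u.age > @age RETURN u",
    "bindVars": {"age": 30},
})

doc = {"name": "alice", "age": 30}
round_tripped = decode_document(encode_document(doc))  # equal to doc
```

The trade-off is that such checks only guarantee well-formed JSON; anything else is left to the server to reject.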

BTW, I am a noob to ArangoDB, so I don't know all the risks of such ideas.

jsteemann commented 8 years ago

The HTTP API of ArangoDB uses JSON for exchanging the actual documents with client drivers. So it's all JSON over HTTP.
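To illustrate "JSON over HTTP", here is a Python sketch that builds the raw bytes of a document-insert request without sending it. It assumes the 2.x-era endpoint `POST /_api/document?collection=<name>`; the exact path may differ between server versions, so check the HTTP API documentation for yours. The collection name `test` is hypothetical, and authentication headers are omitted.

```python
import json
from urllib.parse import quote


def build_insert_request(host: str, port: int, collection: str, doc: dict) -> bytes:
    """Return the raw HTTP/1.1 request that inserts one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    head = (
        f"POST /_api/document?collection={quote(collection)} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separates headers from the JSON body
    )
    return head.encode("ascii") + body


request = build_insert_request("127.0.0.1", 8529, "test", {"name": "alice"})
print(request.decode("utf-8"))
```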

Extra validation on the client/driver side is not strictly necessary, because the server will reject invalid JSON anyway. However, it may be useful for catching errors during development. From my point of view, one should be able to turn the checks off in production if they consume too much time.
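A Python sketch of the "checks during development, off in production" approach. The `APP_ENV` variable, the `validate_attributes` function and the set of accepted types are all hypothetical; they only loosely mirror the kind of attribute type checking a value validator performs.

```python
import json
import os

# Hypothetical switch: run client-side checks everywhere except production.
CLIENT_VALIDATION = os.environ.get("APP_ENV", "development") != "production"

ALLOWED_SCALARS = (type(None), bool, int, float, str)


def validate_attributes(value) -> None:
    """Raise TypeError if value holds a type outside the accepted set."""
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"attribute name must be a string: {key!r}")
            validate_attributes(item)
    elif isinstance(value, list):
        for item in value:
            validate_attributes(item)
    elif not isinstance(value, ALLOWED_SCALARS):
        raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def serialize(doc: dict, validate: bool = CLIENT_VALIDATION) -> str:
    if validate:
        validate_attributes(doc)  # catches mistakes early during development
    # Without validation, the JSON encoder and ultimately the server are the
    # only remaining checks.
    return json.dumps(doc)
```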

jsteemann commented 8 years ago

I found some time to look into this tonight. There are a few issues (and fixes), which I'll detail below:

With UTF-8 checking turned off and the HTTP processing fixes from devel applied, the insert test I did ran about 25 % faster than before. When I additionally passed a PHP array instead of a Document object to DocumentHandler::save(), total insertion time went down by about 45 % in my case. I didn't try the other operations (e.g. reading documents or the import API I suggested). YMMV, of course.
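The original benchmark reports queries per second, while this result is stated as a drop in total time. For the same workload, throughput scales with 1 / (remaining fraction of time), as this short Python sketch shows.

```python
def throughput_gain_percent(time_reduction_percent: float) -> float:
    """Convert a drop in total run time into the matching rise in operations
    per second for the same amount of work."""
    remaining = 1.0 - time_reduction_percent / 100.0
    if remaining <= 0:
        raise ValueError("time reduction must be below 100 %")
    return (1.0 / remaining - 1.0) * 100.0


print(f"{throughput_gain_percent(45):.1f} %")  # 81.8 %
```

A 45 % drop in total insertion time therefore corresponds to roughly 82 % more inserts per second.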