Ostico / PhpOrient

PhpOrient - Official Php driver based on the binary protocol of OrientDB.

High memory allocation with 2.1.5 #55

Closed ncreea closed 7 years ago

ncreea commented 8 years ago

Hi Guys,

I recently upgraded our dev environment from 2.1.3 to 2.1.5 and am experiencing random high memory allocations.

E.g.: Allowed memory size of 134217728 bytes exhausted (tried to allocate 335544421 bytes) in /var/www/html/release/1.1.3/.../vendor/ostico/phporient/src/PhpOrient/Protocols/Binary/OrientSocket.php

I did some debugging, and it looks like the read of the size field randomly returns gigantic values (e.g. ~320 MB, sometimes even higher). I also found that all these values originate from /PhpOrient/Protocols/Binary/Abstracts/Operation.php, in the _readInt function called by _readString.

Analyzing the binary data returned by the 4-byte read, I see these gigantic values.

I've also logged the content of the $_data variable to a file: the final result of the read is only 1.1 KB and contains all the results.
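Here is a minimal Python sketch of how a 4-byte size read can yield such values. It assumes the OrientDB binary protocol encodes a string as a 4-byte big-endian signed length followed by that many bytes, with -1 meaning null. If the reader's position drifts, for example after one field is deserialized with the wrong type, the "length" is decoded from unrelated bytes. The functions read_int and read_string and the sample payload are hypothetical and are not PhpOrient's _readInt/_readString. The sketch does not claim to reproduce the exact bytes behind the 335544421-byte allocation.

```python
from __future__ import annotations

import struct


def read_int(buf: bytes, offset: int) -> tuple[int, int]:
    """Decode a 4-byte big-endian signed int; return (value, new_offset)."""
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4


def read_string(buf: bytes, offset: int) -> tuple[str | None, int]:
    """Hypothetical length-prefixed reader: int length, then that many bytes.

    No sanity check is done on the length, so a bogus value is trusted as-is.
    """
    length, offset = read_int(buf, offset)
    if length < 0:  # -1 encodes null in this sketch
        return None, offset
    raw = buf[offset:offset + length]
    return raw.decode("utf-8"), offset + length


# Hypothetical payload holding two strings: "nid" and "weight".
payload = struct.pack(">i", 3) + b"nid" + struct.pack(">i", 6) + b"weight"

# Aligned reads decode both lengths correctly.
first, pos = read_string(payload, 0)
second, pos = read_string(payload, pos)
print(first, second)  # nid weight

# Misaligned read: starting at offset 4 lands on the text "nid" plus the next
# byte (6e 69 64 00), which decodes as a length of roughly 1.7 GB.
bogus_length, _ = read_int(payload, 4)
print(bogus_length)  # 1852400640
```

A driver that reads this many bytes from the socket, or preallocates a buffer of this size, would hit PHP's memory_limit long before the real response (about 1.1 KB here) is exhausted.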

Is this a bug in the OrientDB 2.1.5 binary protocol, or could it be caused by protocol changes that aren't yet fully implemented in this library?

Thanks, Csaba

andreyvk commented 8 years ago

@ncreea Are you using any schema? I had a similar problem recently when I changed the schema of a class and some old records were left behind, containing data that was not in line with my new schema.

ncreea commented 8 years ago

@andreyvk Yes, I'm using a schema (all classes are subclasses of V or E), and as far as I remember the last schema change was applied more than 3 months ago. How did you solve the issue you had?

andreyvk commented 8 years ago

@ncreea I figured out that the problem was caused by loading a certain record. I then gradually removed attributes from the record and stumbled upon the one that was causing my problem. One complex attribute had a value of type 'embeddedlist' of a certain class (which I had already removed), while the actual schema for that attribute had already been changed to an 'embeddedlist' of an 'embeddedset'. This caused all sorts of hell :)

ncreea commented 8 years ago

Understood. I'm using only INT and STRING attributes.

pentium10 commented 8 years ago

@andreyvk Is there a way to detect this sort of problem and throw an appropriate exception with a decent error message that explains what should be corrected?

andreyvk commented 8 years ago

@ncreea Do check whether some of the INTs were converted to strings or vice versa. That could be the problem. I'm also going to upgrade to 2.1.5 soon. Hope that doesn't screw me completely )))

@pentium10 I am not very familiar with how Orient's binary protocol operates, but if the schema is available to the library, I think some sort of checking could definitely be done behind the scenes. This library is quite new. I myself have reported numerous bugs, but in the end it has all paid off. Besides, I believe there's no better one out there for PHP anyway.

If you guys could help debug and provide @Ostico with a concrete way to reproduce the problem, he will be able to solve it quickly, especially if it concerns your own schema and records. Bugs like this are quite tough to reproduce.

Ostico commented 8 years ago

Hi guys, this is surely a bug due to incorrect record deserialization. Normally it should be easy to fix given an example dataset or a code snippet that helps me reproduce the error.

Provide me with a way to reproduce this and a description of your environment, and I will try to fix it this weekend.

ncreea commented 8 years ago

OS: CentOS release 6.5 (Final) Linux orient0 2.6.32-431.el6.centos.plus.x86_64 #1 SMP Fri Nov 29 23:11:12 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Current OrientDB version: 2.1.5 (upgrade history: 1.1.7 -> 2.1.3 -> 2.1.5)
PHP: 5.4.13 (cli) (built: Mar 15 2013 11:27:51)

We have a set of structures with many attributes and use both SBTREE and LUCENE indexes. The current database size is ~4 GB.

We will try to spin up an Amazon instance, set up a similar environment and structure, and write a few individual scripts that use queries taken from the real system. I will keep you updated on the results.

Could you please drop me an email at csaba.nyiro@reea.net so that I can provide additional information and access to the environment once it is set up?

andreyvk commented 8 years ago

@Ostico I will hopefully try to reproduce mine today, but I think it's not just about my case. There could be numerous permutations of wrong deserialization. It would be nice if the library could detect such behavior (it looks like some sort of recursion issue?) and stop deserializing that particular attribute, throwing an exception or at least logging an error.
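A rough Python sketch of the kind of check suggested here: before trusting a decoded string length, compare it against the bytes remaining in an already-received payload and against a cap, and raise an exception that names the field and offset. This is only an assumption-laden illustration. BoundedReader, DeserializationError, the field argument, and the 16 MB max_len default are all hypothetical and not part of PhpOrient. The length encoding (4-byte big-endian signed int, -1 for null) is assumed from the OrientDB binary protocol. A driver reading straight from a socket would not know the remaining size, so only the cap would apply there.

```python
from __future__ import annotations

import struct


class DeserializationError(Exception):
    """Hypothetical error raised when a decoded length is implausible."""


class BoundedReader:
    """Hypothetical reader over a payload that has already been received."""

    def __init__(self, buf: bytes, max_len: int = 16 * 1024 * 1024) -> None:
        self.buf = buf
        self.pos = 0
        self.max_len = max_len  # arbitrary upper bound, an assumption

    def read_int(self) -> int:
        if self.pos + 4 > len(self.buf):
            raise DeserializationError(f"truncated int at offset {self.pos}")
        (value,) = struct.unpack_from(">i", self.buf, self.pos)
        self.pos += 4
        return value

    def read_string(self, field: str) -> str | None:
        start = self.pos
        length = self.read_int()
        if length == -1:  # null in this sketch
            return None
        remaining = len(self.buf) - self.pos
        # Reject lengths that cannot fit in the payload or exceed the cap,
        # instead of allocating a huge buffer for them.
        if length < -1 or length > remaining or length > self.max_len:
            raise DeserializationError(
                f"field {field!r}: implausible length {length} at offset {start} "
                f"({remaining} bytes remaining); possible schema/type mismatch"
            )
        raw = self.buf[self.pos:self.pos + length]
        self.pos += length
        return raw.decode("utf-8")


good = struct.pack(">i", 3) + b"nid"
print(BoundedReader(good).read_string("nid"))  # nid

bad = b"nid\x00"  # text bytes where a length prefix was expected
try:
    BoundedReader(bad).read_string("nid")
except DeserializationError as exc:
    print(exc)
```

The error message names the attribute, so a caller could log it and skip that attribute instead of exhausting memory. Knowing the declared schema type of each field, as suggested above, would allow a more precise message.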

ncreea commented 8 years ago

I was able to track down a query that always fails with "socket_read(): unable to read from socket [104]: Connection reset by peer". However, it executes correctly in Studio.

SELECT nid, weight FROM ( SELECT @rid, *, count(1) as weight FROM ( SELECT expand(inV()) FROM ( SELECT * FROM ( SELECT expand(outE('SAVED')) FROM ( SELECT expand(in()) FROM ( SELECT FROM PRODUCT WHERE nid = 75143201 ) ) ) ) ) WHERE lines_count=1 GROUP BY @rid ORDER BY weight DESC ) WHERE @class='LOGO' AND status=1 AND nid<> 75143201

Where:
- PRODUCT is a subclass of V
- LOGO is a subclass of PRODUCT
- SAVED is a subclass of E
- PERSON is a subclass of V
- lines_count and status are attributes of LOGO; nid is an attribute of both PRODUCT and PERSON
- each SAVED edge is a relation from PERSON to PRODUCT (LOGO)

If I change the 75143201 id to another, randomly selected one, the query executes and the library returns the response correctly.

This makes me think it's the server's fault and nothing is wrong with the library. However, I can't say this for sure, because the same query executes correctly via Studio but not via the binary library, and I can't see anything related in the server logs.

Note: the query is taken from a system that includes intermediate WHERE conditions in certain cases, which is why it is so deeply nested.

andreyvk commented 8 years ago

@ncreea This does look like a server fault, because you are not returning the whole record, only nid and weight, and weight is not even a record field. The only thing I can think of right now is that nid is an INTEGER but declared as a STRING, or vice versa, but technically that shouldn't even be a problem!

I've executed a similar query on my system with my records, and it went through just fine. I'm stuck here. The only way I see this working is to isolate those records (i.e. create a small test DB with only those records and edges), make sure the problem still persists, and then send an export of this DB to @Ostico so he can reproduce and fix the problem.

andreyvk commented 8 years ago

PS: if this is a demo DB and you don't mind disclosing the data to him alone, just ZIP the whole folder from $ORIENTDB_DIR/databases/ and send it to him in a private email, because export/import sometimes fails on Orient :)) (bugs!)

andreyvk commented 8 years ago

@Ostico, I couldn't reproduce the problem, but I managed to screw up my whole database instead while trying to drop and create alternate properties that differ from the original record schema... I think it would be OK to close issue #44 once this one is fixed, at least until I stumble upon this problem again.

pentium10 commented 8 years ago

Let's not close the issue until it is proven to be fixed.

Ostico commented 8 years ago

@ncreea, have you tried enabling the binary protocol logs in OrientDB? They might contain more information about your problem. The "Connection reset by peer" error makes me think at first that this is not a driver issue, but I can't say for sure. Under some conditions OrientDB closes the connection when a malformed message is sent (for example, in the header part).

BTW, for the original issue, I sent you my personal email address in case you want to send me your database (or part of it) for debugging.

Ostico commented 7 years ago

I'm closing this: old driver and OrientDB version.