Old de/serialization format?

andreyvk commented 8 years ago

Hi @Ostico ,

A couple of days back, while I was trying to track down the high memory allocation problem (issue #44), I managed to corrupt my database by modifying my schema in certain ways. The database became completely unrecoverable, so I reported an issue to the OrientDB guys (see https://github.com/orientechnologies/orientdb/issues/5290).

Basically they are saying that the database could've been corrupted because serialization from PhpOrient is using an old format, which comes from the specs of the 1.* binary protocol. On the other hand, commands I issue from Studio or console.sh are, of course, treated as 2.1.*

Is it correct that PhpOrient is using the old binary protocol specs to de/serialize data? If yes, do you have any plans to upgrade any time soon?

Ostico commented 8 years ago

Yes, PhpOrient uses the binary protocol's CSV serialization instead of the new binary one.

But this is not related to bugs in that serialization protocol, don't you think? :)

They should fix this as long as ( and if ) it is supported.

By the way, the new serialization type ( which I don't like, in my opinion ) is on my roadmap. I think in early 2016.

andreyvk commented 8 years ago

@Ostico, I'm not sure if it's related or not. The guys asked for my corrupted db, which I will send them soon. Let's see what they say after taking a look at it.

This is true, they should support old-style de/serialization, if they have it )) I just hope it doesn't in any way affect the way I am doing things right now.

Hope to see the implementation when you have time to get your hands on that!

Thanks!

smolinari commented 8 years ago

What are your concerns about the new serialization type, @Ostico? My only concern is whether it performs as well as, or actually better than, the CSV serialization. What type of serialization is the "new one", btw?

Scott

Ostico commented 8 years ago

The new serialization type differs basically because the preceding one ( the current one ) is a CSV-like serialization, so it is called CSV-Serialization. It is easy to parse recursively, fairly efficient on the client side, and easy for humans to read and debug.
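
To give an idea of what "easy to parse" means here, below is a minimal Python sketch of a parser for a flat CSV-style record. It is only an illustration, not PhpOrient's actual parser: the function names `parse_flat_record` and `convert_scalar` are made up, the sample record string is illustrative, and the sketch assumes a simplified format (`Class@name:value,...`) with only quoted strings, integers, booleans and bare tokens such as record ids. Nested lists, maps, embedded documents and typed numeric suffixes are left out.

```python
import re
from typing import Any, Dict, Optional, Tuple


def convert_scalar(raw: str) -> Any:
    """Convert an unquoted token to a Python value (simplified assumption)."""
    if raw == "":
        return None
    if raw in ("true", "false"):
        return raw == "true"
    if re.fullmatch(r"-?\d+", raw):
        return int(raw)
    # Anything else (for example a record id like #10:5) is kept as text.
    return raw


def parse_flat_record(text: str) -> Tuple[Optional[str], Dict[str, Any]]:
    """Parse 'Class@field:value,field:value' into (class name, fields)."""
    class_name = None
    at, colon = text.find("@"), text.find(":")
    # A class prefix only counts if '@' comes before the first field colon.
    if at != -1 and (colon == -1 or at < colon):
        class_name, text = text[:at], text[at + 1:]

    fields: Dict[str, Any] = {}
    pos = 0
    while pos < len(text):
        sep = text.index(":", pos)
        name = text[pos:sep]
        pos = sep + 1
        if pos < len(text) and text[pos] == '"':
            # Quoted string: scan to the closing quote, honoring backslash escapes.
            pos += 1
            chars = []
            while text[pos] != '"':
                if text[pos] == "\\":
                    pos += 1
                chars.append(text[pos])
                pos += 1
            pos += 1  # skip the closing quote
            value: Any = "".join(chars)
        else:
            # Unquoted scalar: runs up to the next comma or the end of input.
            end = text.find(",", pos)
            if end == -1:
                end = len(text)
            value = convert_scalar(text[pos:end])
            pos = end
        fields[name] = value
        pos += 1  # skip the comma between fields
    return class_name, fields


record = 'Profile@nick:"ThePresident",name:"Barack",age:51,active:true,boss:#10:5'
class_name, fields = parse_flat_record(record)
```

Because the whole record is readable text, a failing parse can usually be diagnosed just by printing the raw string, which is the debugging advantage mentioned above.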

The new one, called Schemaless-Binary-Serialization, implements a varint encoding for Integer types, and the record field names are stored in the header of the response packet.
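
For readers not familiar with varints, here is a minimal Python sketch of the general technique. It assumes the common protobuf-style scheme (base-128 groups, least significant first, combined with ZigZag mapping for signed values); whether every detail matches OrientDB's implementation should be checked against its specification. The function names are hypothetical.

```python
from typing import Tuple


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def write_varint(n: int) -> bytes:
    """Encode a signed integer as a ZigZag base-128 varint."""
    z = zigzag_encode(n)
    out = bytearray()
    while z >= 0x80:
        # Low 7 bits of payload, high bit set to signal "more bytes follow".
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return zigzag_decode(z), pos


encoded = write_varint(300)
value, next_pos = read_varint(encoded)
```

The encoding is compact for small numbers, but the bytes are not human-readable and the reader has to decode each value sequentially to find where the next one starts.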

It is difficult to parse, I think less efficient on the client side, very difficult for humans to read and debug, and poorly documented. The last two things make the protocol very hard to implement and maintain, and the estimated effort is too high.

Moreover, there are exceptions to the field name rule depending on whether the database is schema-less, schema-full or in mixed mode. ( Be careful: this is the last implementation I know of, and it dates back 4 months ).

Btw, the OrientDB team claims it is more efficient on the server side, basically because the records are mapped more or less 1:1 to OrientDB memory.

smolinari commented 8 years ago

Thanks for the reply Domenico.

I wonder how much of a performance gain is won on the server side with the new serialization, whether it compensates for the loss of efficiency on the client side, and whether it is enough of a win to make up for the more difficult implementation and maintenance.

Scott

Ostico / PhpOrient

Old de/serialization format? #56