Open tlt-miamed opened 6 years ago
I traced the problem to the custom serialization of Doctrine_Record and Doctrine_Collection.
To make the problem easier to understand, here is an abstract example of what happens in Doctrine:
class RelatedModel
{
}

class Collection implements Serializable
{
    private $data = null;
    private $_snapshot = null;

    public function __construct()
    {
        // Both properties hold the same RelatedModel instance.
        $this->data = $this->_snapshot = [new RelatedModel()];
    }

    public function serialize()
    {
        return serialize(get_object_vars($this));
    }

    public function unserialize($serialized)
    {
        $vars = unserialize($serialized);
        foreach ($vars as $k => $v) {
            $this->$k = $v;
        }
    }
}

class Model implements Serializable
{
    private $_data = ['arrayField' => ['one', 'one', 'one', 'one', 'one', 'one', 'one', 'one', 'one',]];
    private $relation = null;

    public function __construct()
    {
        $this->relation = ['RelatedModels' => new Collection()];
    }

    public function serialize()
    {
        $vars = get_object_vars($this);
        // fields of type array are serialized before the rest
        $vars['_data']['arrayField'] = serialize($vars['_data']['arrayField']);
        return serialize($vars);
    }

    public function unserialize($serialized)
    {
        $vars = unserialize($serialized);
        foreach ($vars as $k => $v) {
            $this->$k = $v;
        }
        // the array field is unserialized after everything else
        $this->_data['arrayField'] = unserialize($this->_data['arrayField']);
    }
}

// Round trip through serialize()/unserialize()
$bar = new Model();
$str = serialize($bar);
$res = unserialize($str);
This will result in this serialized string:
C:5:"Model":330:{a:2:{s:5:"_data";a:1:{s:10:"arrayField";s:132:"a:9:{i:0;s:3:"one";i:1;s:3:"one";i:2;s:3:"one";i:3;s:3:"one";i:4;s:3:"one";i:5;s:3:"one";i:6;s:3:"one";i:7;s:3:"one";i:8;s:3:"one";}";}s:8:"relation";a:1:{s:13:"RelatedModels";C:10:"Collection":82:{a:2:{s:4:"data";a:1:{i:0;O:12:"RelatedModel":0:{}}s:9:"_snapshot";a:1:{i:0;r:19;}}}}}}
The problem lies in r:19; which is a reference that should point to O:12:"RelatedModel":0:{} because RelatedModel is referenced twice in Collection, but the number is wrong.
As far as I know, serialize() gives every value in the serialized string a number so that it can be referenced later, but the calculated number is wrong. I think the problem lies in Model::serialize().
On serialize, PHP processes $bar in this order:
Model {
    arrayField
    Collection {
        RelatedModel in data
        RelatedModel in _snapshot (as reference)
    }
}
and every node in the result gets a number so that it can be referenced later.
On unserialize, we change the order (due to the custom serialization):
Model {
    Collection {
        RelatedModel in data
        RelatedModel in _snapshot (fails to find the reference because arrayField was not handled yet)
    }
    arrayField
}
and the lookup fails because arrayField is handled out of order.
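The offset in r:19 can be checked by hand. The sketch below uses a simplified model of the numbering, which is an assumption and not PHP's actual implementation: every serialized value takes one slot, and an array additionally takes one slot per nested value. The helper name countSlots() is hypothetical and exists only for this illustration.

```php
<?php
// Simplified model (assumption): each value occupies one slot,
// and arrays add the slots of all values nested inside them.
function countSlots($value): int
{
    $slots = 1;
    if (is_array($value)) {
        foreach ($value as $item) {
            $slots += countSlots($item);
        }
    }
    return $slots;
}

// The array field from the abstract example: nine strings.
$arrayField = array_fill(0, 9, 'one');

// Slots consumed by the nested serialize($arrayField) call
// inside Model::serialize(): the array itself plus nine strings.
$offset = countSlots($arrayField);

// Position of RelatedModel as unserialize() counts it:
// Model, vars array, _data array, arrayField string, relation array,
// Collection, its vars array, data array, RelatedModel.
$positionOnUnserialize = 9;

echo $offset, "\n";                          // 10
echo $positionOnUnserialize + $offset, "\n"; // 19
```

Under this model the reference is written as r:19 while unserialize() would need r:9, which matches the serialized string shown above.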
Attention: This bug can also lead to corrupt data. If arrayField contains only a small array, the reference will point to a node in the serialized string that exists but is the wrong one.
PHP does not support the double serialization, very weird.
<?php
$value = [['foo'], 'bar'];
$serialized = $value;
$serialized[0] = serialize($serialized[0]);
$serialized = serialize($serialized);
$unserialized = unserialize($serialized);
$unserialized[0] = unserialize($unserialized[0]);
$value == $unserialized; // true
@alquerci your example should work, and it does.
The problem only exists if serialize uses references in the serialized string.
As in my abstract example, Collection::data and Collection::_snapshot hold the same object. This creates a reference in the serialized string (r:19;). But because we serialize/unserialize the array out of order, the reference counter gets out of sync.
Here is a shorter example to illustrate the problem:
<?php
class RelatedModel
{
}

class Model implements Serializable
{
    private $doubleSerialized = ['one', 'one', ];
    private $obj1 = null;
    private $obj2 = null;

    public function __construct()
    {
        $this->obj1 = $this->obj2 = new RelatedModel();
    }

    public function serialize()
    {
        $vars = get_object_vars($this);
        $vars['doubleSerialized'] = serialize($vars['doubleSerialized']);
        return serialize($vars);
    }

    public function unserialize($serialized)
    {
        $vars = unserialize($serialized);
        foreach ($vars as $k => $v) {
            $this->$k = $v;
        }
        $this->doubleSerialized = unserialize($this->doubleSerialized);
    }
}

$bar = new Model();
$str = serialize($bar); // 'C:5:"Model":122:{a:3:{s:16:"doubleSerialized";s:34:"a:2:{i:0;s:3:"one";i:1;s:3:"one";}";s:4:"obj1";O:12:"RelatedModel":0:{}s:4:"obj2";r:7;}}'
$res = unserialize($str); // throws error
Another prerequisite is that you use double serialization in a custom serialize function. Outside of Serializable::serialize() the reference counter gets reset. That's why this example works:
<?php
class RelatedModel
{ }
$object = new RelatedModel();
$value = [
    'foo' => ['one', 'one'],
    'obj1' => $object,
    'obj2' => $object,
];
$original = $value; // untouched copy for the comparison below
$value['foo'] = serialize($value['foo']);
$serialized = serialize($value); // 'a:3:{s:3:"foo";s:34:"a:2:{i:0;s:3:"one";i:1;s:3:"one";}";s:4:"obj1";O:12:"RelatedModel":0:{}s:4:"obj2";r:3;}'
$unserialized = unserialize($serialized);
$unserialized['foo'] = unserialize($unserialized['foo']);
$original == $unserialized; // true
$unserialized['obj1'] === $unserialized['obj2']; // true, the reference is restored
This bug only happens in edge cases. Let me describe the scenario first:
schema.yml:
Important here is that 'Model' contains a column of type 'array' and 'Model' has a 'RelatedModel'.
Now the database content: The database should contain at least one 'Model' (id: 1) connected with one 'RelatedModel' (id: 1). The 'Model'.'details' should contain an array with at least 20 entries.
Now let's provoke the error. I found these two methods:
Method 1: Load from the database with cache. If this query hits the cache, the unserialize will fail.
Method 2: serialize/unserialize with references
Both of these examples produce an error similar to this: