doctrine / dbal

Doctrine Database Abstraction Layer
https://www.doctrine-project.org/projects/dbal.html
MIT License
9.48k stars 1.34k forks source link

Binary entity identifier resource not reset, fails to populate second collection, PDOStatement #5895

Open mcurland opened 1 year ago

mcurland commented 1 year ago

Bug Report

If an entity has a binary identifier field then the id value is managed as a PHP resource. Of course, a resource is only meaningful if it has a reasonable current position. Specifically, if the value is read more than once then the stream position needs to be restored in between reads.

In my case (Postgres or Oracle over PDO) PDOStatement->execute moves the resource pointer, but this likely fails for other drivers as well. The bottom line is that a resource is not quite an immutable value, so the position needs to be restored between reads to get consistent data.

1) When the entity is populated the id resource is at position 0. 2) When the first lazy load collection (likely eager as well, I just haven't tested) the correct id is produced, so the sql log (postgres, but this is not a postgres-specific issue) looks something like $1='\xabcdef'. 3) When the second lazy load collection is queried the sql log shows $1='\x'. This should be the same as the first value. It obviously never returns data.

Clearly, the resource is being used and not rewound to its original position, so the second use fails. There is absolutely nothing fancy here, I'm just retrieving an entity with two one-to-many lazy load collections. However, similar behavior is seen with a write operation, where the first join succeeds and the second shows a foreign-key violation on the empty binary string data.

I tracked the resource pointer movement to the PDOStatement->execute call in doctrine\dbal\src\Driver\PDO\statement.php. Looking further into the PHP sources (https://github.com/php/php-src/blob/master/ext/pdo/pdo_stmt.c) shows

            case PDO_PARAM_LOB:
                if (Z_TYPE_P(dest) == IS_STRING) {
                    php_stream *stream =
                        php_stream_memory_open(TEMP_STREAM_READONLY, Z_STR_P(dest));
                    zval_ptr_dtor_str(dest);
                    php_stream_to_zval(stream, dest);
                }
                break;

Which reads the stream but does not restore it. So, short of changing the PDOStatement behavior in PHP, the resources either need to be copied before the call (during parameter binding) or reset to their prior position (usually 0, but not guaranteed) after the call. There is little point in copying resources since the copy also moves the original pointer, which leads to a solution of simply restoring the position after the execute call.

I'll make a pull request with the suggested changes to PDO\statement.php, but I'm not claiming this is the only driver that experiences this issue. The collaborator team will need to decide if this is a dbal issue or needs to be handled higher in the stack (BasicEntityPersister, UnitOfWork->loadCollection, etc). There are obviously a ton of both read and write scenarios that funnel down into PDOStatement->execute, so fixing it higher up the stack would be significantly more code (and might be unnecessary for statement implementations associated with other drivers, TBD).

I'm currently using the 2.8 bundle installed with Symfony.

Summary

Binary identifier in entity can be used once as a parameter to PDOStatement->execute but fails on all subsequent operations.

Current behaviour

How to reproduce

  1. Define an entity with a binary identifier column
  2. Define second and third entities that references the first (OneToMany is sufficient, ManyToMany will fail as well).
  3. Add some data to the db (it won't work through Doctrine until this is fixed, so do it manually).
  4. Fetch the first entity and call the two lazy load collections==>The first collection populates, the second does not.

Expected behaviour

This should work the same as any other identifier and allow multiple internal uses.

derrabus commented 1 year ago

I'll make a pull request with the suggested changes to PDO\statement.php, but I'm not claiming this is the only driver that experiences this issue.

If you include a functional test that reproduces your problem, we shall see what other drivers are affected.

Anyway, I'm unsure if this is really a bug in the DBAL or if it's the ORM that uses the DBAL in a wrong way. I don't think those resources produced by the DBAL for the binary field are meant to be read more than once.

mcurland commented 1 year ago

Hi Alexander, thank you for the quick response.

I think it's clear that the ORM is assuming the data passed to the DBAL layer is immutable. Even the parameter names ($value for a resource) indicate the expectation of an immutable value. Apart from initializing a resource (including the rewind when the data is in) the ORM layer doesn't have any handling for individual data types. So, I'm not sure that this is truly a DBAL bug either, but asking for the ORM layer to switch from treating column values as inherently immutable is a huge ask and will produce a major code change.

Basically, by propagating binary data as a resource (instead of, for example, a byte string or byte array) you're in a situation where reading the data modifies it. I don't necessarily disagree with this choice (stream data can be large, so you don't want to copy it arbitrarily into a less stateful structure), but it clearly causes problems. Any time I read it the line before return is a call to rewind.

For my codebase, the DDL, Doctrine classes, and an I/O layer to read client data (change or query) and translate it to the O/RM requests is all generated. There is no custom interaction with the Doctrine entities and the queries and changes are JSON-driven, so the idea that I can't retrieve the population of two collections with one request is really bad (like target another framework bad). In this case, the generated I/O layer can be re-created (through a settings file) to handle UUID values as native/character/binary. The settings are meant to match the expected back end uuid support. All I'm doing is swapping out the type of the identifier. It shouldn't break the code.

I wrote a stripped-down model (pasted below) and generated some basic entity classes for you to work with, but having burned several days on this already I just don't have the bandwidth to spend any more getting the full repository building and testable. Hopefully this helps and will let you generate and run a db to any driver.

<?php
namespace App\Test\BinRef;
use Doctrine\Common\Collections\ArrayCollection; {
    /**
    * @Entity
    * @Table(name="FirstType", uniqueConstraints={@UniqueConstraint(name="FirstType_PK", columns={"firstTypeId"})})
    */
    class FirstType {
        /**
        * @Column(name="firstTypeId", type="integer", nullable=false)
        * @Id
        * @GeneratedValue
        */
        protected $firstTypeId;
        /**
        * @Column(name="data", type="string", length=48, nullable=true)
        */
        protected $data;
        /**
        * @ManyToOne(targetEntity="TargetType", inversedBy="firstTypeCollection")
        * @JoinColumn(name="targetTypeUuid", referencedColumnName="targetTypeUuid")
        */
        protected $targetType;
        public function getFirstTypeId() {
            return $this->firstTypeId;
        }
        public function setFirstTypeId($value) {
            $this->firstTypeId = $value;
        }
        public function getData() {
            return $this->data;
        }
        public function setData($value) {
            $this->data = $value;
        }
        public function getTargetType() {
            return $this->targetType;
        }
        public function setTargetType($value) {
            $current = &$this->targetType;
            if ($current !== $value) {
                if ($current !== null) {
                    $current->_syncFirstTypeCollection($this, true);
                }
                if ($value !== null) {
                    $value->_syncFirstTypeCollection($this, false);
                }
            }
            unset($current);
            $this->targetType = $value;
        }
        public function __toString() {
            return spl_object_hash($this);
        }
    }
    /**
    * @Entity
    * @Table(name="SecondType", uniqueConstraints={@UniqueConstraint(name="SecondType_PK", columns={"secondTypeId"})})
    */
    class SecondType {
        /**
        * @Column(name="secondTypeId", type="integer", nullable=false)
        * @Id
        * @GeneratedValue
        */
        protected $secondTypeId;
        /**
        * @Column(name="data", type="string", length=48, nullable=true)
        */
        protected $data;
        /**
        * @ManyToOne(targetEntity="TargetType", inversedBy="secondTypeCollection")
        * @JoinColumn(name="targetTypeUuid", referencedColumnName="targetTypeUuid")
        */
        protected $targetType;
        public function getSecondTypeId() {
            return $this->secondTypeId;
        }
        public function setSecondTypeId($value) {
            $this->secondTypeId = $value;
        }
        public function getData() {
            return $this->data;
        }
        public function setData($value) {
            $this->data = $value;
        }
        public function getTargetType() {
            return $this->targetType;
        }
        public function setTargetType($value) {
            $current = &$this->targetType;
            if ($current !== $value) {
                if ($current !== null) {
                    $current->_syncSecondTypeCollection($this, true);
                }
                if ($value !== null) {
                    $value->_syncSecondTypeCollection($this, false);
                }
            }
            unset($current);
            $this->targetType = $value;
        }
        public function __toString() {
            return spl_object_hash($this);
        }
    }
    /**
    * @Entity
    * @Table(name="TargetType", uniqueConstraints={@UniqueConstraint(name="TargetType_PK", columns={"targetTypeUuid"})})
    */
    class TargetType {
        /**
        * @Column(name="targetTypeUuid", type="binary", length=16, nullable=false)
        * @Id
        */
        protected $targetTypeUuid;
        /**
        * @Column(name="targetTypeName", type="string", length=48, nullable=true)
        */
        protected $name;
        /**
        * @OneToMany(targetEntity="FirstType", mappedBy="targetType")
        */
        protected $firstTypeCollection;
        /**
        * @OneToMany(targetEntity="SecondType", mappedBy="targetType")
        */
        protected $secondTypeCollection;
        public function getTargetTypeUuid() {
            return $this->targetTypeUuid;
        }
        public function setTargetTypeUuid($value) {
            $this->targetTypeUuid = $value;
        }
        public function getName() {
            return $this->name;
        }
        public function setName($value) {
            $this->name = $value;
        }
        public function addToFirstTypeCollection($value) {
            $value->setTargetType($this);
        }
        public function removeFromFirstTypeCollection($value) {
            $value->setTargetType(null);
        }
        public function getFirstTypeCollection() {
            return $this->ensureFirstTypeCollection()->toArray();
        }
        public function _syncFirstTypeCollection($value, $remove) {
            $this->ensureFirstTypeCollection()->{$remove ? 'removeElement' : 'add'}($value);
        }
        public function addToSecondTypeCollection($value) {
            $value->setTargetType($this);
        }
        public function removeFromSecondTypeCollection($value) {
            $value->setTargetType(null);
        }
        public function getSecondTypeCollection() {
            return $this->ensureSecondTypeCollection()->toArray();
        }
        public function _syncSecondTypeCollection($value, $remove) {
            $this->ensureSecondTypeCollection()->{$remove ? 'removeElement' : 'add'}($value);
        }
        private function ensureFirstTypeCollection() {
            if (null === ($coll = $this->firstTypeCollection)) {
                $this->firstTypeCollection = $coll = new ArrayCollection();
            }
            return $coll;
        }
        private function ensureSecondTypeCollection() {
            if (null === ($coll = $this->secondTypeCollection)) {
                $this->secondTypeCollection = $coll = new ArrayCollection();
            }
            return $coll;
        }
        public function __toString() {
            return spl_object_hash($this);
        }
    }
}
?>

If you don't have a UUID creation function handy this set will do (obviously put in a class, expected string format is like 'af8034e4-c954-40a6-b02f-4b224364c3a2'):

        private static function createUuid() {
            $data = openssl_random_pseudo_bytes(16);
            // Set version to 0100
            $data[6] = chr(ord($data[6]) & 0x0f | 0x40);
            // Set bits 6-7 to 100
            $data[8] = chr(ord($data[8]) & 0x3f | 0x80);
            return self::resourceFromByteString($data);
        }
        private static function uuidResourceFromString($uuidString) {
            return $uuidString ? self::resourceFromByteString(hex2bin(str_replace('-', '', $uuidString))) : null;
        }
        private static function resourceFromByteString($byteString) {
            $res = fopen('php://temp', 'rb+');
            fwrite($res, $byteString);
            rewind($res);
            return $res;
        }
        private static function uuidStringFromResource($uuidResource) {
            if ($uuidResource) {
                $bytes = stream_get_contents($uuidResource);
                rewind($uuidResource);
                return self::uuidStringFromByteString($bytes);
            }
            return null;
        }
        private static function uuidStringFromByteString($bytes) {
            if ($bytes && strlen($bytes) === 16) {
                $hexBytes = bin2hex($bytes);
                return substr($hexBytes, 0, 8) . '-' . substr($hexBytes, 8, 4) . '-' . substr($hexBytes, 12, 4) . '-' . substr($hexBytes, 16, 4) . '-' . substr($hexBytes, 20, 12);
            }
            return null;
        }
  1. Use the doctrine classes to generate a db to any driver.
  2. Populate the db with data (one row in each of the 3 tables will do, with FirstType and SecondType referencing Target). The extra 'data' and 'name' fields are just to make it easier to distinguish items.
  3. Attach an entity manager to the db through a pdo connection.
  4. Get a TargetType entity with $em->getRepository('App\\Test\\BinRef\\TargetType')->findOneBy(['targetTypeUuid' => self::uuidResourceFromString($id)]);
  5. Now call getFirstTypeCollection on the returned entity to trigger the first SQL call (successful) and getSecondTypeCollection to trigger the second (not succesful).

You'll also see failures persisting if you create the objects and try to persist them--you'll get an FK violation for the second serialization ref to the same TargetType entity.

cbichis commented 1 year ago

Havent tested yet your commit but I am having a similar issue with the secondary entities.