doctrine / orm

Doctrine Object Relational Mapper (ORM)
https://www.doctrine-project.org/projects/orm.html
MIT License
9.9k stars 2.5k forks source link

DDC-2332: [UnitOfWork::doPersist()] The spl_object_hash() generate not unique hash! #3037

Closed doctrinebot closed 1 year ago

doctrinebot commented 11 years ago

Jira issue originally created by user fchris82:

I created fixtures and some data was inserted many times without calling the Task entity PrePersist event listener.

I printed the used and generated hash and I saw a Proxies\*_CG_*\Asitly\ProjectManagementBundle\Entity\User hash equal a Task entity hash!

doctrinebot commented 11 years ago

Comment created by @ocramius:

Please provide either a code example or a test case. As it stands, this issue is incomplete

doctrinebot commented 11 years ago

Comment created by @beberlei:

Are you calling EntityManager#clear() inbetween? Because PHP reuses the hashes. The ORM accounts for this.

doctrinebot commented 11 years ago

Comment created by fchris82:

Hi!

http://diginin.hu/spl*object_hash*bug.zip

Chris

doctrinebot commented 11 years ago

Comment created by @beberlei:

This is not a reproduce case, i don't want to execute your whole project.

I want to know, what is the actual bug that you see? Can you just print a list of all the hashes? Because the hashes dont differ at the end, bu tjust somewhere in the middle.

doctrinebot commented 11 years ago

Comment created by fchris82:

I attached a hashlogs.txt file. The last Task class hash is 0000000050ab4aba0000000058e1cb12 ( line 3 129 )

This is not unique, view the line 2 760 . The Task is not being saved and the program don't call the prePersist listener. The "UnitOfWork" believe the entity has been saved because the isset($this->entityStates[$oid]) is true. But it is an other entity.

doctrinebot commented 11 years ago

Comment created by fchris82:

The EntityManager::clear() fix the problem, but this is not "good" and "beautiful" solution. Shows no sign of that conflicts were and this is causing the problem. I was looking for the problem 7 hours.

doctrinebot commented 10 years ago

Comment created by @ocramius:

One possible issue here is that a listener registers an entity as managed while a proxy is being loaded.

The given data is still insufficient to actually verify the problem.

tomaszmadeyski commented 8 years ago

I can confirm having this issue. My case is working on quite big collection of entities, and code (simplified) looks like this:

//$idsArr is about 2k elements
foreach ($idsArr as $id) {
    $entityA = new A($id);
    $em->persist($entityA);
}
$em->flush();

//some code happening in between but everything is happening during single request

//$anotherIdsArr is about 2k items
foreach ($anotherIdsArr as $anotherId) {
    $entityB = new B($someConstValue, $anotherId);
    $em->persis($entityB);
}
$em->flush();

Class A definition:

/**
 * @ORM\Entity()
 * @ORM\Table(name="a")
 */
class A
{
    /**
     * @var integer
     *
     * @ORM\Id
     * @ORM\Column(type="integer")
     */
    protected $l1;

    /**
     * @param integer $l1
     */
    public function __construct($l1)
    {
        $this->l1 = $l1;
    }

} 

Class B definition

/**
 * @ORM\Entity()
 * @ORM\Table(name="b")
 */
class B
{
    /**
     * @var integer
     *
     * @ORM\Id
     * @ORM\Column(type="integer")
     */
    protected $l1;

    /**
     * @var integer
     *
     * @ORM\Id
     * @ORM\Column(type="integer")
     */
    protected $l2;

    /**
     * @param integer $l1
     * @param integer $l2
     */
    public function __construct($l1, $l2)
    {
        $this->l1 = $l1;
        $this->l2 = $l2;
    }

}

While trying to persist B instance $entityB = new B(12, 8995); I get $oid == '0000000013204a0200007fc5ffc03559' and such element already exists in $this->entityIdentifiers so getEntityState method returns STATE_MANAGED so $entityB is not persisted.

$this->entityIdentifiers['0000000013204a0200007fc5ffc03559'] stores instance of A entity: $a = new A(6148)

I can also confirm that calling $em->clear(A::class) before second loop fixes the issue but it's rather a workaround. I think the issue here is that according to doc spl_object_hash can return same hash if object is destroyed and this is what is happening in my case

Ocramius commented 8 years ago

entityIdentifiers keeps a reference of type A, therefore that A wasn't garbage-collected. This means that the issue comes from somewhere else.

As I already stated above, this issue needs a reproducible test case.

tomaszmadeyski commented 8 years ago

the thing is it looks like A was garbage collected - as I stated in comment there's quite a lot happening between these two loops so it might have been garbage collected

thebuccaneersden commented 8 years ago

@Ocramius I believe I have a reproducible test case here ~~> https://github.com/thebuccaneersden/doctrine-bug

Ocramius commented 8 years ago

@thebuccaneersden no, that is not a test case, that is an application that I'd have to set up and run, and that's not going to happen, sorry :-\

Please refer to https://github.com/doctrine/doctrine2/tree/1c2b7c968561f2e319117d7860c88e8c0cad3c7a/tests/Doctrine/Tests/ORM/Functional/Ticket for a list of existing functional test cases.

sroze commented 6 years ago

I just had the same issue: the spl_object_hash of an entity was the same than a previous one (completely different class), the $uow->getEntityIdentifier() was returning a wrong ID (the one of the previous one) and therefore the BasicEntityPersister::prepareUpdateData method was failing.

Calling $em->clear() worked around the bug.

petervf commented 6 years ago

This test triggers the problem consistently for me.

namespace Doctrine\Tests\ORM\Functional;

use Doctrine\Tests\Models\CMS\CmsArticle;
use Doctrine\Tests\Models\CMS\CmsUser;
use Doctrine\Tests\OrmFunctionalTestCase;

class HashCollisionTest extends OrmFunctionalTestCase
{
    protected function setUp()
    {
        $this->useModelSet('cms');
        parent::setUp();
    }

    public function testHashCollision() {
        $user = new CmsUser();
        $user->username = "Test";
        $user->name     = "Test";
        $this->_em->persist($user);
        $this->_em->flush();

        $articles = [];
        for($i=0;$i<100;$i++) {
            $article = new CmsArticle();
            $article->topic = "Test";
            $article->text  = "Test";
            $article->setAuthor($this->_em->merge($user));
            $this->_em->persist($article);
            $this->_em->flush();
            $this->_em->clear();
            $articles [] = $article;
        }

        $user = $this->_em->merge($user);
        foreach($articles as $article) {
            $article = $this->_em->merge($article);
            $article->setAuthor($user);
        }
        unset($article);

        gc_collect_cycles();

        $keep = [];
        for($x=0;$x<1000;$x++) {
            $keep[] = $article = new CmsArticle();

            $article->topic = "Test";
            $article->text  = "Test";
            $article->setAuthor($this->_em->merge($user));

            $this->_em->persist($article);
            $this->_em->flush();
            $this->assertNotNull($article->id, "Article wasn't persisted on iteration $x");
        }
    }
}

The problem is in the loop that merges the articles. Every article->user is its own seperate instance of a proxy object. The spl_object_hash of these objects get added to entityStates/entityIdentifiers in UnitOfWork->registerManaged(). However, no reference to the object is kept here. RegisterManaged calls UnitOfWork->addToIdentityMap(). The identityMap does keep a reference to the object, but only for the first object with the same id. This means that for most of the CmsUser Proxy objects, no reference is kept from within doctrine, so if I break the last reference in my code by calling $article->setAuthor($user), the proxy object may be garbage collected, but UnitOfWork->entityStates and UnitOfWork->entityIdentifiers still contain the spl_object_hash of the now non existent object.

If I now try to create a new CmsArticle, php may use the same memory location, generating the same object hash. In this case that causes doctrine to think it's already managing the entity, and not persisting it.

sandvige commented 5 years ago

We're also reproducing the issue. From our point of view, it appears to be because entities are kept into $identityMap using a strategy that is different than others maps ($entityIdentifiers, $originalEntityData, etc. which are oid based). When an entity is created using the same ids as a previous one, the new entity is tracked by maps oid-based, but it's NOT tracked by the $identityMap (because the check is based on keys, not on oid) . If the previous one is garbage collected (this is eventually happening anytime), the $identityMap will be confused when it comes to find the associated entity, resulting into a problematic skip. We're about to provide a fix, if someone can validate the way we're understanding the issue, this could save us time before writing code. We're about to create something like an IdentityMapManager that will track ALL entities:

something like that in addToIdentityMap(): $this->identityMap[$className][$idHash][$oid] = $entity;

petervf commented 5 years ago

@sandvige We worked around it by implementing an EntityMangerDecorator that keeps a reference to every entity that passes through it; only releasing it when clear() is called. I'd guess that ends up being functionally the same as your suggestion.

Keeping a reference to all entities in this way is far from ideal but it's the best we could do.

An anonimized version of our implementation is:

class OurEntityManager extends EntityManagerDecorator {

    /** @var array */
    private $references = [];

    public function __construct(EntityManagerInterface $wrapped) {
        parent::__construct($wrapped);
    }

    public function clear($objectName = null) {
        if($objectName === null) {
            $this->references = [];
        }

        parent::clear($objectName);
    }

    public function detach($object){
        unset($this->references[spl_object_hash($object)]);

        parent::detach($object);
    }

    public function remove($object) {
        $this->references[spl_object_hash($object)] = $object;
        parent::remove($object);
    }

    public function persist($object) {
        $this->references[spl_object_hash($object)] = $object;
        parent::persist($object);
    }

    public function merge($object) {
        $newObject = parent::merge($object);
        $this->references[spl_object_hash($newObject)] = $newObject;

        foreach($this->getClassProperties($newObject) as $property) {
            $value = $property->getValue($newObject);
            if(is_object($value) && !($value instanceof \DateTime)) {
                $this->references[spl_object_hash($value)] = $value;
            }
        }

        return $newObject;
    }

    private $classPropertiesCache = [];
    private function getClassProperties($entity){
        $class = get_class($entity);

        unset($this->classPropertiesCache[$class]);
        if(!isset($this->classPropertiesCache[$class])) {
            $arr = [];
            $this->__getClassproperties(new \ReflectionClass($entity), $arr);
            $this->classPropertiesCache[$class] = $arr;
        }

        return $this->classPropertiesCache[$class];
    }

    private function __getClassproperties(\ReflectionClass $reflectionClass, &$results) {
        $metaData = null;
        try {
            $metaData = $this->wrapped->getMetadataFactory()->getMetadataFor($reflectionClass->getName());
        }catch(MappingException $mex) {
            // Skip when a subclass is not a valid entity
        }

        if($metaData !== null) {
            foreach($reflectionClass->getProperties() as $property) {
                $propertyName = $property->getName();

                if(isset($metaData->associationMappings[$propertyName])) {
                    if(isset($metaData->associationMappings[$propertyName])) {
                        $property->setAccessible(true);
                        $results[] = $property;
                    }
                }
            }
        }

        if($reflectionClass->getParentClass() !== false) {
            $this->__getClassProperties($reflectionClass->getParentClass(), $results);
        }

        return $results;
    }
}
Ocramius commented 5 years ago

If I understand the problem correctly, we are getting an entity garbage collected while we still have a reference to its OID: does the problem present itself also on master?

sandvige commented 5 years ago

@petervf Thanks for the code! Indeed, this is also a valid approach. @Ocramius As long as we're only keeping one entity here: https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/UnitOfWork.php#L1355

By the way, the function returns false when this is happening, but every piece of code using this method is giving approximately 0 sh*ts :D

sandvige commented 5 years ago

@Ocramius I added code using the provided unit tests (this is failing without a fix), and I also added a dumb fix showing unit tests going green. I will work on it in the next day, I just wanted to bump this issue as I'll provide a way to handle the issue by keeping an array that maps entities with oids (the missing part I was explaining a few comments above)

carlospauluk commented 5 years ago

Really bad choice to use spl_object_hash(), since it is not reliable when dealing with too much garbage collection. It just doesnt provide an unique representation of all object, forcing us to have to call $entityManager->clear() all the time.

Ocramius commented 5 years ago

@carlospauluk there is no other concept of object identity in php-src. As long as objects are not GC'd, the identifiers are also not reused, hence the bug must be in a retained identifier in the UoW.

From the tests in https://github.com/doctrine/doctrine2/pull/7407, it seems like the bug is when merging entities, where an object is GC'd, while the identity map is not correctly cleaned.

If you want to help out, please try and debug the scenario to find the exact location of the retained identifier. Regardless if spl_object_hash() or spl_object_id() are used, that's our only way to retain object identity in a map in this programming language.

sandvige commented 5 years ago

@Ocramius & @carlospauluk spl_object_hash() and spl_object_id() are ok if we manage to keep a reference to the associated objects (which is exactly what we do in the provided fix, with the oidMap attribute). If the bug is correctly understood, why don't we merge it? Because of the tests? We cannot provide an exact test, because the GC is running across ALL the running tests, making it impossible to predict its behavior. This is why the provided unit test is trying to "saturate" the memory to maximize our chance for the GC to run.

Ocramius commented 5 years ago

why don't we merge it? Because of the tests?

Precisely. No test => no regression verification => no way to check if we don't break this accidentally further down the line.

Let me be very clear that i do not merge nor approve patches that do not have automated verification. There are exclusions to this, but this isn't the case.

the GC is running across ALL the running tests

Then let's use @runInSeparateProcess, or a separate CLI script that is only there to verify this specific scenario, but it is required to run in CI (with low -dmemory_limit=).

No reliable automation => no merge.

carlospauluk commented 5 years ago

If I'm getting it right, we can see this divergence when we look at the functions isInIdentityMap () and getEntityState () in the UnitOfWork.php. I'm really beginning with the doctrine internals code, but I see that getEntityState () relies only on the $ oid = spl_object_hash ($ entity) to check the state of an entity, while isInIdentityMap () relies on the spl_object_hash plus the "$ classMetadata-> rootEntityName". This is exactly the problem that I'm getting here with my code. Indeed @Ocramius, after I made a bunch of merges with detached entities, someway the GC seems to rip off the references just after the finish of my "mergeAll" function, and then when I try to persist a fresh new entity (a few functions later in the same thread), the getEntityState () gets "confused" and returns that this new entity is already persisted and does not schedule the INSERT.

Why do not use the double check (rootEntityName + spl_object_hash) to ensure that we're talking about the same object instead of only using the weak spl_object_hash () logic?

sandvige commented 5 years ago

@Ocramius You're right, you're the man in charge of quality, you're doing it right. But I'm providing tests. What is wrong with the tests I'm providing except the resources its using? Finally, your last comment is just saying "everything is good". I was just arguing why we need to use 1000+ entities to reproduce, and why we cannot reproduce with only 2. What is wrong?

Ocramius commented 5 years ago

@sandvige https://github.com/doctrine/orm/pull/7407/files#discussion_r220486572

sandvige commented 5 years ago

@Ocramius Oh! Ok, I will remove this fix from this PR, and add it to another with the "Missing Tests" flag. Thanks for your input

sandvige commented 5 years ago

@Ocramius I updated the PR :)

codler commented 4 years ago

I hope this will be fixed soon!

NicklasWallgren commented 4 years ago

My temporary fix until this issue has been resolved.

interface UniqueIdentifier
{

    /**
     * Returns a unique entity id.
     *
     * @return string
     */
    public function uniqueId(): string;

}
    /**
     * Calculates a unique id.
     *
     * @param mixed $entity
     * @return string
     */
    public static function calculateUniqueId($entity)
    {
        if ($entity instanceof UniqueIdentifier) {
            return $entity->uniqueId();
        }

        // Risk of collisions
        return \spl_object_hash($entity);
    }


abstract class Entity implements UniqueIdentifier {

    /**
     * @var string
     */
    private $uniqueId;

    /**
     * @inheritDoc
     */
    public function uniqueId(): string
    {
        if ($this->uniqueId === null) {
            $this->uniqueId = uniqid();
        }

        return $this->uniqueId;
    }

}
Matt-PMCT commented 3 years ago

Got bit by this today, couldn't figure out why an entity wasn't persisting/flushing, hash collision...

Hikariii commented 3 years ago

As a solution a weakmap can be used (php 8). See https://wiki.php.net/rfc/weak_maps for more details and how it relates to spl_object_hash.

bigfoot90 commented 3 years ago

I'm also getting this bug. Are there any progress on this?

greg0ire commented 3 years ago

Maybe… in https://github.com/doctrine/orm/pull/8837, I migrated Doctrine to the new spl_object_id(), which might not have any collisions. It's not released yet, but maybe you can try?

NicklasWallgren commented 3 years ago

@greg0ire The same issue applies to spl_object_id() as far as I understand.

"The object id is unique for the lifetime of the object. Once the object is destroyed, its id may be reused for other objects". https://www.php.net/manual/en/function.spl-object-id.php

bigfoot90 commented 3 years ago

I've tried branch 2.10.x too and it's not working

greg0ire commented 3 years ago

Oh right, that's unfortunate! Please spl_object_id() is already mentioned above, I need to re-read the whole thread to fully understand where we're at.

Matt-PMCT commented 1 year ago

It seems tests are written, a pull is made at #8994 , is there anything an average user can do to push this issue forward? I'm still encountering random objects in my databases not being saved because they get eaten by the hash collisions...

greg0ire commented 1 year ago

@Matt-PMCT if you are using PHP 8, maybe you could experiment with weak maps, as suggested in https://github.com/doctrine/orm/issues/3037#issuecomment-726360458 ?

I think switching to those would fix the bug without requiring a test for the bug (which is the part that seems to be the issue), because switching to weak maps can also be seen as a performance improvement: if the object has been garbage collected, there is no need to keep its identifier in memory, right? If it does fix the issue, then we can:

Matt-PMCT commented 1 year ago

@greg0ire I got a chance to look at this today and looked at the comment you linked. Unfortunately, the comment doesn't provide enough information for me to do any sort of educated implementation. spl_object_id is used 59 times in the UnitOfWork file, and after reading the PHP RFC and the final documentation (https://www.php.net/manual/en/class.weakmap.php) I do not fully understand how this would be implemented.

My first attempt was to transition just one of the arrays to a WeakMap, so I tried modify the UnitOfWork such that the entityInsertions array was converted to use a WeakMap. But what I ran into was that there are several functions that do things like array merges, diffs, and such, so I could not just convert the array to a Weak Map.

I think I am not understanding how it is envisioned that WeakMaps may be used here (also admittedly this is my first ever foray into them). I'm still willing to try, but I need a little guidance. Even the roughest description of how someone would envision this occurring.

My current project has a deployment script that can reliably trigger this hash collision, but it is not easily distilled into a unit test since it is a conglomeration of many different entities spanning my entire project.

greg0ire commented 1 year ago

@Matt-PMCT thanks for looking into this!

so I tried modify the UnitOfWork such that the entityInsertions array was converted to use a WeakMap

Great! I trust you used true or maybe even null as the value, and the entities as the key right?

But what I ran into was that there are several functions that do things like array merges, diffs, and such, so I could not just convert the array to a Weak Map.

:thinking: any specific example? I mean if I had to merge 2 weak maps, I would iterate over one of them and add every element it has to the other, so I don't think array_merge would be a blocker. For a diff, it's more complicated I suppose, but hopefully it's feasible. It would be helpful if you could use github range permalinks to point us to the offending code .

I think I am not understanding how it is envisioned that WeakMaps may be used here (also admittedly this is my first ever foray into them). I'm still willing to try, but I need a little guidance. Even the roughest description of how someone would envision this occurring.

The point of weakmaps is that you can use objects as keys directly. So to answer your question, we would use them as a replacement for all arrays currently keyed with spl_object_id(). You seem to have understood this already so I think I must have misunderstood your question.

mpdude commented 1 year ago

@Matt-PMCT are you still able to run the deployment script that triggers the issue?

  1. Does it make use of detach/merge for entities?
  2. Could you run it with the code from #10785 and report if that throws an exception somewhere?
mpdude commented 1 year ago

@tomaszmadeyski did not provide enough code to reproduce in https://github.com/doctrine/orm/issues/3037#issuecomment-180062249. It might be that they're using duplicate application-provided IDs, causing this problem down the road.

@sandvige in https://github.com/doctrine/orm/issues/3037#issuecomment-419038351 said:

When an entity is created using the same ids as a previous one, the new entity is tracked by maps oid-based, but it's NOT tracked by the $identityMap (because the check is based on keys, not on oid)

IMHO, that is not valid – you must not create entities with the same ID as another entity used previously. The EM/UoW should reject that, at least as long as it still knows about the previous instance. (Otherwise, it will probably fail when trying to insert into the database.)

10785 is a WIP that will reject entity instances using the same ID as another instance already in memory.

This leaves us with https://github.com/doctrine/orm/issues/3037#issuecomment-369211489 as the only piece of reproduce code provided, and that code also ended up in #7404 and #8994. It still fails on 2.15. I think that's a bug in merge(), which is creating proxy instances without checking the identity map first. I'll see if I can follow up on that.

@Matt-PMCT might provide another lead/edge case

flack commented 1 year ago

I don't know if it helps, but I did some digging in some tests in my unittest suite which trigger this problem. The code is a bit convoluted, but what happens is more or less this:

1: Entity A persisted & detached 2: Entity B which has an association to A persisted, detached, merged and deleted 3: Entity C which has an association to A persisted, detached, merged and deleted 4: Try to persist a new entity D ==> spl_object_id collision. The new object (created with new, no identifier set) gets the same spl_object_id as the association proxy for the A object in step 3

I don't know why both step 2 and 3 are necessary, but if I remove either of them, the problem doesn't occur. The proxy which uses the same spl_object_id as the object in step 4 is created here:

https://github.com/doctrine/orm/blob/2.15.x/lib/Doctrine/ORM/UnitOfWork.php#L3675

before the call to registerManaged the spl_object_id of $other is not present in UnitOfWork internal variables. Afterwards it appears in three places:

flack commented 1 year ago

If I apply this patch:

diff --git a/lib/Doctrine/ORM/UnitOfWork.php b/lib/Doctrine/ORM/UnitOfWork.php
index ebb5f12a0..42a376b00 100644
--- a/lib/Doctrine/ORM/UnitOfWork.php
+++ b/lib/Doctrine/ORM/UnitOfWork.php
@@ -3672,11 +3672,7 @@ class UnitOfWork implements PropertyChangedListener
                                 if ($targetClass->subClasses) {
                                     $other = $this->em->find($targetClass->name, $relatedId);
                                 } else {
-                                    $other = $this->em->getProxyFactory()->getProxy(
-                                        $assoc2['targetEntity'],
-                                        $relatedId
-                                    );
-                                    $this->registerManaged($other, $relatedId, []);
+                                    $other = $this->em->getReference($targetClass->name, $relatedId);
                                 }
                             }

The amount of collisions in my test suite goes down from 27 to 1 (and memory usage increases from 100.5 MB to 102.5 MB). All my tests pass, but I haven't tried running the doctrine testsuite with the change.

mpdude commented 1 year ago

@flack please try the change from #10791

mpdude commented 1 year ago

@flack regarding this...

1: Entity A persisted & detached 2: Entity B which has an association to A persisted, detached, merged and deleted 3: Entity C which has an association to A persisted, detached, merged and deleted 4: Try to persist a new entity D ==> spl_object_id collision. The new object (created with new, no identifier set) gets the same spl_object_id as the association proxy for the A object in step 3

In order to get the ID collisions, you (+ the code you're passing through) needs to create the right amount of objects and/or trigger GC.

But, from a more qualitative point of view, what is important is that during all those operations,

So, from your description, I'd say 1) + 2) should suffice to get the inconsistency somewhere along the way.

3) seems to just repeat 2), and 4) is what causes the final collision. But I'd suspect the inconsistency happens with 1) or 2) already.

Do you think you could write a test similar to #10791?

flack commented 1 year ago

@flack please try the change from #10791

thanks, I gave it a try and it has (in my setup) the same effect as the patch I posted above, i.e. collisions go down from 27 to 1 and memory usage rises slightly. All tests pass, so that's good! :-)

I also noticed that the change fixes the problem I described here: https://github.com/doctrine/orm/pull/10785#issuecomment-1604098742

mpdude commented 1 year ago

So let's hunt your last one collision :-)?

Can you try do make consistency checks as described in https://github.com/doctrine/orm/issues/3037#issuecomment-1604698758?

flack commented 1 year ago

Do you think you could write a test similar to #10791?

I mean I can try, but it seems #10791 fixes this particular issue, would it still add value in that case?