doctrine / phpcr-odm

Doctrine PHPCR ODM
http://www.doctrine-project.org/projects/phpcr-odm.html
MIT License
182 stars 96 forks source link

Documents created by an EventSubscriber are not persisted #532

Open asiragusa opened 10 years ago

asiragusa commented 10 years ago

In my EventSubscriber I am creating a new Document on the preUpdate or prePersist events.

    public function prePersist(LifecycleEventArgs $args) {
        $doc = new Document();
        // initialization here ...
        $args->getObjectManager()->persist($doc);
    }

This Document is not persisted as the changesets are calculated before the event is fired. Moreover the inserts are flushed before that the update event is fired, so there isn't a simple solution, as all the pre- events should be fired during the changesets calculation.

Solving this issue obviously involves an important refactoring of the UnifOfWork and maybe can cause some BC breaks.

A temporary fix for me can be to listen for the preFlush event (that is fired before the changeset calculation), but this means that I will have to manually calculate the changesets and look for the instance of the Document I want to observe (see #531 ). What I don't like about this is that this EventListener will be fired for every flush operation of the DocumentManager and it can slow down the entire application.

What do you think about?

asiragusa commented 10 years ago

Actually I found a better solution for my problem: during the pre- events I schedule the operations to do and I execute them on the endFlush event.

BTW this is still not optimal.

dantleech commented 10 years ago

Thats what I was going to suggest. Doing this also avoids some edge case issues [1], and so is safer (and easier) than recalculating the changeset, at the cost of transactionality (i.e. if for some reason your endFlush fails, the initial flush is obviously already done - but the seriousness of that depends on the use case).

I personally believe the cost of a method call to the event listener for each persisted document is negligible, but would be good to see some benchmarks.

[1] https://github.com/doctrine/doctrine2/pull/1022

asiragusa commented 10 years ago

At this point the cost is negligible, but if the second flush fails my object will be in a dirty state.

I can start a new transaction before the flush and end it right after, but I will need to do that every time I flush my objects, and once again it's a pain.

The only real solution is to refactor the UnitOfWork.

asiragusa commented 10 years ago

Or I could rename the flush method to unsafeFlush, the new flush would start a transaction, call unsafeFlush and terminate the transaction.

In my EventListener I would call then the unsafeFlush method.

dbu commented 10 years ago

are you using 1.2.0-RC1? @lsmith77 did a refactoring of the changeset calculation to clean up things. what you report sounds like a regression. does the problem occur with 1.1 as well?

asiragusa commented 10 years ago

I'm working with the master branch, as the features I PR are only there ATM. The source of the problem is not due to the cleanup in the changeset calculation, but to a simple thing:

The changeset is calculated here for the first time: https://github.com/doctrine/phpcr-odm/blob/d81b6c2e7411b56e7098e7644882fd2cc46d71f3/lib/Doctrine/ODM/PHPCR/UnitOfWork.php#L2021 Here the inserts are done: https://github.com/doctrine/phpcr-odm/blob/d81b6c2e7411b56e7098e7644882fd2cc46d71f3/lib/Doctrine/ODM/PHPCR/UnitOfWork.php#L2061 And here we call the update listener that wants to do an update, but it's too late, as the inserts have already been done.

Moreover the changeset is only calculated for the updated document, so any insert would be missed in any case.

I have experimented with a temporary fix that throws an exception if the changeset has to be recalculated. That does a rollback, begins with a new transaction, recalculates the changeset and tries once again to write down things to the db layer. This obviously ugly and hasn't worked because the internal state of the session has been already compromised.

asiragusa commented 10 years ago

This is the EventListener I have written:


class SectionConfig implements EventSubscriber
{
    protected $entities = array();
    protected $transactions = 0;

    public function getSubscribedEvents()
    {
        return array(
            'preFlush',
            'prePersist',
            'preUpdate',
            'preRemove',
            'endFlush'
        );
    }

    protected function updateRoute(LifecycleEventArgs $args) {
        $entity = $args->getEntity();
        if (! $entity instanceof Section || ! $entity->getLeaf() || ! $entity->getPage()) {
            return;
        }

        $route = $entity->getRoute();
        if($route && $entity->getRoutePath() == $route->getId()) {
            return;
        }
        $this->entities[] = $entity;
    }

    public function prePersist(LifecycleEventArgs $args) {
        $this->updateRoute($args);
    }

    public function preUpdate(LifecycleEventArgs $args) {
        $this->updateRoute($args);
    }

    public function preRemove(LifecycleEventArgs $args) {
        $entity = $args->getEntity();
        if (! $entity instanceof Section || ! $entity->getLeaf()) {
            return;
        }       
        if($old = $entity->getRoute()) {
            $entityManager = $args->getObjectManager();
            if(count($old->getChildren()) <= 0) {
                $entityManager->remove($old);
            }
        }
    }

    public function preFlush(ManagerEventArgs $args) {
        if($this->transactions++ > 0) {
            return;
        }
        $entityManager = $args->getObjectManager();
        $workspace = $entityManager->getPhpcrSession()->getWorkspace();
        $utx = $workspace->getTransactionManager();
        $utx->begin();
    }

    public function endFlush(ManagerEventArgs $args) {
        $entityManager = $args->getObjectManager();
        $needFlush = false;
        $oldRoutes = array();
        foreach ($this->entities as $entity) {
            $needFlush = true;
            if($old = $entity->getRoute()) {
                // This should be something different, but it works well
                $old->setContent($old);
                $oldRoutes[] = $old;
            }

            $exploded = explode('/', $entity->getRoutePath());
            $name = array_pop($exploded);
            $parentPath = implode('/', $exploded);

            $parent = $entityManager->find(null, $parentPath);
            if(! $parent) {
                $session = $entityManager->getPhpcrSession();
                NodeHelper::createPath($session, $parentPath);
                $parent = $entityManager->find(null, $parentPath);
            }

            $route = $entityManager->find(null, $parentPath . "/" . $name);
            if(! $route) {
                $route = new Route(array('add_format_pattern' => true));
                $route->setPosition($parent, $name);
                $route->setRequirement('_format', 'html|json');
            }
            $route->setDefault('type', 'default_type');
            $route->setContent($entity->getPage());

            $entityManager->persist($route);

            $entity->setRoute($route);
        }
        $workspace = $entityManager->getPhpcrSession()->getWorkspace();
        $utx = $workspace->getTransactionManager();
        if($needFlush) {
            try {
                $this->entities = array();
// -> First flush
                $entityManager->flush();
                foreach($oldRoutes as $old) {
                    $old = $entityManager->find(null, $old->getId());

// -> If a transaction is started, the fetched object has a child count of 0, like before the update
                    if(count($old->getChildren()) <= 0) {
                        $entityManager->remove($old);
                    }
                }
                $entityManager->flush();
//              throw new \Exception('test');
            } catch (\Exception $e) {
                $utx->rollback();
                throw $e;
            }
        }
        if(-- $this->transactions == 0) {
            $utx->commit();
        }
    }
}

To solve the transactional integrity I start a transaction in the preflush and I commit that in the endFlush. This works if I am creating a route, let's say with the path /cms/routes. Now, if I update the path to /cms/routes/a I need to know if /cms/routes has a children and remove it only if the child count is zero.

The only way I found to get the correct child count is after the first flush but, worse than that, if I have started a transaction the child count is still wrong. How can I solve this problem?

BTW I could not care of old routes, as I already trap the fact that it's an old route in my controller, but I would like to avoid that the DB grows without limit.

In any case my first priority is the transactional integrity.

lsmith77 commented 10 years ago

btw .. does the ORM support this?

asiragusa commented 10 years ago

I don't know, but in any case it's a big conceptual mistake. The events must be fired before we begin to persist the documents into the DB.

lsmith77 commented 10 years ago

from what I can see the ORM behaves just like the ODM does.: https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/UnitOfWork.php#L302

this might be done for performance reasons, ie. to reduce the number of iterations one needs to make over the change set. this is similar to why the changesets are updated before the writes are done, which was the cause for your previous issues with event listeners.

I guess alternatively one could try and make certain actions during a flush immediate. ie. if a persist is done during a flush, then it should trigger the changeset computation and insert immediately.

but imho this is all quite non trivial and IIRC for RoutingAutoBundle we ran into similar limitations. in the end the "solution" was just to make multiple flushes per request work properly. a proper fix imho requires a concerted effort across all Doctrine mappers to define an algorithm for all of this. IIRC @Ocramius was working on a generic lib to become the basis for a unit of work for the next major release of the ORM.

dbu commented 10 years ago

i think we should aim for the same behaviour as the orm. if the orm does not do this, its a good reason for me to not do it here either. could it be that in orm you need to trigger some sort of recalculate changeset when you add things?

Ocramius commented 10 years ago

IIRC @Ocramius was working on a generic lib to become the basis for a unit of work for the next major release of the ORM.

Still only a hope for it until I find time to work on it :-\

asiragusa commented 10 years ago

The only solution that is simple and can work is to do the following:

  1. Calculate the changesets
  2. Fire the event handlers
  3. Calculate once again the changesets
  4. Persist the changes

Once I get notified of the update, the inserts are already done and there is no way to add new inserts inline. Moreover it makes complicated to do something that should be easy.

In my proposal the second changeset calculation can be avoided if no event handler is registered.

We may improve this further limiting the changeset recalculations to the object that have fired a pre- event. If the EventListener implements a let's say NeedChangesetRecalculationInterface, the whole changeset must be recalculated.

As you might see the computational time is exactly the same as today and there is no gain doing something wrong.

lsmith77 commented 10 years ago

But what if the 2nd changeset calculation triggers yet more inserts/updates?

asiragusa commented 10 years ago

It would be the normal behaviour, it is why we calculate it twice.

BTW the originalDatadon't have to be changed by the first changeset calculation

asiragusa commented 10 years ago

Testing once again my EventListener I found out that if I begin a transaction on the preFlush event and I commit it after the last flush the data is not persisted correctly into the DB. More precisely the first flush is not working, only the second. If I disable the transactions everything works again.

Notice that only one transaction is started and the code I have written is equivalent to do the following in the controller:

$utx->begin();
try {
$dm->flush();
} catch(Exception $e) {
$utx->rollback();
}
$utx->commit();

Any ideas? Any help would be appreciated...

asiragusa commented 10 years ago

@lsmith77 @dbu @dantleech How can I do if even the transactions don't work properly? I can't understand why all this is not working :/

lsmith77 commented 10 years ago

indeed .. multiple flushes mean you cannot do it all in one transaction. but what you could do is open the transaction on the Doctrine DBAL connection manually and then commit it manually too.. this will of course not work with other PHPCR implementations. Jackrabbit currently does not support transactions via PHPCR anyway ..

asiragusa commented 10 years ago

ouch so we can only solve the main problem...

dbu commented 10 years ago

@Ocramius but in orm, can you trigger a recalculate of the changeset to add documents from a doctrine event listener?

Ocramius commented 10 years ago

@dbu yes, by having a reference to the UoW

lsmith77 commented 10 years ago

via https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/UnitOfWork.php#L903 ?

Ocramius commented 10 years ago

Yeap. absolute uri

asiragusa commented 10 years ago

The problem is that when the preUpdate Event has been fired, the inserts have been already done and the changeset can't be recalculated anymore

Ocramius commented 10 years ago

A changeset only exists on updated objects (in ORM). Inserted values have no changeset...

asiragusa commented 10 years ago

Ok, the problem is that is during the preUpdate I create a new Document, this one is not persisted as the inserts have already been done and the operation fails. The only solution I have found to solve this is to delay the inserts and the update once the first flush has ended. Here is the code I had to write: https://github.com/togucms/ApplicationModelsBundle/blob/master/EventListener/SectionConfig.php

This can't be either an atomic write, as there are two flush operations. Even if this works, I am not happy at all of this code...

The point is firing the preUpdate events before any data is written into the db.

lsmith77 commented 10 years ago

@asiragusa if you can make it work in a minimal invasive way that does not affect performance too much, I am willing to consider it. maybe all it takes is to iterate over all scheduled changes once to trigger the pre* events.

asiragusa commented 10 years ago

One idea may be to insert here https://github.com/doctrine/phpcr-odm/blob/7f9a9efd867928017052b1c1a12d5b159feabe32/lib/Doctrine/ODM/PHPCR/UnitOfWork.php#L2051 something like:

$this->fireEvents();
$this->computeChangeSets();
$this->fireEvents();

And remove https://github.com/doctrine/phpcr-odm/blob/7f9a9efd867928017052b1c1a12d5b159feabe32/lib/Doctrine/ODM/PHPCR/UnitOfWork.php#L2323

The second time we fire the events we may not fire them for the objects we have already treated, so we should mark them.

The problem is that the second round of fireEvents may modify the changeset once again, so it's maybe not the best solution...

lsmith77 commented 10 years ago

I am now releasing RC2 .. if you want to get this into 1.2, please open a PR quickly as it makes it much easier to discuss things .. even if the PR isn't 100% ready to be merged. just mark it [WIP]

asiragusa commented 10 years ago

I think that this will need to wait 1.3, this week and the next one i'll be busy...

dbu commented 9 years ago

@asiragusa do you think you can work on this?