doctrine / mongodb-odm

The Official PHP MongoDB ORM/ODM
https://www.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/
MIT License
1.09k stars 504 forks source link

Memory leak when useing embedded documents #2162

Closed TheHett closed 2 years ago

TheHett commented 4 years ago

Bug Report

| Version | 2.0.3

Summary

I ran into a memory leak and it occurs when using EmbedMany annotation.

/**
 * @ODM\Document(collection="file_keyfames_hash")
 */
class FileKeyframesHash
{
    /**
     * @var Collection<FileKeyframesHash>
     * @ODM\EmbedMany(name="hashes_4", targetDocument=KeyframesHash::class)
     */
    private $hashes4;
}
/**
 * @ODM\EmbeddedDocument()
 */
class KeyframesHash
{
    /**
     * @var string
     * @ODM\Field(name="hash", type="bin")
     */
    private $hash;

    // ... getters/setters
}
        $cursor = $dm->createQueryBuilder(FileKeyframesHash::class)
            ->getQuery()
            ->execute();

        $count = 0;
        foreach ($cursor as $e) {
            /** @var FileKeyframesHash $e */
            $count++;
            if ($count % 10 === 0) {
                echo "Processed: {$count}", PHP_EOL;
                echo "Managed: {$dm->getUnitOfWork()->size()}", PHP_EOL;
                echo Helper::bytesHumanize(memory_get_usage()), PHP_EOL;
                echo PHP_EOL;
            }
            $dm->detach($e);
            unset($e);
        }

The problem occurs event when used $dm->clear() instead of $dm->detach($e)

UPD: HERE I PROVIDED TEST https://github.com/doctrine/mongodb-odm/issues/2162#issuecomment-584599533

example of my script output:


Processed: 100
Managed: 101
472.7 MB

Processed: 110
Managed: 111
538.9 MB

Processed: 120
Managed: 121
592.3 MB

Processed: 130
Managed: 131
684.9 MB

Processed: 140
Managed: 141
766.0 MB

Processed: 150
Managed: 151
851.1 MB

Processed: 160
Managed: 161
903.2 MB

Processed: 170
Managed: 171
967.1 MB

PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 20480 bytes) in /usr/local/app/vendor/doctrine/mongodb-odm/lib/Doctrine/ODM/MongoDB/Iterator/HydratingIterator.php on line 106
PHP Stack trace:
PHP   1. {main}() /usr/local/app/yii:0
PHP   2. yii\console\Application->run() /usr/local/app/yii:32
PHP   3. yii\console\Application->handleRequest() /usr/local/app/vendor/yiisoft/yii2/base/Application.php:386
PHP   4. yii\console\Application->runAction() /usr/local/app/vendor/yiisoft/yii2/console/Application.php:147
PHP   5. yii\console\Application->runAction() /usr/local/app/vendor/yiisoft/yii2/console/Application.php:180
PHP   6. app\commands\TestController->runAction() /usr/local/app/vendor/yiisoft/yii2/base/Module.php:528
PHP   7. app\commands\TestController->runAction() /usr/local/app/vendor/yiisoft/yii2/console/Controller.php:164
PHP   8. yii\base\InlineAction->runWithParams() /usr/local/app/vendor/yiisoft/yii2/base/Controller.php:157
PHP   9. call_user_func_array:{/usr/local/app/vendor/yiisoft/yii2/base/InlineAction.php:57}() /usr/local/app/vendor/yiisoft/yii2/base/InlineAction.php:57
PHP  10. app\commands\TestController->actionExportHashes() /usr/local/app/vendor/yiisoft/yii2/base/InlineAction.php:57
PHP  11. Doctrine\ODM\MongoDB\Iterator\CachingIterator->next() /usr/local/app/commands/TestController.php:121
PHP  12. Generator->next() /usr/local/app/vendor/doctrine/mongodb-odm/lib/Doctrine/ODM/MongoDB/Iterator/CachingIterator.php:89
PHP  13. Doctrine\ODM\MongoDB\Iterator\CachingIterator->wrapTraversable() /usr/local/app/vendor/doctrine/mongodb-odm/lib/Doctrine/ODM/MongoDB/Iterator/CachingIterator.php:89
PHP  14. Doctrine\ODM\MongoDB\Iterator\HydratingIterator->next() /usr/local/app/vendor/doctrine/mongodb-odm/lib/Doctrine/ODM/MongoDB/Iterator/CachingIterator.php:156
PHP  15. Generator->next() /usr/local/app/vendor/doctrine/mongodb-odm/lib/Doctrine/ODM/MongoDB/Iterator/HydratingIterator.php:71
PHP  16. Doctrine\ODM\MongoDB\Iterator\HydratingIterator->wrapTraversable() /usr/local/app/vendor/doctrine/mongodb-odm/lib/Doctrine/ODM/MongoDB/Iterator/HydratingIterator.php:71
malarzm commented 4 years ago

Probably detach is not unsetting embedded documents' own identity map. Theoretically it'll be easy to fix unless we did that specifically because of spl_object_id collisions.

Steveb-p commented 4 years ago

@malarzm Isn't it related to QueryBuilder caching results?

malarzm commented 4 years ago

could be, I'm replying from the top of my head :)

alcaeus commented 4 years ago

@malarzm Isn't it related to QueryBuilder caching results?

QueryBuilder does not cache results, as it doesn't even have any. Results are only stored in UnitOfWork through identity map. However, detach operations cascade for embedded relationships, so that every embedded document is properly detached. I was also able to reproduce this in a test.

@TheHett are there any other relationships (maybe references) that aren't detached? The continuously increasing size of the unit of work suggests that.

Steveb-p commented 4 years ago

QueryBuilder does not cache results, as it doesn't even have any. Results are only stored in UnitOfWork through identity map.

I can see CachingIterator in the provided stacktrace, which has caching in it.

TheHett commented 4 years ago

are there any other relationships (maybe references) that aren't detached? The continuously increasing size of the unit of work suggests that.

Yes, but i tried to remove different fields, and made sure that this is field $hashes4. In addition, this can be seen in the rapid increase of memory consumption (the field $hashes4 contains very large array of bin data and the script falls after ~150 iterations despite 1gb memory limit )

Steveb-p commented 4 years ago

@TheHett try applying code changes from this PR #2116

alcaeus commented 4 years ago

I can see CachingIterator in the provided stacktrace, which has caching in it.

That is due to the Query class itself, not the builder. The builder only assembles the query array and passes it on to the query object when you run getQuery(). Semantic difference, but important nonetheless.

Steveb-p commented 4 years ago

Sorry :crying_cat_face:, I meant Query class.

Anyway, I'm practically sure that the CachingIterator is at fault here. Applying fix from the PR and setting Query as ->rewindable(true) should help.

alcaeus commented 4 years ago

Yep, CachingIterator may very well be the culprit. 😢 Detaching entities works as expected, but there's an additional reference stored in the iterator instance, so it's never garbage collected. Still, doesn't explain why the UoW size keeps growing 🤔

Steveb-p commented 4 years ago

doesn't explain why the UoW size keeps growing

I was wondering as well. Maybe embedded documents are still kept, even if parent was detached :thinking: ? I was sure there were some tests that covered this though (don't have time to look for it atm).

alcaeus commented 4 years ago

I was wondering as well. Maybe embedded documents are still kept, even if parent was detached 🤔 ? I was sure there were some tests that covered this though (don't have time to look for it atm).

I explicitly tested that:

public function testDetachCascadesToEmbeddedDocuments()
{
    $user = new User();
    $user->addPhonenumber(new Phonenumber('123'));
    $user->addPhonenumber(new Phonenumber('234'));

    $this->dm->persist($user);
    $this->dm->flush();

    $this->dm->clear();

    $user = $this->dm->find(User::class, $user->getId());
    $this->assertNotNull($user);
    $this->assertCount(2, $user->getPhoneNumbers());

    $this->assertSame(3, $this->dm->getUnitOfWork()->size());

    $this->dm->detach($user);

    $this->assertSame(0, $this->dm->getUnitOfWork()->size());
}

Test passes successfully.

TheHett commented 4 years ago

try applying code changes from this PR #2116

@Steveb-p ok, I can only do it tomorrow at the office

TheHett commented 4 years ago

@alcaeus in my case the memory wasn't released even I called $dm->clear() on each iteration.

TheHett commented 4 years ago
        $cursor = $dm->createQueryBuilder(FileKeyframesHash::class)
            ->setRewindable(true)
            ->getQuery()
            ->execute();

@Steveb-p It did not help, same problem.

TheHett commented 4 years ago

The next configuration works fine (no leaks):

$cursor = $dm->createQueryBuilder(FileKeyframesHash::class)
            ->setRewindable(false)
            ->hydrate(false)
            ->getQuery()
            ->execute();

The next - falls (with memory limit exception):

            ->setRewindable(true)
            ->hydrate(false)
alcaeus commented 4 years ago

Ok. I'll see how we can fix this in 2.0 as the new API will require a new minor release. Until then we'll document the behaviour a bit better. Weak references in PHP can't come soon enough 😅

Steveb-p commented 4 years ago

The next configuration works fine (no leaks):

$cursor = $dm->createQueryBuilder(FileKeyframesHash::class)
            ->setRewindable(false)
            ->hydrate(false)
            ->getQuery()
            ->execute();

The next - falls (with memory limit exception):

            ->setRewindable(true)
            ->hydrate(false)

Sorry, setting rewindable to false is the setting that disables caching for cursor. Rewindable = true is current behavior.

TheHett commented 4 years ago

@Steveb-p yes, i understood that this is a typo (I looked at the code). It really solvers the problem only when hydration is disabled. Maybe there are two problems...

TheHett commented 4 years ago

Maybe I didn’t put it right, but the problem persists in any rewindable position, if hydration is turned on. No problems only when rewindable and hydration both disabled.

Steveb-p commented 4 years ago

@TheHett try also setting the read-only option (QueryBuilder::readOnly). This should prevent objects from being put into UnitOfWork.

EDIT: Obviously it may not solve the root of the issue (objects still remaining in UoW after detaching/clearing, if that's the case), but it should hopefully fix the memory leak.

If it's not UoW that consumes memory, then you should look into some other places where a reference to those objects can be kept.

TheHett commented 4 years ago

@Steveb-p

Works

            ->setRewindable(false)
            ->readOnly(true)
            ->hydrate(false)

Falls

            ->setRewindable(false)
            ->readOnly(true)
            ->hydrate(true)
alcaeus commented 4 years ago

@TheHett you could inspect the unit of work after the first detach call to see what entities are still managed. Rewindable will create an additional reference which prevents freeing memory, but the UoW size should definitely be 0 after you call detach.

TheHett commented 4 years ago

@alcaeus Now with readOnly setting UoW is empty, but memory comsumption is growing. The next screen done after 90 iterations: image

It is really to inspect where is all memory using xdebug?

Steveb-p commented 4 years ago

In the screenshot provided we can see that you have 7 FileKeyframesHash objects stored in memory and 4 FilterCollections. Could you see if they cross-reference each other?

TheHett commented 4 years ago

@Steveb-p it is not arrays, $e - is one entity with 7 fields. image

TheHett commented 4 years ago

I tried to make test that shows leak

<?php

declare(strict_types=1);

namespace Doctrine\ODM\MongoDB;

use Doctrine\ODM\MongoDB\Tests\BaseTest;
use Documents\Bars\Bar;
use Documents\Bars\Location;

class MemoryLeakTest extends BaseTest
{
    /**
     * @throws MongoDBException
     */
    public function setUp(): void
    {
        parent::setUp();
        for ($j = 0; $j < 100; $j++) {
            $bar = new Bar("Ololosh Alc");
            for ($i = 0; $i < 10; $i++) {
                $bar->addLocation(new Location($this->generateRandomString(10000)));
            }
            $this->dm->persist($bar);
        }
        $this->dm->flush();
    }

    public function tearDown(): void
    {
        $this->dm->clear();
        parent::tearDown();
    }

    /**
     * @throws MongoDBException
     */
    public function testEmbeddedDocument(): void
    {
        $query = $this->dm->createQueryBuilder(Bar::class)
            ->setRewindable(false)
            ->readOnly(true)
            ->getQuery();

        $iterator = $query->execute();
        $counter = 0;
        foreach ($iterator as $bar) {
            /** @var $bar Bar */
            $iterator = $bar->getLocations()->getIterator();
            if ($counter % 10 === 0) {
                gc_collect_cycles();
                echo memory_get_usage(), PHP_EOL;
            }
            $counter++;
        }
    }

    function generateRandomString(int $length): string {
        $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
        $charactersLength = strlen($characters);
        $randomString = '';
        for ($i = 0; $i < $length; $i++) {
            $randomString .= $characters[random_int(0, $charactersLength - 1)];
        }
        return $randomString;
    }
}

I have next result:

hett@hett-pc:~/mongodb-odm$ ./vendor/bin/phpunit tests/Doctrine/ODM/MongoDB/Tests/MemoryLeakTest.php 
PHPUnit 8.5.2 by Sebastian Bergmann and contributors.

R                                                                   1 / 1 (100%)17494696
18881832
20195240
21508648
22822056
24151848
25465256
26778664
28092072
29405480

Time: 7.5 seconds, Memory: 32.00 MB

It shows that memory consumption strongly grows when we accessing to iterator.

TheHett commented 4 years ago

If we comment out the line $iterator = $bar->getLocations()->getIterator(); then have no problem.

hett@hett-pc:~/mongodb-odm$ ./vendor/bin/phpunit tests/Doctrine/ODM/MongoDB/Tests/MemoryLeakTest.php 
PHPUnit 8.5.2 by Sebastian Bergmann and contributors.

R                                                                   1 / 1 (100%)17476368
17476496
17476624
17476752
17476880
17477008
17477136
17477264
17477392
17477520
Steveb-p commented 4 years ago

Then it seems initializing PersistentCollection causes memory leak?

alcaeus commented 4 years ago

Thanks for providing the test case. I'll take a deeper look at it later.

serieznyi commented 4 years ago

Maybe is problem in CachingIterator? I use \Doctrine\ODM\MongoDB\Aggregation\Builder and try iterate over big portion data. How I see CachingIterator on every step get item from wrapped iterator and store in own variable. If you iterate over, for example 5000 objects, then they all stay in memory and you cant delete this objects until program is done.

serieznyi commented 4 years ago

I little change test of @TheHett and what I got

Also I change access modifier for \Doctrine\ODM\MongoDB\Iterator\CachingIterator::getIterator from private on public for test him

Test

```php generateRandomString(10000)); $this->dm->persist($bar); } $this->dm->flush(); } public function tearDown(): void { $this->dm->clear(); parent::tearDown(); } private function memUsage(): void { gc_collect_cycles(); echo memory_get_usage(), PHP_EOL; } /** * @throws MongoDBException */ public function testCacheIteratorWithDetach(): void { $query = $this->dm->createQueryBuilder(Bar::class) ->readOnly(true) ->getQuery(); $iterator = $query->execute(); $this->assertInstanceOf(CachingIterator::class, $iterator); $counter = 0; $this->memUsage(); foreach ($iterator as $bar) { if ($counter % 100 === 0) { $this->memUsage(); } $counter++; $this->dm->detach($bar); } $this->memUsage(); } /** * @throws MongoDBException */ public function testIteratorFromCacheIteratorWithDetach(): void { $query = $this->dm->createQueryBuilder(Bar::class) ->readOnly(true) ->getQuery(); $iterator = $query->execute(); $this->assertInstanceOf(CachingIterator::class, $iterator); $iterator = $iterator->getIterator(); $counter = 0; $this->memUsage(); foreach ($iterator as $bar) { if ($counter % 100 === 0) { $this->memUsage(); } $counter++; $this->dm->detach($bar); } $this->memUsage(); } function generateRandomString(int $length): string { $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'; $charactersLength = strlen($characters); $randomString = ''; for ($i = 0; $i < $length; $i++) { $randomString .= $characters[random_int(0, $charactersLength - 1)]; } return $randomString; } } ```

Result

Memory Leak (Doctrine\ODM\MongoDB\MemoryLeak)
 ✔ Cache iterator with detach
20976440
20976584
22256584
23532808
24813128
26085256
27357384
28645896
29918024
31190152
32462280
33719216
 ✔ Iterator from cache iterator with detach
21342632
21342728
21355576
21355704
21355832
21355960
21356088
21356216
21356344
21356472
21356600
21341064
alcaeus commented 4 years ago

I spent a lot of time last week investigating these leaks, and it turns out this has nothing to do with CachingIterator or embedded documents. I was able to observe the leak, but did not observe any difference with respect to using a non-caching iterator, or using clear vs. using detach. I did however observe that we didn't properly clean up references of an object as soon as HydratingIterator was involved. However, I wasn't able to pinpoint where the object sticks around, I only observed that the refCount I could check with xdebug was higher than it should've been, and that removing HydratingIterator fixes the issue.

That said, I'll have to spend some more time on this, which I currently don't have. I'm moving this to the 2.0.7 milestone and hope to be able to take a closer look once I've got more time on my hands. In the meantime, please use clear or detach in combination with the non-caching (non-rewindable) iterator introduced in 2.1.0. The want to release 2.1.0 is one of the reasons why I don't have time to investigate this, so I hope that this at least works around the problem for the time being, even though it's not a full solution for it. Thanks for understanding.

alcaeus commented 4 years ago

Looking at #2215, this may not have to do with HydratingIterator directly, but rather with what's going on behind the scenes. In the related issue, memory usage spiked as the UnitOfWork was committed to the database, which does not involve HydratingIterator at all, but does involve some of the behind-the-scenes management shenanigans that UoW needs to do. This may help in debugging this issue.

Steveb-p commented 4 years ago

Hit the same issue just now when clearing PersistentCollection with references / embedded documents. Despite explicit call to clear() on DocumentManager memory remained occupied.

douglasjam commented 4 years ago

Had the same issue here, we do have a 3 million documents collection and using hydration true when having big embedded documents just explodes the memory, no matter what kind of clear mechanism we try (clear, detach, unset, gb, gc_collect_cycles...)

StalkAlex commented 2 years ago

@douglasjam have you found any workaround solutions for this issue?

douglasjam commented 2 years ago

@StalkAlex I think mongodb-odm team already fixed the issue, at least I do not face this issue anymore. Our practice here is to always use rewindable(false) readOnly(true) and keep clearing after batches.

StalkAlex commented 2 years ago

Unfortunately this issue is still reproducible in my case with odm v2.2.2.

malarzm commented 2 years ago

@StalkAlex is your case the same as described previously?

DavideTriso commented 2 years ago

I also faced this issue when loading documents in chunks (50 documents per query) to edit them: the memory usage increases after each iteration.

sakalys commented 2 years ago

image This happens when iterating over a collection. I. e. the $items property keeps being appended. Did not find a way to avoid this behavior yet.

Steveb-p commented 2 years ago

I guess PersistentCollection should have some way of disabling the result caching?

sakalys commented 2 years ago

I found that in later versions of doctrine/mongodb-odm there is a way to make an UnrewindableIterator, which does not store the traversed items within it. Hence it is enough to flag the query as '$rewindable = false`. That's it.

But, this is not possible in some older projects of which dependencies do not allow to have that newer version of the mongodb-odm. Sadly... It's time to do some upgrading...

    /**
     * Decorate the cursor with caching, hydration, and priming behavior.
     *
     * Note: while this method could strictly take a MongoDB\Driver\Cursor, we
     * accept Traversable for testing purposes since Cursor cannot be mocked.
     * HydratingIterator, CachingIterator, and BaseIterator expect a Traversable
     * so this should not have any adverse effects.
     */
    private function makeIterator(Traversable $cursor): Iterator
    {
        if ($this->hydrate) {
            $cursor = new HydratingIterator($cursor, $this->dm->getUnitOfWork(), $this->class, $this->unitOfWorkHints);
        }

        $cursor = $this->rewindable ? new CachingIterator($cursor) : new UnrewindableIterator($cursor);

        if (! empty($this->primers)) {
            $referencePrimer = new ReferencePrimer($this->dm, $this->dm->getUnitOfWork());
            $cursor          = new PrimingIterator($cursor, $this->class, $referencePrimer, $this->primers, $this->unitOfWorkHints);
        }

        return $cursor;
    }
lexalium commented 2 years ago

Version 2.4.2

I still facing into memory leak issue when trying to update documents in chunks. Unrewindable iterator doesn't work, it decrements refCount by one but selected documents still have a big refCount value (in my case it equals 4).

Example of my code (one chunk):

// Fetch documents
$products = $this->manager->createQueryBuilder(Product::class)
            ->field('productId')->in($ids)
            ->setRewindable(false)
            ->getQuery();

// Update documents
foreach ($products as $product) {
    // refCount is 7 here
    $product->update($data);
}

$this->dm->flush();
$this->dm->clear();

// Every product document have refCount equals to 4

Memory usage:

20667472
22450752
24164304
25579232
27516832
29460384
30871360
32300208
33712384
35121872
37588600
40055856
41467600
42859520
44279744
malarzm commented 2 years ago

@alexxxxkkk have you tried running gc_collect_cycles() after the clear()? If that's not cycles you may employ https://github.com/BitOne/php-meminfo to see what's being referenced and by what. I can't really tell what's going on by looking at your example

lexalium commented 2 years ago

@malarzm No, I haven't. Thanks, will do this and post a result.

lexalium commented 2 years ago

@malarzm Thanks, all work as expected. This was due to an enabled debugger in the dev environment, as CommandLogger stores events in local property.

malarzm commented 2 years ago

Great! I'll allow myself to close the issue finally then :tada: