Closed TheHett closed 2 years ago
Probably detach is not unsetting embedded documents' own identity map. Theoretically it'll be easy to fix unless we did that specifically because of spl_object_id
collisions.
@malarzm Isn't it related to QueryBuilder
caching results?
could be, I'm replying from the top of my head :)
@malarzm Isn't it related to
QueryBuilder
caching results?
QueryBuilder does not cache results, as it doesn't even have any. Results are only stored in UnitOfWork through identity map. However, detach operations cascade for embedded relationships, so that every embedded document is properly detached. I was also able to reproduce this in a test.
@TheHett are there any other relationships (maybe references) that aren't detached? The continuously increasing size of the unit of work suggests that.
QueryBuilder does not cache results, as it doesn't even have any. Results are only stored in UnitOfWork through identity map.
I can see CachingIterator
in the provided stacktrace, which has caching in it.
are there any other relationships (maybe references) that aren't detached? The continuously increasing size of the unit of work suggests that.
Yes, but i tried to remove different fields, and made sure that this is field $hashes4
.
In addition, this can be seen in the rapid increase of memory consumption (the field $hashes4
contains very large array of bin data and the script falls after ~150 iterations despite 1gb memory limit )
@TheHett try applying code changes from this PR #2116
I can see
CachingIterator
in the provided stacktrace, which has caching in it.
That is due to the Query class itself, not the builder. The builder only assembles the query array and passes it on to the query object when you run getQuery()
. Semantic difference, but important nonetheless.
Sorry :crying_cat_face:, I meant Query
class.
Anyway, I'm practically sure that the CachingIterator
is at fault here. Applying fix from the PR and setting Query
as ->rewindable(true)
should help.
Yep, CachingIterator may very well be the culprit. 😢 Detaching entities works as expected, but there's an additional reference stored in the iterator instance, so it's never garbage collected. Still, doesn't explain why the UoW size keeps growing 🤔
doesn't explain why the UoW size keeps growing
I was wondering as well. Maybe embedded documents are still kept, even if parent was detached :thinking: ? I was sure there were some tests that covered this though (don't have time to look for it atm).
I was wondering as well. Maybe embedded documents are still kept, even if parent was detached 🤔 ? I was sure there were some tests that covered this though (don't have time to look for it atm).
I explicitly tested that:
public function testDetachCascadesToEmbeddedDocuments()
{
$user = new User();
$user->addPhonenumber(new Phonenumber('123'));
$user->addPhonenumber(new Phonenumber('234'));
$this->dm->persist($user);
$this->dm->flush();
$this->dm->clear();
$user = $this->dm->find(User::class, $user->getId());
$this->assertNotNull($user);
$this->assertCount(2, $user->getPhoneNumbers());
$this->assertSame(3, $this->dm->getUnitOfWork()->size());
$this->dm->detach($user);
$this->assertSame(0, $this->dm->getUnitOfWork()->size());
}
Test passes successfully.
try applying code changes from this PR #2116
@Steveb-p ok, I can only do it tomorrow at the office
@alcaeus in my case the memory wasn't released even I called $dm->clear()
on each iteration.
$cursor = $dm->createQueryBuilder(FileKeyframesHash::class)
->setRewindable(true)
->getQuery()
->execute();
@Steveb-p It did not help, same problem.
The next configuration works fine (no leaks):
$cursor = $dm->createQueryBuilder(FileKeyframesHash::class)
->setRewindable(false)
->hydrate(false)
->getQuery()
->execute();
The next - falls (with memory limit exception):
->setRewindable(true)
->hydrate(false)
Ok. I'll see how we can fix this in 2.0 as the new API will require a new minor release. Until then we'll document the behaviour a bit better. Weak references in PHP can't come soon enough 😅
The next configuration works fine (no leaks):
$cursor = $dm->createQueryBuilder(FileKeyframesHash::class) ->setRewindable(false) ->hydrate(false) ->getQuery() ->execute();
The next - falls (with memory limit exception):
->setRewindable(true) ->hydrate(false)
Sorry, setting rewindable to false
is the setting that disables caching for cursor. Rewindable = true
is current behavior.
@Steveb-p yes, i understood that this is a typo (I looked at the code). It really solvers the problem only when hydration is disabled. Maybe there are two problems...
Maybe I didn’t put it right, but the problem persists in any rewindable position, if hydration is turned on. No problems only when rewindable and hydration both disabled.
@TheHett try also setting the read-only option (QueryBuilder::readOnly
). This should prevent objects from being put into UnitOfWork
.
EDIT: Obviously it may not solve the root of the issue (objects still remaining in UoW
after detaching/clearing, if that's the case), but it should hopefully fix the memory leak.
If it's not UoW
that consumes memory, then you should look into some other places where a reference to those objects can be kept.
@Steveb-p
Works
->setRewindable(false)
->readOnly(true)
->hydrate(false)
Falls
->setRewindable(false)
->readOnly(true)
->hydrate(true)
@TheHett you could inspect the unit of work after the first detach call to see what entities are still managed. Rewindable will create an additional reference which prevents freeing memory, but the UoW size should definitely be 0 after you call detach.
@alcaeus Now with readOnly
setting UoW is empty, but memory comsumption is growing.
The next screen done after 90 iterations:
It is really to inspect where is all memory using xdebug?
In the screenshot provided we can see that you have 7 FileKeyframesHash
objects stored in memory and 4 FilterCollection
s. Could you see if they cross-reference each other?
@Steveb-p it is not arrays, $e
- is one entity with 7 fields.
I tried to make test that shows leak
<?php
declare(strict_types=1);
namespace Doctrine\ODM\MongoDB;
use Doctrine\ODM\MongoDB\Tests\BaseTest;
use Documents\Bars\Bar;
use Documents\Bars\Location;
class MemoryLeakTest extends BaseTest
{
/**
* @throws MongoDBException
*/
public function setUp(): void
{
parent::setUp();
for ($j = 0; $j < 100; $j++) {
$bar = new Bar("Ololosh Alc");
for ($i = 0; $i < 10; $i++) {
$bar->addLocation(new Location($this->generateRandomString(10000)));
}
$this->dm->persist($bar);
}
$this->dm->flush();
}
public function tearDown(): void
{
$this->dm->clear();
parent::tearDown();
}
/**
* @throws MongoDBException
*/
public function testEmbeddedDocument(): void
{
$query = $this->dm->createQueryBuilder(Bar::class)
->setRewindable(false)
->readOnly(true)
->getQuery();
$iterator = $query->execute();
$counter = 0;
foreach ($iterator as $bar) {
/** @var $bar Bar */
$iterator = $bar->getLocations()->getIterator();
if ($counter % 10 === 0) {
gc_collect_cycles();
echo memory_get_usage(), PHP_EOL;
}
$counter++;
}
}
function generateRandomString(int $length): string {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$charactersLength = strlen($characters);
$randomString = '';
for ($i = 0; $i < $length; $i++) {
$randomString .= $characters[random_int(0, $charactersLength - 1)];
}
return $randomString;
}
}
I have next result:
hett@hett-pc:~/mongodb-odm$ ./vendor/bin/phpunit tests/Doctrine/ODM/MongoDB/Tests/MemoryLeakTest.php
PHPUnit 8.5.2 by Sebastian Bergmann and contributors.
R 1 / 1 (100%)17494696
18881832
20195240
21508648
22822056
24151848
25465256
26778664
28092072
29405480
Time: 7.5 seconds, Memory: 32.00 MB
It shows that memory consumption strongly grows when we accessing to iterator.
If we comment out the line $iterator = $bar->getLocations()->getIterator();
then have no problem.
hett@hett-pc:~/mongodb-odm$ ./vendor/bin/phpunit tests/Doctrine/ODM/MongoDB/Tests/MemoryLeakTest.php
PHPUnit 8.5.2 by Sebastian Bergmann and contributors.
R 1 / 1 (100%)17476368
17476496
17476624
17476752
17476880
17477008
17477136
17477264
17477392
17477520
Then it seems initializing PersistentCollection
causes memory leak?
Thanks for providing the test case. I'll take a deeper look at it later.
Maybe is problem in CachingIterator
?
I use \Doctrine\ODM\MongoDB\Aggregation\Builder
and try iterate over big portion data.
How I see CachingIterator
on every step get item from wrapped iterator and store in own variable. If you iterate over, for example 5000 objects, then they all stay in memory and you cant delete this objects until program is done.
I little change test of @TheHett and what I got
Also I change access modifier for \Doctrine\ODM\MongoDB\Iterator\CachingIterator::getIterator
from private on public for test him
```php generateRandomString(10000)); $this->dm->persist($bar); } $this->dm->flush(); } public function tearDown(): void { $this->dm->clear(); parent::tearDown(); } private function memUsage(): void { gc_collect_cycles(); echo memory_get_usage(), PHP_EOL; } /** * @throws MongoDBException */ public function testCacheIteratorWithDetach(): void { $query = $this->dm->createQueryBuilder(Bar::class) ->readOnly(true) ->getQuery(); $iterator = $query->execute(); $this->assertInstanceOf(CachingIterator::class, $iterator); $counter = 0; $this->memUsage(); foreach ($iterator as $bar) { if ($counter % 100 === 0) { $this->memUsage(); } $counter++; $this->dm->detach($bar); } $this->memUsage(); } /** * @throws MongoDBException */ public function testIteratorFromCacheIteratorWithDetach(): void { $query = $this->dm->createQueryBuilder(Bar::class) ->readOnly(true) ->getQuery(); $iterator = $query->execute(); $this->assertInstanceOf(CachingIterator::class, $iterator); $iterator = $iterator->getIterator(); $counter = 0; $this->memUsage(); foreach ($iterator as $bar) { if ($counter % 100 === 0) { $this->memUsage(); } $counter++; $this->dm->detach($bar); } $this->memUsage(); } function generateRandomString(int $length): string { $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'; $charactersLength = strlen($characters); $randomString = ''; for ($i = 0; $i < $length; $i++) { $randomString .= $characters[random_int(0, $charactersLength - 1)]; } return $randomString; } } ```
Result
Memory Leak (Doctrine\ODM\MongoDB\MemoryLeak)
✔ Cache iterator with detach
20976440
20976584
22256584
23532808
24813128
26085256
27357384
28645896
29918024
31190152
32462280
33719216
✔ Iterator from cache iterator with detach
21342632
21342728
21355576
21355704
21355832
21355960
21356088
21356216
21356344
21356472
21356600
21341064
I spent a lot of time last week investigating these leaks, and it turns out this has nothing to do with CachingIterator or embedded documents. I was able to observe the leak, but did not observe any difference with respect to using a non-caching iterator, or using clear
vs. using detach
. I did however observe that we didn't properly clean up references of an object as soon as HydratingIterator
was involved. However, I wasn't able to pinpoint where the object sticks around, I only observed that the refCount
I could check with xdebug was higher than it should've been, and that removing HydratingIterator
fixes the issue.
That said, I'll have to spend some more time on this, which I currently don't have. I'm moving this to the 2.0.7 milestone and hope to be able to take a closer look once I've got more time on my hands. In the meantime, please use clear
or detach
in combination with the non-caching (non-rewindable) iterator introduced in 2.1.0. The want to release 2.1.0 is one of the reasons why I don't have time to investigate this, so I hope that this at least works around the problem for the time being, even though it's not a full solution for it. Thanks for understanding.
Looking at #2215, this may not have to do with HydratingIterator directly, but rather with what's going on behind the scenes. In the related issue, memory usage spiked as the UnitOfWork was committed to the database, which does not involve HydratingIterator at all, but does involve some of the behind-the-scenes management shenanigans that UoW needs to do. This may help in debugging this issue.
Hit the same issue just now when clearing PersistentCollection
with references / embedded documents. Despite explicit call to clear()
on DocumentManager
memory remained occupied.
Had the same issue here, we do have a 3 million documents collection and using hydration true when having big embedded documents just explodes the memory, no matter what kind of clear mechanism we try (clear, detach, unset, gb, gc_collect_cycles...)
@douglasjam have you found any workaround solutions for this issue?
@StalkAlex I think mongodb-odm team already fixed the issue, at least I do not face this issue anymore. Our practice here is to always use rewindable(false) readOnly(true) and keep clearing after batches.
Unfortunately this issue is still reproducible in my case with odm v2.2.2.
@StalkAlex is your case the same as described previously?
I also faced this issue when loading documents in chunks (50 documents per query) to edit them: the memory usage increases after each iteration.
This happens when iterating over a collection. I. e. the $items property keeps being appended. Did not find a way to avoid this behavior yet.
I guess PersistentCollection
should have some way of disabling the result caching?
I found that in later versions of doctrine/mongodb-odm there is a way to make an UnrewindableIterator
, which does not store the traversed items within it. Hence it is enough to flag the query as '$rewindable = false`. That's it.
But, this is not possible in some older projects of which dependencies do not allow to have that newer version of the mongodb-odm. Sadly... It's time to do some upgrading...
/**
* Decorate the cursor with caching, hydration, and priming behavior.
*
* Note: while this method could strictly take a MongoDB\Driver\Cursor, we
* accept Traversable for testing purposes since Cursor cannot be mocked.
* HydratingIterator, CachingIterator, and BaseIterator expect a Traversable
* so this should not have any adverse effects.
*/
private function makeIterator(Traversable $cursor): Iterator
{
if ($this->hydrate) {
$cursor = new HydratingIterator($cursor, $this->dm->getUnitOfWork(), $this->class, $this->unitOfWorkHints);
}
$cursor = $this->rewindable ? new CachingIterator($cursor) : new UnrewindableIterator($cursor);
if (! empty($this->primers)) {
$referencePrimer = new ReferencePrimer($this->dm, $this->dm->getUnitOfWork());
$cursor = new PrimingIterator($cursor, $this->class, $referencePrimer, $this->primers, $this->unitOfWorkHints);
}
return $cursor;
}
Version 2.4.2
I still facing into memory leak issue when trying to update documents in chunks. Unrewindable iterator doesn't work, it decrements refCount by one but selected documents still have a big refCount value (in my case it equals 4).
Example of my code (one chunk):
// Fetch documents
$products = $this->manager->createQueryBuilder(Product::class)
->field('productId')->in($ids)
->setRewindable(false)
->getQuery();
// Update documents
foreach ($products as $product) {
// refCount is 7 here
$product->update($data);
}
$this->dm->flush();
$this->dm->clear();
// Every product document have refCount equals to 4
Memory usage:
20667472
22450752
24164304
25579232
27516832
29460384
30871360
32300208
33712384
35121872
37588600
40055856
41467600
42859520
44279744
@alexxxxkkk have you tried running gc_collect_cycles()
after the clear()
? If that's not cycles you may employ https://github.com/BitOne/php-meminfo to see what's being referenced and by what. I can't really tell what's going on by looking at your example
@malarzm No, I haven't. Thanks, will do this and post a result.
@malarzm Thanks, all work as expected. This was due to an enabled debugger in the dev environment, as CommandLogger stores events in local property.
Great! I'll allow myself to close the issue finally then :tada:
Bug Report
| Version | 2.0.3
Summary
I ran into a memory leak and it occurs when using
EmbedMany
annotation.The problem occurs event when used
$dm->clear()
instead of$dm->detach($e)
UPD: HERE I PROVIDED TEST https://github.com/doctrine/mongodb-odm/issues/2162#issuecomment-584599533
example of my script output: