doctrine / orm

Doctrine Object Relational Mapper (ORM)
https://www.doctrine-project.org/projects/orm.html
MIT License

Long running process leaks memory #8891

Open flaushi opened 3 years ago

flaushi commented 3 years ago

Bug Report

BC Break: ?
Version: current

Summary

I think there is a memory leak in long running processes.

Current behavior

My memory consumption keeps growing, although I keep no references to visited nodes and clear the EntityManager regularly. I am traversing an object graph iteratively (not recursively). My stack holds only the identifiers, not the entities.
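The traversal pattern described above (an explicit stack of identifiers plus periodic clearing) can be sketched in plain PHP. This is a hypothetical helper, not the reporter's code: `traverseByIds`, `$childIds` and `$clear` are illustrative names. In a Doctrine application, `$childIds` might load the node through the EntityManager and return its children's ids, and `$clear` might call `$em->clear()`.

```php
<?php

/**
 * Iterative depth-first traversal that keeps only identifiers on the stack.
 *
 * Hypothetical helper for illustration: in a Doctrine application,
 * $childIds might load the node via the EntityManager and return the ids
 * of its children, and $clear might call $em->clear().
 *
 * @param callable(int): array<int> $childIds returns the child ids of a node
 * @param callable(): void          $clear    drops all managed objects
 * @return int number of distinct nodes visited
 */
function traverseByIds(int $rootId, callable $childIds, callable $clear, int $clearEvery = 100): int
{
    if ($clearEvery < 1) {
        throw new InvalidArgumentException('$clearEvery must be at least 1');
    }

    $stack = [$rootId]; // identifiers only, never entity objects
    $visited = [];      // id => true; ints only, needed to stop on cycles
    $steps = 0;

    while ($stack !== []) {
        $id = array_pop($stack);
        if (isset($visited[$id])) {
            continue;
        }
        $visited[$id] = true;

        foreach ($childIds($id) as $childId) {
            if (!isset($visited[$childId])) {
                $stack[] = $childId;
            }
        }

        // Every $clearEvery nodes, release everything loaded so far.
        if ((++$steps % $clearEvery) === 0) {
            $clear();
        }
    }

    return count($visited);
}
```

The `$visited` set still grows with the number of nodes, but it holds only integers, which typically costs much less than keeping entities around. If the graph is known to be a tree, it can be dropped entirely.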

How to reproduce

Please see my example here: https://stackoverflow.com/questions/68686479/leaking-memory-while-traversing-an-object-graph/68686896#68686896

Expected behavior

I'd expect memory consumption to stay at no more than a few megabytes throughout.

beberlei commented 3 years ago

You can use the memory profiler to find where this memory is allocated: https://github.com/arnaud-lb/php-memory-profiler

flaushi commented 3 years ago

Wow, I didn't know about this tool, great! However, this is the situation:

[Screenshot: memprof output]

The query being executed is this:

return $this->_em->createQuery(
    'SELECT s from App\Entity\DataCategory s 
      WHERE s.deletedAt IS NULL 
        AND MY_JSON_CONTAINS(s.tags, :tags) = true
   ORDER BY s.name'
    )
    ->setParameter('tags', json_encode($tags) )
    ->getResult();

This should just query the entities and add them to the UnitOfWork, which I clear regularly. How can memory leak then?

Edit: This is confusing. My code actually fetches many more entities, like this:

$inputItem->dc = $this->em->find(DataCategory::class, $inputItem->dc); // not reported or visible in memprof

if ($inputItem->dc instanceof TagDataCategory) {
    $children = $this->em->getRepository(DataCategory::class)
        ->getCategoriesWithTags($inputItem->dc->selectedTags); // <--- these are reported by memprof
} else {
    $children = $inputItem->dc->getChildren(); // these are direct ManyToOne associations
}

Am I guessing correctly that memprof only reports allocations that have not been freed, so the DQL query is the one that leaks?

Thank you so much for your help!

greg0ire commented 3 years ago

From the description in the README (emphasis mine):

The extension tracks the allocation and release of memory blocks to report the amount of memory leaked by every function, method, or file in a program.

  • Reports non-freed memory at arbitrary points in the program

flaushi commented 3 years ago

So I'm speechless. Does this mean the repository method leaks?

I thought that when I load an entity through the entity manager, it is inserted into the UnitOfWork, which is properly cleared by $em->clear().

I changed the repository method to first load only the ids of matching entities and then find them:

class DataCategoryRepository extends EntityRepository
{
    public function getCategoriesWithTags(array $tags, $prefetchMode = false) : array
    {
        // Load only the matching ids, then fetch each entity through the EntityManager.
        return array_map(
            fn ($id) => $this->_em->find(DataCategory::class, $id),
            $this->getCategoryIdsWithTags($tags)
        );

        // Previous version, loading the entities directly via DQL:
        // $where = 'SELECT s from App\Entity\DataCategory s WHERE s.deletedAt IS NULL AND MY_JSON_CONTAINS(s.tags, :tags) = true ORDER BY s.name';
        // return $this->_em->createQuery($where)
        //     ->setParameter('tags', json_encode($tags))
        //     ->getResult();
    }

    public function getCategoryIdsWithTags(array $tags) : array
    {
        $where = 'SELECT s.id from App\Entity\DataCategory s WHERE s.deletedAt IS NULL AND MY_JSON_CONTAINS(s.tags, :tags) = true ORDER BY s.name';
        return array_column(
            $this->_em->createQuery($where)
                ->setParameter('tags', json_encode($tags))
                ->getScalarResult(),
            'id'
        );
    }
}

Here is a new screenshot:

[Screenshot: memprof output]

So it looks as if the repository method leaks? Where?

Or could the rest of my code be leaking?

For the sake of completeness, here is the custom DQL function:

class JsonContainsCustomDQLFunction extends FunctionNode
{
    /** @var Node */
    private $second;
    /** @var Node */
    private $first;

    public function getSql(SqlWalker $sqlWalker)
    {
        $first = $this->first->dispatch($sqlWalker);
        $second = $this->second->dispatch($sqlWalker);
        $platform = $sqlWalker->getConnection()->getDatabasePlatform();

        if ($platform instanceof PostgreSqlPlatform) {
            return "$first @> $second";
        } elseif ($platform instanceof MySqlPlatform) {
            return "JSON_CONTAINS($first, $second)";
        }

        throw new QueryException('Platform for JSON_CONTAINS not supported.');
    }

    public function parse(Parser $parser)
    {
        $parser->match(Lexer::T_IDENTIFIER);
        $parser->match(Lexer::T_OPEN_PARENTHESIS);
        $this->first = $parser->StringPrimary();
        $parser->match(Lexer::T_COMMA);
        $this->second = $parser->StringPrimary();
        $parser->match(Lexer::T_CLOSE_PARENTHESIS);
    }
}

greg0ire commented 3 years ago

When you call the repository, clear() isn't called inside it, so more memory is in use afterwards than before, presumably because of the entity map. That fits memprof's definition of a leak, but it is intended behavior, and memprof has no way of knowing that.
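The behavior described here can be illustrated with a toy identity map in plain PHP. This is a hypothetical sketch, not Doctrine's UnitOfWork (`ToyIdentityMap`, `find()`, `managedCount()` and `clear()` are made-up names): each loaded object stays referenced by the map after `find()` returns, so a profiler comparing memory before and after the call would count it as not freed, even though calling `clear()` drops those references.

```php
<?php

/**
 * Toy identity map, NOT Doctrine's UnitOfWork. It only illustrates why
 * memory stays allocated after a lookup returns: every loaded object is
 * kept in the map (a strong reference) until clear() is called.
 */
final class ToyIdentityMap
{
    /** @var array<string, array<int, object>> class name => [id => object] */
    private array $map = [];

    /** @param callable(int): object $load hypothetical loader, e.g. a DB fetch */
    public function find(string $class, int $id, callable $load): object
    {
        // Already managed: hand back the same instance, no new allocation.
        if (isset($this->map[$class][$id])) {
            return $this->map[$class][$id];
        }

        $object = $load($id);
        $this->map[$class][$id] = $object; // still referenced after find() returns
        return $object;
    }

    public function managedCount(): int
    {
        return array_sum(array_map('count', $this->map));
    }

    public function clear(): void
    {
        // Dropping the map releases its references, so PHP can free the
        // objects unless the caller still holds them somewhere else.
        $this->map = [];
    }
}
```

As the last comment notes, clearing only helps if nothing else (a result array, a cache, a logger) still points at the objects.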

Maybe you could try using https://github.com/BitOne/php-meminfo instead?

I think that instead of showing you what method "leaked" memory, it will show you what objects are taking up so much memory. There is even a guide on hunting down memory leaks:

https://github.com/BitOne/php-meminfo/blob/master/doc/hunting_down_memory_leaks.md

Hope this helps, I haven't had to do this myself before.

flaushi commented 3 years ago

Thanks for pointing me in this direction, I will follow it tomorrow.

Still, it makes me wonder that I call $em->find(DataCategory::class, $id) over and over without it showing up in memprof, while my DQL query with getResult() does show up.

To conclude this support case:

A) You are not aware of any memory leak in queries and getResult(), right?

B) It should also be possible to "travel" the association graph of entities over millions of jumps without leaking memory? (Of course with intermediate $em->clear() calls.)

C) Are both $em->find(fqcn, $id) and $em->createQuery()->getResult() supposed to return entities that are automatically stored in the entity map before being returned to me? Is there an option to get hydrated but unmanaged entities from the entity manager? (I guess not.)

nuryagdym commented 2 years ago

I guess it is the same problem described here: https://stackoverflow.com/questions/26616861/memory-leak-when-executing-doctrine-query-in-loop. I am running $em->clear() periodically and still have the memory leak.

So in Symfony's config/packages/doctrine.yaml I have this option:

doctrine:
    dbal:
        default_connection: main
        connections:
            main:
                logging: false

With logging: false, Doctrine does not write queries to the log file, but I suspect Doctrine still keeps a query log somewhere in memory, which is why I am seeing the memory leak.

So the solution is either