Closed armetiz closed 10 years ago
Experiencing this issue also --- any solutions?
Is this with Propel or Doctrine? If the latter, which library and version?
Most of the memory issues reported have nothing to do with Elastica, but are related to processing a large number of Doctrine entities.
I don't remember which version, but I've tested with Doctrine.
Hmmm, that's very possible. Here's our libs:
doctrine/annotations                  v1.1.1              Docblock Annotations Parser
doctrine/cache                        v1.0                Caching library offering an object-oriented API for many cache backends
doctrine/collections                  v1.1                Collections Abstraction library
doctrine/common                       2.4.0-RC3           Common Library for Doctrine projects
doctrine/dbal                         2.3.4               Database Abstraction Layer
doctrine/doctrine-bundle              v1.2.0              Symfony DoctrineBundle
doctrine/doctrine-migrations-bundle   dev-master 6891b85  Symfony DoctrineMigrationsBundle
doctrine/inflector                    v1.0                Common String Manipulations with regard to casing and singular/plural rules.
doctrine/lexer                        v1.0                Base library for a lexer that can be used in Top-Down, Recursive Descent Parsers.
doctrine/migrations                   v1.0-ALPHA1         Database Schema migrations using Doctrine DBAL
doctrine/orm                          2.3.4               Object-Relational-Mapper for PHP
/me sigh of relief that it's not ODM this time
I think this is due to memory leaks in the UnitOfWork (perhaps due to circular object references). If you remove Elastica from the equation by tweaking the bundle code, can you reproduce the memory leak simply by iterating through 100k+ entities?
If it's Doctrine2 then it's almost certainly the circular reference problem, and I don't think there is a solution for that. The recommendations on the Doctrine website on how to process large datasets DO NOT work in cases where your objects have circular references.
Same issue here. Doctrine. No circular references though.
I rewrote most of my import functions to use spork. The imports get significantly slower, but they complete without a hiccup.
@evolchek: The circular reference issue is not necessarily caused by entity/document classes referring to each other, but rather references to UnitOfWork and other internal classes.
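To make the distinction concrete, here is a minimal standalone sketch (plain PHP, no Doctrine or Elastica involved) of how a reference cycle keeps memory alive until the cycle collector runs. The Node class and iteration count are made up for illustration:

```php
<?php
// Minimal sketch: objects in a reference cycle are not freed by
// refcounting alone; only the cycle collector reclaims them.
class Node
{
    public $ref;
}

$before = memory_get_usage();

// 2000 pairs keeps us under the default GC root-buffer threshold,
// so the automatic collector should not fire mid-loop.
for ($i = 0; $i < 2000; $i++) {
    $a = new Node();
    $b = new Node();
    $a->ref = $b;
    $b->ref = $a;   // circular pair
    unset($a, $b);  // refcounts never reach zero, so nothing is freed yet
}
$leaked = memory_get_usage() - $before;

$collected = gc_collect_cycles(); // run the cycle collector manually
$after = memory_get_usage() - $before;

printf("leaked: %d bytes, collected %d cycles, remaining: %d bytes\n",
    $leaked, $collected, $after);
```

The same effect shows up at a much larger scale when something like the UnitOfWork holds long-lived references into your entity graph during a populate run.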
I've been having the same problem whilst trying to populate an index with ~4m documents coming out of MongoDB. I've been using the following command: php app/console fos:elastica:populate --no-debug --env=prod -q
It seems that logging is causing my problems. If I comment out $this->_logger = $logger;
in Elastica\Client::setLogger, the issue no longer occurs. If anyone else has the same problem, you can easily fix it with the following code:
# app/config/config.yml
parameters:
    fos_elastica.client.class: Namespace\Elastica\Client

# src/Namespace/Elastica/Client.php
<?php

namespace Namespace\Elastica;

use Elastica\Client as BaseClient;
use Psr\Log\LoggerInterface;

class Client extends BaseClient
{
    public function setLogger(LoggerInterface $logger)
    {
        // Intentionally ignore the logger to avoid the memory growth.
        return $this;
    }
}
I haven't had time to look into why setting the logger causes this issue though so there could be a better/simpler solution.
Using Doctrine here and I get problems loading really "huge" fixtures. E.g. 10 records. (Not 10 million... just 10).
PHP doesn't run out of memory, but Java does. Usual memory usage starts at about 150Mb, then shoots up to over 1Gb and never comes down again. Fixture loading stalls then eventually (sometimes) explodes complaining about an Elastica timeout:
[Elastica\Exception\Connection\HttpException]
Operation timed out
The Elasticsearch log shows lots of lines like this:
[2013-12-16 16:27:16,804][WARN ][monitor.jvm] [Sasquatch] [gc][ParNew][1048][169] duration [1.1s],
collections [1]/[4.9s], total [1.1s]/[7.8s], memory [671.7mb]->[764mb]/[990.7mb],
all_pools {[Code Cache] [3.7mb]->[3.7mb]/[48mb]}
{[Par Eden Space] [91.7mb]->[72.7mb]/[266.2mb]}{[Par Survivor Space] [33.2mb]->[0b]/[33.2mb]}
{[CMS Old Gen] [546.7mb]->[691.2mb]/[691.2mb]}{[CMS Perm Gen] [30.3mb]->[30.3mb]/[82mb]}
[2013-12-16 16:28:39,494][INFO ][monitor.jvm] [Sasquatch] [gc][ConcurrentMarkSweep][1068][30] duration [5s],
collections [1]/[5.2s], total [5s]/[1.4m], memory [923.6mb]->[925.2mb]/[990.7mb],
all_pools {[Code Cache] [3.7mb]->[3.7mb]/[48mb]}
{[Par Eden Space] [232.4mb]->[233.9mb]/[266.2mb]}{[Par Survivor Space] [0b]->[0b]/[33.2mb]}
{[CMS Old Gen] [691.2mb]->[691.2mb]/[691.2mb]}{[CMS Perm Gen] [30.3mb]->[30.3mb]/[82mb]}
Not very helpful! I'm hoping I've got something set up wrong, but not found anything wrong with the config so far...
Are you sending huge documents? You should try reducing the bulk size with the batch_size option, and adding some sleep between each batch.
Also, you should try to index those documents outside the bundle, via cURL.
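For example, a quick way to index a couple of documents directly through the bulk API, bypassing the bundle entirely (the index and type names below are placeholders, and the host is assumed to be a default local Elasticsearch):

```shell
# Build a small bulk payload by hand; each action line is followed
# by its document source, one JSON object per line.
cat > /tmp/bulk.json <<'EOF'
{"index":{"_index":"website","_type":"page","_id":"1"}}
{"pageTitle":"Article page #1","summaryText":"Summary text"}
{"index":{"_index":"website","_type":"page","_id":"2"}}
{"pageTitle":"Article page #2","summaryText":"More summary text"}
EOF

# Send it straight to Elasticsearch. The bulk API requires the
# payload to end with a newline, which the heredoc above provides.
curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @/tmp/bulk.json \
  || echo "could not reach Elasticsearch on localhost:9200"
```

If indexing the same documents this way is fast and stable, the problem is on the PHP/bundle side rather than in Elasticsearch itself.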
batch_size is set to 1 (for now).
The documents are not huge, here's an example (the rest are similar):
$pageRealArticle2 = new Page();
$pageRealArticle2->setPageTitle('Article page #2');
$pageRealArticle2->setPageType(Page::TYPE_ARTICLE);
$pageRealArticle2->setSiteFamilyId(1);
$pageRealArticle2->addSite($this->getReference('site-real'));
$pageRealArticle2->setShortcode(self::OLD_SHORTCODE_ART_2);
$pageRealArticle2->setUrl(self::OLD_URL_ART_2);
$pageRealArticle2->setPublicationDate(new \DateTime('2013-01-02'));
$pageRealArticle2->setActiveLevelBySiteOwner(Page::ACTIVE_LEVEL_SHOW_PUBLIC);
$pageRealArticle2->setSummaryText('Summary text for article page 2');
$manager->persist($pageRealArticle2);
All the class constants are simple strings or ints.
The fos_elastica config is:
fos_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    serializer: ~ # leaving it blank like this enables the use of KnpPaginator
    indexes:
        website:
            client: default
            index_name: %elastica_index_name%
            types:
                page:
                    mappings:
                        pageTitle: { boost: 5 }
                        summaryText: { boost: 3 }
                        introText: { boost: 3 }
                        text1: { boost: 2 }
                        text2: { boost: 2 }
                        metaDescription: { boost: 1 }
                        metaKeywords: { boost: 1 }
                    persistence:
                        driver: orm # orm, mongodb, propel are available
                        model: Acme\MyBundle\Entity\Page
                        provider:
                            query_builder_method: createElasticSearchQueryBuilder
                            batch_size: 1
                        listener:
                            is_indexable_callback: 'isElasticSearchIndexable'
                        finder: ~ # enables retrieval of Doctrine entities via fos_elastica.finder.[index].[type] service
(Edit: I'm guessing the serializer is where things are falling down... doing some reading around that atm!)
OK, the problem I mentioned above is a new one (it didn't happen before). I've tracked it down as far as this:
Before this version of this file, everything works quickly with no problems. However, the changes in this commit (in the DependencyInjection/FOSElasticaExtension.php file) seem to cause the slowdown, and fixture loading grinds to a halt, even with my modest test data.
Can somebody wiser than I am look into this?
Try disabling the serializer
Thanks @merk - removing the serializer sorts out the fixture loading.
What is the serializer entry for in the config.yml and when should it (not) be used?
The serializer allows the bundle to automatically convert objects to JSON and send them directly to Elasticsearch, meaning you don't need to define mappings for types.
You do, however, need to define JMS Serializer metadata for each entity you're indexing; otherwise the bundle will try to serialize the entire object graph, which is not what you want.
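For example, JMS Serializer metadata can be declared per entity in YAML; the sketch below (file path and exposed properties are illustrative, borrowing names from the config earlier in this thread) excludes everything by default and only exposes the fields you actually index:

```yaml
# src/Acme/MyBundle/Resources/config/serializer/Entity.Page.yml
Acme\MyBundle\Entity\Page:
    exclusion_policy: ALL
    properties:
        pageTitle:
            expose: true
        summaryText:
            expose: true
```

With ALL as the exclusion policy, relations you don't expose are never traversed, which keeps the serializer from walking the whole object graph.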
Hi there, I'm testing FOSElasticaBundle and the populate command.
I get an out-of-memory problem with a "huge" set of rows; on my computer the problems come at around 100k rows in the RDBMS.
Elasticsearch is designed to be used with far more than 100k documents.
Regards,