Open lunetics opened 10 years ago
@lunetics actually, the import takes longer when it is split into smaller chunks. I originally ran the import in batches of 100 entries, but the overall import took much longer than with the 'all at once' approach.
By using one huge block we trade a large memory footprint for speed in rebuilding the DB indexes and in performing the import as a single transaction.
I managed to cut the import time by almost half using small batched inserts and detaching / clearing unused or old entities:
geonames:load:localities --no-debug -env=prod -v AF AF (Afghanistan) data saved Imported in 59.005122 seconds.
geonames:load:localities --no-debug -env=prod -v AF AF (Afghanistan) data saved Imported in 39.573471 seconds.
Also, there is no (unique) index on the geonames_id column in MySQL. Adding one helps a lot, because otherwise the import slows down: there is a select on the id for every entry.
I just added this piece of code inside the while loop here, so that it runs on each iteration:
https://github.com/Josiah/JJsGeonamesBundle/blob/master/Import/LocalityImporter.php#L619
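            'repository' => get_class($localityRepository),
        ]);

        // Every 200 lines, write the pending changes and release the
        // entities so the unit of work does not keep growing.
        if ($lineNumber % 200 === 0) {
            foreach ($managers as $manager) {
                $manager->flush();
                foreach ($entities as $entity) {
                    // detach() takes the entity itself; clear() expects an
                    // entity class name (or nothing), so it is not called here.
                    $manager->detach($entity);
                }
            }
            // Start the next batch with an empty list.
            $entities = [];
        }
    }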
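For reference, the flush-and-clear-every-N idea from the snippet above can be sketched on its own. This is only a minimal sketch, not the bundle's code: `BatchManager` and `importInBatches` are hypothetical names, and the interface is a stand-in for the `flush()` and `clear()` methods of a Doctrine object manager, so the example runs without Doctrine installed. It also flushes the leftover rows after the loop, which is easy to forget when the row count is not a multiple of the batch size.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the two object manager methods used below.
 * In a real import you would type-hint Doctrine's ObjectManager instead.
 */
interface BatchManager
{
    public function flush(): void;
    public function clear(): void;
}

/**
 * Calls $handle for every row, then flushes and clears the manager
 * after each full batch and once more for any leftover rows.
 * Returns the number of flushes performed.
 */
function importInBatches(
    iterable $rows,
    BatchManager $manager,
    callable $handle,
    int $batchSize = 200
): int {
    $flushes = 0;
    $pending = 0;

    foreach ($rows as $row) {
        $handle($row); // e.g. build an entity and persist() it
        if (++$pending === $batchSize) {
            $manager->flush(); // write the batch
            $manager->clear(); // drop managed entities to free memory
            $pending = 0;
            $flushes++;
        }
    }

    // Rows left over after the last full batch.
    if ($pending > 0) {
        $manager->flush();
        $manager->clear();
        $flushes++;
    }

    return $flushes;
}
```

The batch size of 200 just mirrors the snippet above; the best value probably depends on the entity size and the database.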
Interesting, I guess that I was wrong!
Can you submit a PR? I'll merge it straight away.
Still working on it; I'm looking to improve this already great bundle a little more. You are very advanced, though, and I still need to understand how your repository structure works :)
Another option would be to load the file raw into MySQL via LOAD DATA INFILE and process / link the entities afterwards (LOAD DATA INFILE is very fast).
I'm also looking into loading the alternate names into the database and having a unified way to interact with geonames_ids.
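A rough sketch of the LOAD DATA INFILE idea mentioned above, assuming the GeoNames country dump is the usual tab-separated UTF-8 text file and that a raw staging table already exists. The function name `buildLoadInfileSql`, the table name `geonames_raw` and the file path are all hypothetical; the bundle has nothing like this. The function only builds the statement, so it can be checked without a database.

```php
<?php
declare(strict_types=1);

/**
 * Builds a MySQL LOAD DATA LOCAL INFILE statement for a tab-separated dump.
 * Both the helper and the staging table it targets are assumptions.
 */
function buildLoadInfileSql(string $path, string $table): string
{
    // Only allow plain identifiers, since the table name is interpolated.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException('Unexpected table name: ' . $table);
    }

    // Escape backslashes and single quotes inside the quoted file path.
    $quotedPath = "'" . strtr($path, ['\\' => '\\\\', "'" => "\\'"]) . "'";

    return "LOAD DATA LOCAL INFILE {$quotedPath} INTO TABLE `{$table}` "
        . "CHARACTER SET utf8mb4 "
        . "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
}

// Possible usage (not executed here). LOCAL INFILE has to be enabled on the
// PDO connection and allowed by the server:
//
//   $pdo = new PDO($dsn, $user, $password, [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);
//   $pdo->exec(buildLoadInfileSql('/tmp/AF.txt', 'geonames_raw'));
```

The entities would then be created or linked from the staging table in a second step, as suggested above.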
I experimented a little. Shouldn't it be possible to import / save in smaller chunks, so that the entity manager can be cleared every N parsed entries? That should speed the import up.
Any ideas?