algolia / search-bundle

Seamless integration of Algolia Search into your Symfony project.
MIT License

Some entities imported via `search:import` are not indexed (missing records) #372

Open quentint opened 1 year ago

quentint commented 1 year ago

Description

When importing entities with `search:import`, the command output reports the correct indexed counts, but some records are missing when I browse the index.

Here is the command output:

> bin/console search:import
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 500 / 500 App\Entity\MediaTranslation entities into quentin_media index
Indexed 160 / 160 App\Entity\MediaTranslation entities into quentin_media index
Done!

I'd then expect my index to contain 14 * 500 + 160 = 7160 items, but only 5216 exist:

image

Clearing the index and importing again yields a different record count each time (within about +/-5%).
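To make the expected total easy to re-check after each run, here is a minimal sketch that sums the `Indexed X / Y` lines from the command output and compares the result with the record count seen in the index. `sumIndexedCounts` is a hypothetical helper written for this issue, not part of the bundle, and the output string is rebuilt from the log above.

```php
<?php
// Hypothetical helper (not part of algolia/search-bundle): sums the X values
// from every "Indexed X / Y ..." line printed by search:import.
function sumIndexedCounts(string $commandOutput): int
{
    // The /m flag lets ^ match at the start of each line.
    preg_match_all('/^Indexed (\d+) \/ \d+ /m', $commandOutput, $matches);

    return array_sum(array_map('intval', $matches[1]));
}

// Output rebuilt from the log above: 14 batches of 500, then one batch of 160.
$line = "Indexed %d / %d App\\Entity\\MediaTranslation entities into quentin_media index\n";
$output = str_repeat(sprintf($line, 500, 500), 14) . sprintf($line, 160, 160) . "Done!\n";

$expected = sumIndexedCounts($output);
$actual = 5216; // record count reported for the index in this issue

printf("expected %d, found %d, missing %d\n", $expected, $actual, $expected - $actual);
```

With the numbers from this run, the expected total is 7160 and 1944 records are unaccounted for.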

Here's my configuration:

algolia_search:
    prefix: '%algolia_search_prefix%'
    indices:
        - name: media
          class: App\Entity\MediaTranslation

And here's the index settings file (created with `search:settings:backup`):

```json
{
  "minWordSizefor1Typo": 4,
  "minWordSizefor2Typos": 8,
  "hitsPerPage": 20,
  "maxValuesPerFacet": 100,
  "version": 2,
  "searchableAttributes": [
    "unordered(media.id)",
    "unordered(title)",
    "unordered(tags)",
    "unordered(description)",
    "unordered(features)",
    "unordered(goals)",
    "unordered(more)"
  ],
  "numericAttributesToIndex": null,
  "attributesToRetrieve": null,
  "unretrievableAttributes": null,
  "optionalWords": null,
  "attributesForFaceting": [
    "locale",
    "media.type",
    "status",
    "filterOnly(tags)",
    "filterOnly(title)"
  ],
  "attributesToSnippet": null,
  "attributesToHighlight": null,
  "paginationLimitedTo": 1000,
  "attributeForDistinct": null,
  "exactOnSingleWordQuery": "attribute",
  "ranking": [
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ],
  "customRanking": null,
  "separatorsToIndex": "",
  "removeWordsIfNoResults": "none",
  "queryType": "prefixLast",
  "highlightPreTag": "",
  "highlightPostTag": "<\/em>",
  "snippetEllipsisText": "",
  "alternativesAsExact": [
    "ignorePlurals",
    "singleWordSynonym"
  ],
  "sortFacetValuesBy": "count",
  "renderingContent": {
    "facetOrdering": {
      "facets": {
        "order": [
          "locale",
          "media.type",
          "status"
        ]
      },
      "values": {
        "locale": {
          "sortRemainingBy": "alpha"
        },
        "media.type": {
          "sortRemainingBy": "alpha"
        },
        "status": {
          "sortRemainingBy": "alpha"
        }
      }
    }
  }
}
```

I tried changing the `batchSize`, but the issue remained.
I used to have an `index_if` in there, but I removed it and the issue remained.

When running the `search:import` command and regularly refreshing the index on the Algolia dashboard, the "No. records" value evolves like so (this is only an example; the values change if I re-run the import on a cleared index):

  • 500
  • 1,000
  • 1,500
  • 2,000
  • 2,253
  • 2,525
  • 3,025
  • (...)

As you can see, things look OK at first, but then get a bit crazy around the 2,000/2,500 mark.
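The irregularity is easier to see as per-refresh increments. The sketch below computes the difference between consecutive dashboard readings. `countDeltas` is a hypothetical helper, and because the dashboard was refreshed by hand, an increment does not necessarily correspond to exactly one batch.

```php
<?php
// Hypothetical helper: returns the difference between each pair of
// consecutive record counts.
function countDeltas(array $counts): array
{
    $deltas = [];
    for ($i = 1; $i < count($counts); $i++) {
        $deltas[] = $counts[$i] - $counts[$i - 1];
    }

    return $deltas;
}

// Readings listed above (the elided "(...)" tail is left out).
$readings = [500, 1000, 1500, 2000, 2253, 2525, 3025];

echo implode(', ', countDeltas($readings)), "\n";
```

For these readings the increments are 500, 500, 500, 253, 272, 500, so two of the refreshes show fewer new records than a full batch of 500.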

Steps To Reproduce

Unfortunately this is hard to reproduce, because I can't pinpoint the origin of the issue (and the randomness makes it even stranger) 🙁

I tried looking at the Symfony logs to see if some error appeared there, but found nothing.

What could prevent records from appearing in my index?

quentint commented 1 year ago

Digging a bit more, I can confirm the issue comes from this repo (and not from algolia/algoliasearch-client-php), because I wrote this simple command that uses the client directly and works as intended:

<?php
// src/Command/MediaIndexCommand.php

namespace App\Command;

use Algolia\AlgoliaSearch\SearchClient;
use App\Entity\MediaTranslation;
use App\Serializer\Normalizer\MediaTranslationNormalizer;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

#[AsCommand(
    name: 'app:media:index',
    description: 'Index media translations',
)]
class MediaIndexCommand extends Command
{

    public function __construct(private readonly EntityManagerInterface $manager, private readonly MediaTranslationNormalizer $normalizer)
    {
        parent::__construct();
    }

    protected function execute(InputInterface $input, OutputInterface $output): int
    {
        $io = new SymfonyStyle($input, $output);

        // Use the PHP client directly, bypassing the bundle.
        $client = SearchClient::create('...', '...');
        $index = $client->initIndex('quentin_media');
        // Start from an empty index.
        $index->clearObjects();

        // Load every translation and split them into batches of 500.
        $translations = $this->manager->getRepository(MediaTranslation::class)->findAll();
        $chunks = array_chunk($translations, 500);

        foreach ($chunks as $chunkIndex => $chunk) {
            $io->info("Chunk $chunkIndex");
            // Normalize each entity and use its ID as the Algolia objectID.
            $objects = array_map(
                fn(MediaTranslation $translation) => [
                    ...$this->normalizer->normalize($translation, 'searchableArray'),
                    'objectID' => $translation->getId(),
                ],
                $chunk
            );
            $index->saveObjects($objects);
        }

        return Command::SUCCESS;
    }
}

image

I hope this helps.

quentint commented 1 year ago

Still investigating... Looking at the logs generated with `Algolia\AlgoliaSearch\Log\DebugLogger::enable();`, I don't see anything unusual.

Also, I don't understand how/where the bundle does anything different from my own command (apart from supporting more cases) 🤔