algolia / scout-extended

Scout Extended: The Full Power of Algolia in Laravel
https://www.algolia.com/doc/framework-integration/laravel/getting-started/introduction-to-scout-extended/
MIT License

deleteBy throwing Algolia's limit rate on bulk delete #311

Closed eze2411 closed 4 months ago

eze2411 commented 1 year ago

Description

Hi. We're working on a marketplace application that currently has over 200k users and more than 300k products. We're experiencing the following issue while running a cron job that checks Searchable objects in our database and deletes them where appropriate:

Algolia\AlgoliaSearch\Exceptions\BadRequestException^ {#4246
  -request: null
  #message: "Rate limit reached for expensive delete-by-query job count (observed: 1; allowed: 1). To learn more about these limits, please see https://support.algolia.com/hc/en-us/articles/4406975251089-Is-there-a-rate-limit-."
  #code: 429
  #file: "./vendor/algolia/algoliasearch-client-php/src/RetryStrategy/ApiWrapper.php"
}

It seems that the problem occurs in src/Jobs/DeleteJob.php, which uses deleteBy at line 55. After doing a little research into Algolia's rate limit error, we found that it could be avoided by using deleteObject instead of deleteBy.
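For reference, the relevant call in DeleteJob looks roughly like this (paraphrased; the exact code may differ between versions). `deleteBy` triggers a delete-by-query job on Algolia's side, and that job type is what the rate limit applies to:

    // Sketch of the deleteBy call in src/Jobs/DeleteJob.php (paraphrased).
    // The encrypted searchable IDs are passed as tag filters, and each call
    // spawns one "delete-by-query" job on Algolia's side -- the operation
    // that is subject to the rate limit in the exception above.
    $index = $client->initIndex($this->searchables->first()->searchableAs());

    $result = $index->deleteBy([
        'tagFilters' => [
            $this->searchables->map(function ($searchable) {
                return ObjectIdEncrypter::encrypt($searchable);
            })->toArray(),
        ],
    ]);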

We were able to make it work by using the following code at DeleteJob's handle method:

    public function handle(SearchClient $client): void
    {
        if ($this->searchables->isEmpty()) {
            return;
        }

        $index = $client->initIndex($this->searchables->first()->searchableAs());

        // Encrypt each searchable into its Algolia object identifier.
        $searchable = $this->searchables->map(function ($searchable) {
            return ObjectIdEncrypter::encrypt($searchable);
        })->toArray();

        // Extract the model key from the encrypted ID ("Class::key") and
        // delete that single record. Note: this only handles the first
        // searchable in the collection.
        $objectId = explode('::', $searchable[0])[1];
        $result = $index->deleteObject($objectId);

        if (config('scout.synchronous', false)) {
            $result->wait();
        }
    }

Please let us know what you think about this, or whether you have run into Algolia's rate limit policy before and how you handled it.

Thanks in advance, Regards.


DevinCodes commented 1 year ago

Hi @eze2411 ! Thank you for the clear explanation on this.

To give some context on why we use deleteBy rather than deleteObject: if a model generates multiple records, the approach of deleting by objectID won't work. This approach works well when there's a 1:1 relationship between models and records, but as soon as there are multiple records the approach needs to change to delete by the shared attribute (tagFilters in this case).
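To illustrate the point (with a hypothetical object ID format; the real encoding comes from ObjectIdEncrypter and may differ): when one model is split into several records, each record gets its own objectID, but they share a tag derived from the model. A tag filter therefore catches every record of the model, while deleteObject would remove only one:

    // Hypothetical example: one model split into three Algolia records.
    // Each record has a distinct objectID but shares the model tag, so
    // deleteBy(['tagFilters' => ...]) removes all three in one operation.
    $records = [
        ['objectID' => 'App\Models\Product::42::0', '_tags' => ['App\Models\Product::42']],
        ['objectID' => 'App\Models\Product::42::1', '_tags' => ['App\Models\Product::42']],
        ['objectID' => 'App\Models\Product::42::2', '_tags' => ['App\Models\Product::42']],
    ];

    // deleteObject('App\Models\Product::42') would match none of these
    // objectIDs, while the shared tag filter matches all of them.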

I'd like to keep this issue open to see if more people run into it, and we can reconsider this behavior if there's enough demand to change it. For now, though, the best workaround is to keep the code you updated to delete your records 🙂

Thank you again, please let me know what you think or if you have any questions!

eze2411 commented 1 year ago

Hi @DevinCodes ! Thanks for your quick reply. Great to hear why you use deleteBy rather than deleteObject; we've taken note of that.

The next question that came up while digging deeper into this: if the relationship between models and records is more than 1:1, could we then use deleteObjects so that all of the records involved get deleted?

We're currently doing the following, which seems to be working well:

    public function handle(SearchClient $client): void
    {
        if ($this->searchables->isEmpty()) {
            return;
        }

        $index = $client->initIndex($this->searchables->first()->searchableAs());

        $searchables = $this->searchables->map(function ($searchable) {
            return ObjectIdEncrypter::encrypt($searchable);
        })->toArray();

        $result = $index->deleteObjects($searchables);

        if (config('scout.synchronous', false)) {
            $result->wait();
        }
    }

Let us know what you think of it: whether it could solve the issue, or whether it's still necessary to take a deeper look into this.

Thanks in advance, Regards.

DiOps commented 1 year ago

Thanks @eze2411 for the code example. This really helped a lot.

@DevinCodes I still don't fully understand why the DeleteJob is dispatched synchronously. If you don't patch the deleteBy query with the example above, you can easily run into Algolia's rate limit, which causes an exception to be thrown in your application. That interrupts the current request flow and has a really bad impact on UX. Pushing the job onto a queue would at least prevent that last part.

DevinCodes commented 1 year ago

I believe the delete jobs are sent synchronously because they rely on the model to determine which records should be deleted. If we made this async, the record would already be removed from the DB by the time the job runs, so the job wouldn't work as expected.

DiOps commented 1 year ago

True, yet for this job only the Algolia IDs are needed, and those could be serialized into the job data itself. Would that be possible? The issue I mentioned before, rate limit exceptions interrupting the application request, is quite severe and hard to anticipate upfront.
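One possible shape for that idea (a sketch, not part of the package; the class name and dispatch call are hypothetical): compute the encrypted object IDs while the models are still loaded, and dispatch a job that carries only those strings, so it can safely run on a queue after the database rows are gone:

    // Hypothetical async variant: the job serializes only the Algolia
    // object IDs, so it no longer depends on the (deleted) models.
    final class DeleteByIdsJob implements ShouldQueue
    {
        /** @var string[] */
        private array $objectIds;

        private string $indexName;

        public function __construct(string $indexName, array $objectIds)
        {
            $this->indexName = $indexName;
            $this->objectIds = $objectIds;
        }

        public function handle(SearchClient $client): void
        {
            if (empty($this->objectIds)) {
                return;
            }

            // deleteObjects is a batch operation, so it avoids the
            // delete-by-query rate limit discussed above.
            $client->initIndex($this->indexName)
                ->deleteObjects($this->objectIds);
        }
    }

Dispatching would then happen while the models still exist, e.g. `DeleteByIdsJob::dispatch($searchables->first()->searchableAs(), $searchables->map(fn ($s) => ObjectIdEncrypter::encrypt($s))->all());`.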

DevinCodes commented 11 months ago

This would require quite some work, for which we don't have the resources at this time. Of course we'd be happy to review a PR with this change, but to be honest I don't see a way we can dedicate time to this on our end anytime soon, sorry!