algolia / algoliasearch-client-php

⚡️ A fully-featured and blazing-fast PHP API client to interact with Algolia.
https://www.algolia.com/doc/api-client/php/getting-started/
MIT License
670 stars 116 forks source link

Add the possibility to use a stream as a request #536

Closed greg0ire closed 5 years ago

greg0ire commented 5 years ago

I would like to use the replaceRules method of SearchIndex with a stream because I have 500 000 query rules I want to atomically replace.

I see that it is currently possible to use a stream to create a Algolia\AlgoliaSearch\Http\Psr7\Request: https://github.com/algolia/algoliasearch-client-php/blob/1a058bb0c06542e677505032ee3a1cb42135a8bc/src/Http/Psr7/Request.php#L66

BUT, it's not usable from the ApiWrapper implementation: https://github.com/algolia/algoliasearch-client-php/blob/1a058bb0c06542e677505032ee3a1cb42135a8bc/src/RetryStrategy/ApiWrapper.php#L118

Would there be a way to make the API stream-friendly?

greg0ire commented 5 years ago

Sounds hard to solve though: if $data is a stream that produces ["requests" => ["action" => "addObject" => "body" => ["some_json": 42]]], we want to insert the request options right in the middle of that, which seems impossible to me… maybe the best would be to be more friendly to Iterators instead?

Meaning we would stop doing this:

https://github.com/algolia/algoliasearch-client-php/blob/1a058bb0c06542e677505032ee3a1cb42135a8bc/src/SearchIndex.php#L499-L505 , skip this when dealing with an iterator: https://github.com/algolia/algoliasearch-client-php/blob/1a058bb0c06542e677505032ee3a1cb42135a8bc/src/SearchIndex.php#L511

And then instead of array_merge, if $data is an \Iterator, use https://www.php.net/manual/en/class.appenditerator.php like this:

<?php
$body = new \AppendIterator();
$body->append($data);
$body->append($requestOptions->getBody());

Also, we would use a special string when logging the body if dealing with an iterator. (something like "stream probably too big to be logged") or maybe find a way to rewind and truncate the stream WDYT?

greg0ire commented 5 years ago

Ah there is one missing part in this method: https://github.com/algolia/algoliasearch-client-php/blob/1a058bb0c06542e677505032ee3a1cb42135a8bc/src/RetryStrategy/ApiWrapper.php#L218-L237

We would have to check if $body is an iterator, and if it is, build yet another iterator, but one that would be a string[]:

$bodyAsString = new \AppendIterator();
$bodyAsString->append(new \ArrayIterator(['{']));
$bodyAsString->append((
    function () use ($body): \Generator {
        while ($body->valid()) {
            $line = json_encode($body->current());
            $body->next();
            if ($body->valid()) {
                $line .= ',';
            }

            yield $line;
        }
    }
)());
nunomaduro commented 5 years ago

Hey @greg0ire,

Thanks for considering contributing to our PHP Client.

Have you tested that solution locally?

greg0ire commented 5 years ago

Not yet, no, I just tested the last snippet but didn't do an end-to-end test. I just wanted to know if it could seem worth contributing, and be realistic to you first.

nunomaduro commented 5 years ago

We have found a solution for @greg0ire talking via Slack. He wanted to send a batch of 500 000 query rules to production. Here is the solution we found the best for him, using a temporary index to make this operation atomic:

$index = $client->initIndex('temp_production');

$index->saveObject(['objectID' => 1]).wait();

// Consider batch here to avoid memory issues.
$batchOfRules = [[]]; // Batch of rules...
$responses = [];
foreach ($batchOfRules as $rules) {
    $responses[] = $index->saveRules($rules);
}
(new MultiResponse($responses))->wait();

$client->copyIndex('temp_production', 'production', [
    'scope' => 'rules'
])->wait();