algolia / scout-extended

Scout Extended: The Full Power of Algolia in Laravel
https://www.algolia.com/doc/framework-integration/laravel/getting-started/introduction-to-scout-extended/
MIT License
399 stars 84 forks source link

[Bug] scout:reimport moves index too soon, multiple race conditions #171

Open nathanblogs opened 5 years ago

nathanblogs commented 5 years ago

If you are using a queue to reimport into algolia that isn't FIFO eg AWS SQS, there is a very high chance of hitting a race condition where you move the temp index to the real index without pushing all the data into the new index. I've seen examples of 5-10% of the records missing.

Also depending on how long the reimport takes and how frequently you are updating records it seems like there is a high chance the reimport will import stale data by using this approach timeline eg: 1) start reimport 2) records gets pushed into new temp index 3) record gets updated and pushed into old index 4) temp index gets moved to main index.

You now have stale data in the newly moved index

Unsure is this is related but you seem to reference config('scout.synchronous', false) in the code but I can't seem to figure out where or how that is defined ?

nunomaduro commented 5 years ago

Hi @nathanblogs. Thanks for reporting this issue.

A solution for your issue would be setting the scout.synchronous to true. Using:

config(['scout.synchronous' => true]);
// Run your command
config(['scout.synchronous' => false]);

If it works, do you mind of creating a pull request adding this as optional argument on the reimport command?

nathanblogs commented 5 years ago

where does scout.synchronous come from it doesn't seem to be declared anywhere ?

nunomaduro commented 5 years ago

It's not declared, that's why the configuration option that is false by default.

When is true, we actually wait for the saveObjects to finished before moving to the next one: https://github.com/algolia/scout-extended/blob/f3475be9279dce705d438b74f97d91195f5e4409/src/Jobs/UpdateJob.php#L139.

Give it a try, and tell me if it solves your problem.

nathanblogs commented 5 years ago

I'm not sure how UpdateJob is relevent ?

I can't see how that would help, isn't it calling the existing scout function makeAllSearchable. https://github.com/algolia/scout-extended/blob/9aa937089343c05460252b9a438c670b7beebabb/src/Console/Commands/ReImportCommand.php#L81

This intern calls searchable in batches which dispatches jobs.

After that there is a moveindex. https://github.com/algolia/scout-extended/blob/9aa937089343c05460252b9a438c670b7beebabb/src/Console/Commands/ReImportCommand.php#L94 This moveIndex has no ability to tell if all the dispatched searchable jobs have finished, which I believe is the cause of the issue.

I think the whole reimport needs a rethink to be a sync function instead of a temporary index.

nunomaduro commented 5 years ago

Problem finally understood.

I will try to investigate this during the next couple days.

TomKlotzPro commented 5 years ago

I'm taking this issues @nunomaduro

torrancemiller commented 5 years ago

I got it to work by turning-off the queue entirely.

Example: config(['scout.queue' => false]);

\Illuminate\Support\Facades\Artisan::call('scout:reimport "App\\\\Models\\\\Thread"');

config(['scout.queue' => true]);

torrancemiller commented 5 years ago

Hopefully your reimports are as rare as mine. I had to make a structural change to everything in an index and added the above to a migration up() method so that I don't have to remember to SSH in to prod when a batch of changes get deployed.

You would also be able to do this through SSH via php artisan tinker of course

nunomaduro commented 5 years ago

@torrancemiller Maybe we can work in a pull request together on this no?

nathanblogs commented 5 years ago

@torrancemiller that solution doesn't resolve the race condition that can occur unless you put your system into read only model or maintenance.

Basically what can happen is: 1) you start the scout:reimport 2) model id 1 gets push to algolia new temp index. 3) model id 1 gets update in your system 4) scout:reimport finish moving the temp index over the real index

Which is a race condition you now have old / stale data in algolia for model id 1, this is particularly a problem if you have a large dataset or a high frequency of updates on searchable models.

torrancemiller commented 5 years ago

@nathanblogs

I do not believe there is a technical solution to that sort of thing. I would recommend doing this during a service maintenance window, and close to never on production servers. However, if you did have to do this in production, I recommend running the php artisan down command to block incoming requests to your server while the reimport runs.

If that is not an option, you will need to get fancy and write a SQL query to retrieve all models updated after you decided to reimport and call the ->searchable() method on that result set and you should be peachy with them.

torrancemiller commented 5 years ago

@nunomaduro well, I suppose there may be much more elegant fixes if they were to be incorporated into the package. I wish I had more free time to see if there may be a surgical fix here.

nathanblogs commented 5 years ago

@torrancemiller of course there are technical solutions to this.

I believe the easiest solution would be to update UpdateJob function to add a sha1/md5 hash to the array returned by toSearchableArray().

Then we could update ImportCommand to do a sync instead of a full import:

nathanblogs commented 5 years ago

Any thoughts ?

jimbojsb commented 4 years ago

@nunomaduro any reason not to just force SCOUT_QUEUE to be ignored when scout is invoked from this context? I don't see any other way this could possibly work.

nunomaduro commented 4 years ago

You mean in the command itself?

tomcoonen commented 4 years ago

I have my scout queue setting set like this: 'queue' => [ 'queue' => 'search-index', ],

Specifying the queue like this will not wait for the temp index to be ready, but moves an empty index to 'production'.

Anything I can do to get this fixed?

nunomaduro commented 4 years ago

Well, maybe you/we can work in a pull request to get this fixed. Just to be clear, the problem is:

Only when the user is using queues:

1. we copy settings to temporary index
2. we queue jobs of indexation
3. we move the temporary index to production without waiting for the queued jobs to be complete.
tomcoonen commented 4 years ago

I agree on the problem, gonna see if I can fix it in the upcoming days :)

logan-jobzmall commented 3 years ago

Any news on this?

Edit: I went ahead and just created a new command that sets queue to false, then sets it back to my configured settings.


public function handle()
    {
        \Config::set('scout.queue', false);
        $this->call('scout:reimport');
        \Config::set('scout.queue',
            [
                "queue" => "search",
                "connection" => "redis"
            ]);
    }