algolia / mongo-connector

WARNING : This connector is deprecated, please use an API Client
https://www.algolia.com/doc/
Apache License 2.0

fixing updates and skipping unnecessary updates #13

Open benjamine opened 9 years ago

benjamine commented 9 years ago

Hi guys, this might be too big a change to merge lightly, but I'm sending it in case you find this, or part of it, useful.

1. skip updates based on filters

I added the update_can_be_ignored function, which uses the JSON attributes_filter to filter the mongo update_spec ($set and $unset). If no field passes through the filter, the update won't have any impact on the algolia index, so it is skipped. We are using this to avoid a lot of unnecessary operations.
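To make the idea concrete, here is a minimal sketch of such a check. It is not the code from this PR: the signature is an assumption, and the filter is simplified to a flat set of allowed top-level field names (`allowed_fields`, a hypothetical stand-in for the real attributes_filter structure).

```python
def update_can_be_ignored(update_spec, allowed_fields):
    """Return True when no field touched by the update survives the filter.

    Assumption: `allowed_fields` is a set of top-level field names that the
    filter lets through; the real attributes_filter may be richer than this.
    """
    # A full-document replace carries no $set/$unset, so it is never ignored.
    if "$set" not in update_spec and "$unset" not in update_spec:
        return False

    touched = list(update_spec.get("$set", {})) + list(update_spec.get("$unset", {}))

    # Mongo uses dotted paths ("variants.0.stock"); compare on the top-level name.
    return not any(path.split(".", 1)[0] in allowed_fields for path in touched)
```

For example, with `allowed_fields = {"name", "price"}`, an update of `{"$set": {"internal_notes": "x"}}` would be skipped, while `{"$set": {"price": 10}}` would not.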

2. support updates when a postproc script is used

We realized this functionality is sort of broken right now. The current behavior when an update arrives is:

  1. read the doc from algolia
  2. apply the mongo update_spec to that doc (apply_update)
  3. apply remap and filter
  4. postproc
  5. send to algolia as partialUpdate

The problem here is step 2: apply_update is trying to apply an update that was written for the original mongo document to the algolia doc, which, if you use a postproc script, will have a different structure. Depending on how destructive your postproc is, there may be no way to recreate the original doc from the algolia doc (the postproc result). For example, we index products, and we delete out-of-stock variants from the doc. When an update comes in saying stock is back, we need to re-add the variant, and that is impossible without grabbing the original mongo document.
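The out-of-stock example can be reproduced with a small self-contained sketch. Everything here is hypothetical illustration: `postproc`, `apply_set`, and the sample product are made up for this example and are not the connector's actual functions.

```python
import copy


def postproc(doc):
    """Hypothetical destructive post-processing: drop out-of-stock variants."""
    out = copy.deepcopy(doc)
    out["variants"] = [v for v in out["variants"] if v["stock"] > 0]
    return out


def apply_set(doc, set_spec):
    """Apply a Mongo-style $set (dotted paths) to a nested dict/list document."""
    out = copy.deepcopy(doc)
    for path, value in set_spec.items():
        *parents, leaf = path.split(".")
        target = out
        for key in parents:
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            target[int(leaf)] = value
        else:
            target[leaf] = value
    return out


mongo_doc = {
    "name": "shirt",
    "variants": [{"sku": "S", "stock": 0}, {"sku": "M", "stock": 3}],
}
algolia_doc = postproc(mongo_doc)   # only the "M" variant is indexed
update = {"variants.0.stock": 5}    # Mongo says: variant "S" is back in stock

# Applying the update to the indexed doc hits "M", the only variant left.
wrong = apply_set(algolia_doc, update)

# Applying it to the original doc and re-running postproc restores "S".
right = postproc(apply_set(mongo_doc, update))
```

Because the positional path `variants.0` refers to a different element once variants have been removed, the update silently lands on the wrong variant unless the original document is reprocessed.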

If you check the Elasticsearch doc manager, you'll notice they solved this problem by including the untransformed source doc as a child property "source", but we didn't want to do that for 2 reasons:

So we decided the cheapest and safest option is to read the doc from mongo instead (unfortunately, that means I had to modify the connector class to share the mongo connection with the doc manager).

As a result, when an update arrives there are now 4 possible outcomes:

a. update skipped

If the update has $set and $unset, and the fields updated are all filtered out, the update is skipped. Cost: zero.

b. doc replace

If the update is a doc replace, the update_spec is the full doc (no $set or $unset), so it goes through filter + remap + postproc and is sent to algolia. Cost: 1 algolia operation.

c. reprocessing original doc

If there is a postproc script and the update is partial, the doc is read from mongo (using the mongo client obtained from the connector), then goes through filter + remap + postproc and is sent to algolia. Cost: 1 mongo read + 1 algolia operation.

d. sending a partial update

If there is no postproc, the mongo update_spec can be converted into an algolia partial update and sent as such. Cost: 1 algolia operation.
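The four outcomes above can be summarized as a small decision function. This is only a sketch of the decision order described in this PR, not the actual implementation: `classify_update`, its parameters, and the outcome labels are hypothetical names, and the filter is again simplified to a set of allowed top-level field names.

```python
# Cost per outcome as (mongo reads, algolia operations), taken from the list above.
COSTS = {
    "skip": (0, 0),
    "replace": (0, 1),
    "reprocess": (1, 1),
    "partial": (0, 1),
}


def classify_update(update_spec, allowed_fields, has_postproc):
    """Pick one of the four outcomes for an incoming Mongo update_spec."""
    is_partial = "$set" in update_spec or "$unset" in update_spec
    if not is_partial:
        # b. the update_spec is the full document
        return "replace"

    touched = list(update_spec.get("$set", {})) + list(update_spec.get("$unset", {}))
    if not any(path.split(".", 1)[0] in allowed_fields for path in touched):
        # a. every touched field is filtered out
        return "skip"

    if has_postproc:
        # c. the original document must be re-read from Mongo
        return "reprocess"

    # d. the update_spec can be translated into a partial update
    return "partial"
```

For instance, `classify_update({"$set": {"price": 9}}, {"price"}, has_postproc=True)` returns `"reprocess"`, whose cost in `COSTS` is one mongo read plus one algolia operation.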