hi guys, this might be too big a change to be merged lightly, but I'm sending it in case you find it, or part of it, useful.
1. skip updates based on filters
I added the update_can_be_ignored function, which uses the JSON attributes_filter to filter the mongo update_spec ($set and $unset). If no field passes through the filter, the update won't have any impact on the algolia index, so it is skipped.
we are using this to avoid a lot of unnecessary operations.
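A minimal sketch of the idea, not the code in this PR. It assumes attributes_filter is a mapping keyed by top-level field name and that update paths use mongo dot notation; both are simplifications made for illustration.

```python
def update_can_be_ignored(update_spec, attributes_filter):
    """Return True when a partial update touches no field kept by the filter.

    Assumption: attributes_filter maps top-level field names to a rule, and
    a field is indexed only if its name is a key of that mapping.
    """
    # No filter configured: every field is indexed, so nothing can be skipped.
    if not attributes_filter:
        return False
    # A full-document replace has no operators at all; never skip it.
    if not any(key.startswith("$") for key in update_spec):
        return False
    # Be conservative with operators other than $set and $unset.
    if any(key not in ("$set", "$unset") for key in update_spec):
        return False
    touched = list(update_spec.get("$set", {})) + list(update_spec.get("$unset", {}))
    for path in touched:
        # "variants.0.stock" is judged by its top-level field, "variants".
        top_level = path.split(".", 1)[0]
        if top_level in attributes_filter:
            return False
    return True
```

With a filter that keeps only name and price, an update that sets internal_notes would be skipped, while one that sets price would not.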
2. support updates when a postproc script is used
we realized this functionality is sort of broken right now. The current behavior when an update arrives is:
the problem here is step 2: apply_update tries to apply an update meant for the original mongo document to the algolia doc, which, if you use a postproc script, will have a different structure.
Now, depending on how destructive your postproc is, there may be no way to recreate the source doc from the algolia doc (the postproc result). For example, we index products and delete out-of-stock variants from the doc; when an update comes in saying stock is back, we need to re-add the variant, which is impossible without grabbing the original mongo document.
If you check the Elasticsearch doc manager, you'll notice they solved this problem by including the untransformed source doc as a child property, "source", but we didn't want to do that for 2 reasons:
so we decided the cheapest and safest option is to read the doc from mongo instead (unfortunately, that means I had to modify the connector class to share the mongo connection with the doc manager).
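To illustrate the read-from-mongo approach, here is a rough sketch. rebuild_indexed_doc and drop_out_of_stock are hypothetical names, not functions from this PR; the only thing assumed about the collection object is a pymongo-style find_one method.

```python
def drop_out_of_stock(doc):
    """Hypothetical destructive postproc: keep only variants with stock."""
    result = dict(doc)
    result["variants"] = [v for v in doc.get("variants", []) if v.get("stock", 0) > 0]
    return result


def rebuild_indexed_doc(collection, document_id, transform):
    """Re-read the source document and run it through the transform again.

    `collection` is assumed to expose find_one(filter), as a pymongo
    Collection does. Returns None when the document no longer exists.
    """
    source_doc = collection.find_one({"_id": document_id})
    if source_doc is None:
        return None
    return transform(source_doc)
```

Because the transform always starts from the full source document, a variant dropped earlier reappears as soon as its stock is positive again.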
the result is that, when an update arrives, there are now 4 possible outcomes:
a. update skipped
if the update is partial ($set and $unset) and all the updated fields are filtered out, the update is skipped
cost: zero
b. doc replace
if the update is a doc replace, the update_spec is the full doc (no $set or $unset), so it goes through filter+remap+postproc and is sent to algolia
cost: 1 algolia operation
c. reprocessing original doc
if there is a postproc script and the update is partial, the doc is read from mongo (using the mongo client obtained from the connector), run through filter+remap+postproc, and sent to algolia
cost: 1 mongo read + 1 algolia operation
d. sending a partial update
if there is no postproc, the mongo update_spec can be converted into an algolia partial update, which is then sent to algolia.
cost: 1 algolia operation
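The four outcomes above can be summarized as a small decision function. This is only a sketch of the decision order: classify_update and its string labels are made up for illustration, and the filter check assumes attributes_filter is keyed by top-level field name.

```python
def classify_update(update_spec, attributes_filter=None, has_postproc=False):
    """Return which of the four outcomes (a-d) applies to an update_spec."""
    # b. doc replace: the spec is the full document, with no operators.
    if not any(key.startswith("$") for key in update_spec):
        return "replace"
    # a. update skipped: every touched field is dropped by the filter.
    if attributes_filter:
        touched = list(update_spec.get("$set", {})) + list(update_spec.get("$unset", {}))
        if touched and all(
            path.split(".", 1)[0] not in attributes_filter for path in touched
        ):
            return "skip"
    # c. reprocessing: a postproc script needs the original mongo document.
    if has_postproc:
        return "reprocess"
    # d. partial update: the spec can be translated directly.
    return "partial"
```

Only the "reprocess" outcome costs a mongo read; "skip" costs nothing, and the other two cost a single algolia operation.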