algolia / algoliasearch-wordpress

❌🗑🙅‍♂️ Algolia Search plugin for WordPress is no longer supported. Please use our API client guide instead
https://www.algolia.com/doc/integration/wordpress/getting-started/quick-start/
GNU General Public License v2.0
358 stars 114 forks source link

possible bug in how Algolia_Index->sync() behaves when post_type is "post" and should_index() is false #804

Open Kevin-Hamilton opened 6 years ago

Kevin-Hamilton commented 6 years ago

What did you expect to happen?

deleteObject API not called if item has already been deleted from Algolia Index

What happened instead?

deleteObject API called more frequently than necessary

How can we reproduce this behavior?

Theoretically, making updates to a draft or password protected post may show this behavior. My test case that led me to discover the issue is far more complicated than I think you would want here, and I don't have an easy way to build a simple test case at the moment, sorry. I'm hoping my code analysis below is sufficient for you to verify if this is a bug....

Can you provide a link to a page which shows this issue?

No.

Technical info

Analysis

I think there may be a bug in how Algolia_Index->sync() behaves when post_type is "post" and should_index() is false.

Looking at https://github.com/algolia/algoliasearch-wordpress/blob/master/includes/indices/class-algolia-index.php#L167 this call to delete_item leads to https://github.com/algolia/algoliasearch-wordpress/blob/master/includes/indices/class-algolia-posts-index.php#L328 which uses get_post_records_count() when building the list of records to delete. However, there is not a subsequent call to set_post_records_count() afterwords. So a subsequent call to sync() for the same item will result in another deleteObject API call even though the item was already deleted from the Algolia index.

I'm not sure if there are some edge cases around this behavior that makes it better to be safe than sorry here, but since Algolia charges per indexing operation I thought this was worth mentioning so you could take a look. I suspect that the delete_item() function may have been developed in isolation from the should_index() code and so the developer working on delete_item() probably thought (and rightly so) that it doesn't make sense to set a post_meta field on a post that is about to be deleted. But because should_index() sometimes will be false for items which are kept around in the database, it seems to me that it can result in repeated calls to delete an item from Algolia Index which was already deleted.

In my particular use case, I am using the algoliasearch-woocommerce plugin, which has catalog visibility options which controls if a product should be indexed: https://github.com/algolia/algoliasearch-woocommerce/blob/master/includes/indexing.php#L137 and I'm seeing deleteObject calls going into Algolia when a hidden item is updated.