WebDevStudios / wp-search-with-algolia

Improve search on your site. Autocomplete is included, along with full control over look, feel and relevance.
https://wordpress.org/plugins/wp-search-with-algolia/
146 stars 55 forks source link

Algolia_Search::pre_get_posts ignores $args used in manual $query #350

Closed theodejager closed 1 year ago

theodejager commented 1 year ago

Describe the bug I have a search page setup where I create multiple $wp_queries to search for content and split them up into categories based on post type and post status.

A gist of the code looks something like this:

function prefix_get_news( $limit = -1, $search = false ) {
    $args = array(
        'post_type'           => 'news',
        'post_status'         => 'archive', // note that this is a custom registered post_status 
        'fields'              => 'ids',
        'orderby'             => 'publish_date',
        'order'               => 'ASC',
        'posts_per_page'      => $limit,
    );
    if ( $search ) {
        $args['s'] = get_search_query();
    }
    $query = new WP_Query( $args );
    if ( isset( $query->posts ) && ! empty( $query->posts ) ) {
        return $query->posts;
    }
    return false;
}

function prefix_get_pages( $limit = -1, $search = false ) {
    $args = array(
        'post_type'          => 'page',
        'post_status'        => 'publish',
        'fields'             => 'ids',
        'posts_per_page'     => $limit,
    );
    if ( $search ) {
        $args['s'] = get_search_query();
    }
    $query = new WP_Query( $args );
    if ( isset( $query->posts ) && ! empty( $query->posts ) ) {
        return $query->posts;
    }
    return false;
}

$news  = prefix_get_news( 10, true );
$pages = prefix_get_pages( 10, true );

if ( $news ) {
    foreach ( $news as $item ) {
        // render the item
    }
}

if ( $pages ) {
    foreach ( $pages as $item ) {
        // render the item
    }
}

Firstly after looking at the Algolia_Search class and googling a bit, I realised that these queries do not run with Algolia enabled by default until I add the filter (not sure if this is the correct approach):


function custom_algolia_should_filter_query( $should_filter, $query ) {
    if ( ! is_admin() && ! empty( get_search_query() ) ) {
        return true;
    }
    return $should_filter;
}
add_filter( 'algolia_should_filter_query', 'custom_algolia_should_filter_query', 10, 2 );

After adding the filter, the results that return are ignoring the post type and post status that I am giving each manual $query There is also an undefined php notice: "Warning: Undefined array key "s" in /var/www/html/public_html/content/plugins/wp-search-with-algolia/includes/class-algolia-search.php on line 148
"

The returned results are basically repetitions of each other, because the 'post_type' and 'post_status' arguments are being reset at the end of the Algolia_Search::pre_get_posts function.

Not sure if this is a bug or I misunderstood the available filters. Any help would be greatly appreciated.

To Reproduce Steps to reproduce the behavior:

  1. Index two different post types in the plugin settings.
  2. Create a page with 2 or more custom $wp_queries of different post types that include the 's' parameter
  3. Add the filter for algolia_should_filter_query, return true if a search parameter is present and not in /wp-admin
  4. See that the results are repeated and ignore the post_type and other $args given to $query = new WP_Query( $args );.

Expected behavior The plugin should honor the $args given to a $query

tw2113 commented 1 year ago

Potentially a very quick win with this thought, but I wonder if your filters are simply running after the callback for our plugin has run, especially with default priorities. So I wonder giving yours a higher priority, would get it set first, and then passed to the pre_get_posts callback that we have.

For what it's worth, at least with the "backend search" that isn't InstantSearch based, we're taking the search query, sending it to Algolia to find what results Algolia determines are best fits, and then returning an array of IDs to use. Basically saying "these are the object IDs that we think match, do a post__in query for these specific items, and there's your Algolia powered results"

theodejager commented 1 year ago

@tw2113 thanks for the reply! Unfortunately, changing the priority has no effect, and the only reason I added the filter is to enable the "backend search" for those queries.

What does fix my issues are the following changes:

For the undefined notice adding a null coalescing operator as a fallback for an empty query['s']:

        try {
            $results = $this->index->search( $query->query['s'] ?? null, $params, $order_by, $order );
        } catch ( AlgoliaException $exception ) {
            error_log( $exception->getMessage() ); // phpcs:ignore -- Legacy.

            return;
        }

For the post_types and post_status, first checking if the query already has those parameters, otherwise adding the defaults:

        if ( ! isset( $query->query_vars['post_type'] ) ) {
            $query->set( 'post_type', $post_types );
        }
        // @todo: This actually still excludes trash and auto-drafts.
        if ( ! isset( $query->query_vars['post_status'] ) ) {
            $query->set( 'post_status', 'any' );
        }
tw2113 commented 1 year ago

Can you open a PR for your changes so that we can review everything and further discuss things?

tw2113 commented 1 year ago

To be certain in trying to recreate, the main things we'd need to do is copy/paste your 2 functions and usages somewhere, and then also set Algolia Search page settings to "Use Algolia with the native WordPress search template" ?

theodejager commented 1 year ago

Hi, I created a pull request: #354

To be certain in trying to recreate, the main things we'd need to do is copy/paste your 2 functions and usages somewhere, and then also set Algolia Search page settings to "Use Algolia with the native WordPress search template" ?

Yes, although I did add a filter for "algolia_should_filter_query" taking inspiration from how the ElasticPress plugin does it with a "ep_integrate" query var. That way I have more granular control over where I want to use it. I realized that the undefined notice was happening because my previous filter was too permissive and was running during other wordpress queries.

function custom_algolia_should_filter_query( $should_filter, $query ) {
    if ( isset( $query->query_vars['algolia_integrate'] ) && (bool) $query->query_vars['algolia_integrate'] === true ) {
        return true;
    }
    return $should_filter;
}
add_filter( 'algolia_should_filter_query', 'custom_algolia_should_filter_query', 15, 2 );

I do have one uncertainty and this is that the index used is always "searchable_posts" and not for example "posts_event" or "posts_pages". I am guessing searching a bigger index is more costly in terms of performance / and or number of requests to Algolia?

In my case, I end up with several duplicated queries which in theory could be optimized if we are only searching the same index every time. So maybe being able to define the index is a possible approach, or caching the results (not sure if this is done already) if a query is duplicated?

tw2113 commented 1 year ago

Reading and pondering the PR.

One thing to keep in mind, is that this is the only part that actually reaches out to Algolia: https://github.com/WebDevStudios/wp-search-with-algolia/blob/main/includes/class-algolia-search.php#L116-L153 Everything after that try/catch is modifying the query to the database that will still be made by WordPress. It'll just have the identified post ID values to specifically fetch. I do feel like that filter could use the $query object passed into the function, as having that available would take care of the need for the hits per page edits made in the PR.

I also feel like the should_filter_query() check at the very top could handle the added isset check for the $query['s'] bit already, since that method has its own filter.

That leaves the last parts starting on line 189 in your version. Part of me wants to say "throw a filter on things at the last moment" but I believe filters are already being used to set some of these parameters already, and a filter here would be ANOTHER one to set to accomplish the same thing. That said, I'm also not sure why a lower priority on pre_get_posts and whatnot wouldn't put you after things enough to set those specifically. However, this is still all being done after Algolia has even been interacted with, so WP is being told pretty "bluntly" what to fetch via $query->set( 'post__in', $post_ids );

I'm not outright closing, but do want others involved with the plugin development to chime in.

tw2113 commented 1 year ago

Still contemplating this one.

Can you give an overarching description of what you're ultimately trying to accomplish here? I get already the "use WP_Query to combine multiple requests" and whatnot, but more of what type of feature you're trying to add to your site, and how much of it needs to be coming from Algolia's indices.

We're trying to make sure that we're not making changes to the plugin that meets the needs of only one to few people, while potentially affecting our overall user-base.

Thinking back to earlier comments from me, since you're not using your own pre_get_posts callbacks that I can tell, me saying to move things to a lower priority doesn't make sense. My apologies for that part.

That also brings up the thoughts of what if you were, that way you could potentially override the constructed query variables on your own, without having to have the core code touched. I do also know and recognize that we do still offer the algolia_should_filter_query filter, and that allows the ability to override whether or not we're in the WP admin, it is a search query, and it is the main query. This looks like a filter that was added before WebDevStudios forked the codebase.

All in all, it definitely feels like a case of trying to use Algolia's search and indexes as a whole, outside of what I'll call the "Standard WordPress search".

Almost feels like you're trying to do multiple searches based on one query, all through custom WP_Query calls, leaving me wondering if there's a better, more native way to do so (I may be mixing backend and Instantsearch in my head with this one).

theodejager commented 1 year ago

Thanks for taking the time to consider the issue. I've added a screenshot to serve as a visual example of my application.

Here's a brief description:

Each post type is fetched using its own $wp_query, with code similar to the code gist shared in my first comment.

Reasons why I like this setup: 1) I find splitting it by post type gives a clear overview, just like the instant search does. 2) I can disable the plugin/algolia and the search will still work consistently. 3) I can use algolia in other places such as tips/recommendations functions, with consistent $args and results using 'algolia_integrate"

I am aware of the fact that this construction means:

I also understand that I could change/rearrange my code and manually filter the results (or something of that nature), but I think the point of my PR is that the plugin should not force certain defaults in places where specific $args might have already been set by some other code. search

tw2113 commented 1 year ago

I'm definitely hearing you and for the majority of everything, understanding what it is you're trying to do. However, it's also feeling very much like making a lot of changes on our end for a very specific usecase, which is largely already guarded against happening 99% of the time.

I am most definitely aware that the algolia_should_filter_query filter exists, and it was one that was added by Algolia before WebDevStudios' fork, so I don't have access to any of the decision making for the filter or what intentions they were envisioning for its use case.

That said, it's available, and we have at least some people making use of it. We still need to keep in mind changes that could potentially affect others, beyond just one use case though. This brings up questions of approach on how to satisfy everyone.

I'm sitting here thinking and looking at the codebase and wondering exactly how easy it'd be to get some code set up to make use of the bundled Algolia Search PHP client which is used in just the first part of our pre_get_posts callback. If my thought pattern is correct, you'd be able to use that to fetch Algolia-provided IDs and the like for what it thinks are most relevant with a given search, and instead of setting your own custom search parameter in WP_Query, you'd already have the IDs to fetch.

I believe that'd still allow you to have usage of Algolia overall with the indexing aspect, and then also allow you to do queries for content, without having to use the override backend search setting in the way being tried, especially around this detail with the default should override value:

$should_filter = ! $query->is_admin && $query->is_search() && $query->is_main_query();

I do believe we have public enough functions and methods to collect all the parts needed as well.

This isn't me saying outright no still, but I'm definitely leaning no on the PR.

theodejager commented 1 year ago

That said, it's available, and we have at least some people making use of it. We still need to keep in mind changes that could potentially affect others, beyond just one use case though. This brings up questions of approach on how to satisfy everyone.

I understand and agree completely, modifying the current functionality could potentially have impact on other existing applications, so a potential merge needs to consider all ramifications.

Even if my PR is merged, I'd still run into the issue of duplicate requests to Algolia per $query, which is something I would like to avoid. The PR is just a possible way that occurred to me (without immersing myself in the entire code base) and seems consistent with how WordPress and some other search plugins like ElasticPress do it. I was actually hoping for a way to achieve my needs without going through unnecessary steps, and without too much alteration of my own code should I decide to use or stop using the plugin.

I do believe we have public enough functions and methods to collect all the parts needed as well.

I'm definitely open to suggestions on how I could for example work with this bit:

        try {
            $results = $this->index->search( $query->query['s'], $params, $order_by, $order );
        } catch ( AlgoliaException $exception ) {
            error_log( $exception->getMessage() ); // phpcs:ignore -- Legacy.
            return;
        }

in isolation, for example by creating my own instance of that class and defining the index. I do feel like such a custom integration with a "core" part of the plugin feels a little sketchy and more prone to breaking changes on updates, but I could be wrong.

Any advice or code examples on this possible avenue are welcome.

tw2113 commented 1 year ago

This is very much a copy paste of a number of details, but has been amended to work outside the context of the plugin's many classes.

https://gist.github.com/tw2113/92510bd97ca7801f9121743e02d7b6bb

I have left notes in the gist for the various parts, but feel free to ask any questions you have about that. Below is a screenshot of the returned results for a very basic query that I had available in my test index. You'll be most interested in the hits array that is returned, as that's going to have your result information, including the found post_id that you can use to query for the very specific items.

Screen Shot 2023-09-13 at 12 31 44 PM

Hopefully this helps out a lot on all fronts, but we're here to further answer questions or brainstorm a bit on things.

theodejager commented 1 year ago

thanks for the example! I think I can definitely work with this.

I'll share details of my implementation for future reference when I get around to it.

tw2113 commented 1 year ago

Sounds good. Closing the issue overall here, but feel free to comment if or when needed.