getgrav / grav-premium-issues

Official Grav Premium Issues repository to report problems or ask questions regarding the Premium products offered.
https://getgrav.org/premium

[algolia-pro] Cannot Get Separate EN and FR Search Results Despite Both Being Indexed #350

Closed thekenshow closed 1 year ago

thekenshow commented 1 year ago

I have a Grav v1.7.39.4 site with the latest plugins and have just expanded an EN site to a bilingual EN/FR site. When I index the site using the CLI, it creates separate indexes:

bin/plugin algolia-pro index

Re-indexing Algolia Search
==========================

 69/69 [============================] 100% 24 secs/24 secs -- Index Config: crawl | Algolia Index: crawl-en-troylfs
 24/24 [============================] 100% 13 secs/13 secs -- Index Config: crawl | Algolia Index: crawl-fr-troylfs

But when I use the front-end search, both sites return FR results.

I can query language-specific results with the CLI, for example:

bin/plugin algolia-pro query sprinkler --lang="en"

How do I specify the search language in my site configuration or templates?

{% include 'partials/algolia-pro/instantsearch.html.twig' ignore missing with { index: 'crawl' } %}

I tried modifying the Twig above to specify { index: 'crawl-en-troylfs' } but that blew up the site.

The only reference to "lang" in the front or back end docs is some code in Adding custom CLI Index Options (advanced).

rhukster commented 1 year ago

See that code snippet you have there that passes the index named crawl? You just need to change that to the name of your index. Since you have multilanguage enabled and, based on your CLI output, your indexes are named crawl-[LANG]-troylfs, you would need to build the name from a variable, something like:

{% set index_name = 'crawl-' ~ grav.language.getActive ~ '-troylfs' %}
{% include 'partials/algolia-pro/instantsearch.html.twig' ignore missing with { index: index_name } %}

thekenshow commented 1 year ago

Hi, thanks for the quick response. I've applied that change (more verbose, because grav.language.getActive doesn't return the default language in a Gantry particle).

{% if grav.language.getActive == '' %}
    {% set lang = 'en' %}
{% else %}
    {% set lang = 'fr' %}
{% endif %}
{% set index_name = 'crawl-' ~ lang ~ '-troylfs' %}
{% include 'partials/algolia-pro/instantsearch.html.twig' ignore missing with { index: index_name } %}

On my dev site (7.3.3) I'm getting "Undefined index: crawl-fr-troylfs" and "Undefined index: crawl-en-troylfs".

Screen Shot 2023-03-13 at 1 32 16 PM

After pushing this change to my staging site (7.4.33), which is directly online, I'm getting "TypeError Return value of Grav\Plugin\AlgoliaProPlugin::getIndexConfiguration() must be of the type array, null returned"

Screen Shot 2023-03-13 at 1 03 29 PM

rhukster commented 1 year ago

Apologies, I gave you bad advice :) I was going on an implementation I had done for a client project that used the Algolia JS API directly, not the InstantSearch/Twig system we use in Algolia Pro by default.

The name of the index is the configuration name, not the actual Algolia index name.

Grav comes with crawl and pages by default, so it should be just the crawl you had previously. It should automatically reconstruct the correct Algolia index name based on the language. I'm not sure why it's not working for you, as it does this on the https://learn.getgrav.org site without anything custom.

thekenshow commented 1 year ago

Ah. I've reset back to where I was, and I'm getting consistent EN results on dev and staging. This isn't good: I can't test the Algolia indexing, because yesterday it seemed that indexing the FR content with the CLI took over search for both the main (EN) and FR sites. Those are live as of today because I assumed this was a configuration issue.

Where does the decision get made for which index to use? Is that in the Algolia Pro plugin? Where do the languages to be indexed get determined?

Also, I don't see any language selection on learn.getgrav.org, and the <html> element has lang="17"?

Screen Shot 2023-03-13 at 2 37 32 PM

rhukster commented 1 year ago

You have to be in the other language to search it, so with /fr in your URL it should use the French version.

thekenshow commented 1 year ago

Yes, I get how the language selection works for the site. You can test the FR site at https://troylfs.com/fr, and the EN at https://troylfs.com.

I remembered the keyboard shortcut to activate the search, so I can keep working on this without exposing it to FR visitors (the search button is hidden).

Status

I've reindexed Algolia Search from the production site just now, and it showed 24 records:

Screen Shot 2023-03-13 at 7 51 59 PM

But when I log into the Algolia site, the crawl-fr-troy index is empty:

Screen Shot 2023-03-13 at 7 56 50 PM

I ran a separate CLI command to index just the FR site, but it made no difference. Still no records there.

Querying the EN index with CLI returns results, same for the EN Web UI search:

Screen Shot 2023-03-13 at 8 03 36 PM

Querying the FR index with CLI returns nothing:

Screen Shot 2023-03-13 at 8 04 55 PM

How can the indexing succeed and report the number of records created when nothing is actually written?

Screen Shot 2023-03-13 at 8 12 12 PM

rhukster commented 1 year ago

Disable smart indexing during testing and development. Then clear the Grav cache.

thekenshow commented 1 year ago

Success! Thanks.

I have a few questions:

  1. What does smart indexing do exactly? No mention of this in the backend docs.
  2. Should I leave smart_indexing: false on my dev and staging configurations?
  3. When I indexed the FR staging site, the index on Algolia was populated and the FR search started working, but it didn't work on production until I repeated the process there. What gets stored locally that's required for the search to work?

FYI: There are a couple terms missing from Algolia's language.yaml file (circled red in image below).

Screen Shot 2023-03-14 at 8 34 31 AM

rhukster commented 1 year ago

  1. Smart indexing uses Grav's cache to track chunks that have been indexed before, and skips them if the checksum matches. This cuts down on redundant API calls, saving you money.
  2. Usually you should have one set of Algolia indexes that all your servers use (dev, staging, prod). Smart indexing is specific to a single Grav environment, unless you push the cache to your various environments (which is not a good idea). I would keep the same setting everywhere, and perhaps only index on staging.
  3. As long as you use the same Algolia configuration, pointing to the same Algolia indexes (check the Algolia dashboard to see that they were created), you should not need to index in multiple places. I think what you took for a full index was in fact skipping lots of things because of smart indexing. So basically, indexing in multiple places with smart indexing on caused the behavior you saw.
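
To illustrate the smart-indexing behavior described in the list above, here is a minimal Python sketch. It is not the plugin's code: the function name, the use of SHA-256, and the dictionaries standing in for the Grav cache and the Algolia index are all assumptions made for illustration. It shows one way a local cache and a shared remote index could disagree, so that a re-index reports work but uploads nothing.

```python
import hashlib

def smart_index(chunks, local_cache, remote_index):
    """Upload only chunks whose checksum is not already in the local cache."""
    uploaded = 0
    for chunk_id, content in chunks.items():
        checksum = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if local_cache.get(chunk_id) == checksum:
            continue  # cache says this chunk was indexed before: skip the API call
        remote_index[chunk_id] = content  # stand-in for the real API call
        local_cache[chunk_id] = checksum
        uploaded += 1
    return uploaded

chunks = {"page-1#0": "Sprinkler maintenance", "page-1#1": "Winter shutdown"}
remote_index = {}   # hypothetical stand-in for the shared Algolia index
staging_cache = {}  # hypothetical stand-in for one environment's Grav cache

first_run = smart_index(chunks, staging_cache, remote_index)   # uploads both chunks
second_run = smart_index(chunks, staging_cache, remote_index)  # skips both chunks

# If the remote index is emptied while the local cache survives, a re-index
# skips every chunk and the remote index stays empty.
remote_index.clear()
stale_run = smart_index(chunks, staging_cache, remote_index)

# Clearing the local cache forces a full upload again.
staging_cache.clear()
fresh_run = smart_index(chunks, staging_cache, remote_index)

print(first_run, second_run, stale_run, fresh_run)  # prints: 2 0 0 2
```

Under these assumptions, disabling smart indexing or clearing the cache has the same effect as the last step: every chunk is sent again regardless of what the local environment remembers.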

BTW, you will see many more records compared to pages. Algolia has a limit on how much content can fit into a record, so we have to "chunk" up content from a single page, so that one page may actually consist of any number of records depending on content length.

NOTE: This behavior is quite common and actually very useful. It lets us show results from within a page, and even provide links that jump to the header closest to the content you searched for.
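
As a rough illustration of why record counts exceed page counts, the sketch below splits one page into several records. It is an assumption-laden simplification, not the plugin's chunking logic: the function name, the paragraph-based splitting, and the max_chars value are hypothetical, and only objectID is a real Algolia record attribute.

```python
def chunk_page(url, text, max_chars=200):
    """Group a page's paragraphs into records of roughly max_chars each.

    Simplification: a single paragraph longer than max_chars is kept whole.
    """
    pieces, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            pieces.append(current)  # current record is full: start a new one
            current = paragraph
        else:
            current = candidate
    if current:
        pieces.append(current)
    # Each piece becomes its own record that still points back at the same page.
    return [
        {"objectID": f"{url}#{i}", "url": url, "content": piece}
        for i, piece in enumerate(pieces)
    ]

long_page = "\n\n".join(["a" * 120, "b" * 120, "c" * 120])
records = chunk_page("/docs/sprinklers", long_page)
short_records = chunk_page("/docs/intro", "A short page.")
print(len(records), len(short_records))  # prints: 3 1
```

Here one long page yields three records and one short page yields a single record, which is why an index of 24 pages can report many more than 24 records.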

thekenshow commented 1 year ago

Thanks for these details, very useful. I'll manage my dev/stage/prod indexes more carefully going forward.

felipedasilva commented 1 year ago

Hi, I got the same error in a project with multiple languages (in my case only in the production environment). After reading the code, I found what the error is about: the configuration is cached without being separated by active language. As the first image below shows, when there is no configuration in the cache, the plugin calls the function responsible for building the configuration and then saves it in the cache. However, the second image shows a line that sets the lang option. That's the bug: the first time we visit a page (after clearing the cache), the plugin gets the active language, creates the configuration, and caches it; after that, every page consumes the configuration from the cache regardless of its language. To resolve this (as a suggestion), we need to pass the lang in $options before we call those methods (see the third image).

Note: I hit the error on version 3.2.0, and updating the plugin did not fix it.

[Three screenshots of the plugin code, referenced above as the first, second, and third image]
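
Since the screenshots are not available here, the following Python sketch illustrates the kind of bug described in this comment. It is a hedged reconstruction of the idea only: the plugin itself is PHP, and every name below (build_config, get_config_unkeyed, get_config_keyed, the cache dictionary, the index naming) is hypothetical.

```python
def build_config(name, lang):
    """Hypothetical builder: the resolved config depends on the active language."""
    return {"name": name, "lang": lang, "index": f"{name}-{lang}"}

def get_config_unkeyed(cache, name, active_lang):
    """Buggy variant: the cache key ignores the language."""
    key = name
    if key not in cache:
        cache[key] = build_config(name, active_lang)
    return cache[key]

def get_config_keyed(cache, name, active_lang):
    """Fixed variant: the language is part of the cache key."""
    key = (name, active_lang)
    if key not in cache:
        cache[key] = build_config(name, active_lang)
    return cache[key]

# The first request after a cache clear happens to be a French page.
buggy_cache = {}
fr_first = get_config_unkeyed(buggy_cache, "crawl", "fr")
en_after = get_config_unkeyed(buggy_cache, "crawl", "en")  # served the FR config

fixed_cache = {}
get_config_keyed(fixed_cache, "crawl", "fr")
en_fixed = get_config_keyed(fixed_cache, "crawl", "en")  # gets its own EN config

print(en_after["lang"], en_fixed["lang"])  # prints: fr en
```

With the unkeyed cache, whichever language is requested first wins for every later request, which matches the symptom of both sites returning results for one language.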

thomastweets commented 1 year ago

@rhukster, sorry to bump this one. @felipedasilva is my colleague, and although this issue is closed, it is indeed a problem for us and seems to have the same root cause as @thekenshow's. Maybe it is worth re-opening, or maybe you just want to use @felipedasilva's bugfix to release a new version? We can confirm that it fixed our issue on production. Thanks a lot!

rhukster commented 1 year ago

I'm back from vacation now. I'll take a look at this as soon as I can.

rhukster commented 1 year ago

BTW, I had a quick look through your fix, and it actually looks pretty good to me. The only minor change I'll make is to the logic that calculates the language:

        if ($language->enabled()) {
            $options['lang'] = $language->getLanguage();
        }

getLanguage() already checks for the active language and falls back to the default.

rhukster commented 1 year ago

Released 1.0.9. It seems 1.0.8 never got published either, so you might jump from 1.0.7 -> 1.0.9.

rhukster commented 1 year ago

FYI, I found a couple more multilang issues and fixed them in the latest releases. Things should be better now in the admin, where those problems manifested.

rhukster commented 1 year ago

Found a few more multilang issues. Should be good now.