Open guilhermeeric opened 1 year ago
Are you deleting the page in the Grav admin? Does "reset index" (the red button) in the admin delete the entry as expected?
Please try with the latest versions. Some improvements have been made that might impact this.
I'm having a similar issue, although my site setup includes Gantry so I'm indexing via CLI with:
bin/plugin algolia-pro index --url="https://mysite.com/sitemap.json"
What I want is to run a weekly indexing that ensures there are no pages staying in the index after they're unpublished on mysite.com.
I have Smart Indexing disabled, IIRC again because of Gantry. My understanding is that Smart Indexing only optimizes the number of API calls, so disabling it could raise costs but would not cause unpublished results to linger in the index. Is that correct?
Example: My mysite.com site was returning a result for the string "5a241608b220bf0afc28e8a1ce0907b6", which was from a Flex object URL that was unpublished.
I ran bin/plugin algolia-pro index --url="https://mysite.com/sitemap.json", but the "5a241608b220bf0afc28e8a1ce0907b6" result persisted.
I logged into Algolia and searched the index directly, confirming that "5a241608b220bf0afc28e8a1ce0907b6" was still indexed.
I tried the following:
bin/plugin algolia-pro index --flush --url="https://mysite.com/sitemap.json"
But there was no change in the indexed result.
Finally, I logged into Algolia, cleared the index, and then ran:
bin/plugin algolia-pro index --url="https://mysite.com/sitemap.json"
The indexing appeared to complete successfully, but it was not sent to Algolia. I tried two more times; still nothing showed in Algolia.
Finally, I included --flush and the index was restored on Algolia.
Two questions about --flush:
The main question is still: How do I use the CLI to generate a fresh index once a week with no outdated pages in the index?
In regards to smart indexing...
algolia-pro keeps track of the 'chunks' of pages that are indexed. Every page is chopped up into chunks because Algolia has a strict limit on the size of any item it can index. If the content is very small on a particular page/URL, it might only take one chunk, but typically for regular-sized articles it will take several chunks. The plugin keeps track of a hash of each chunk, so if it thinks a particular chunk is already indexed because it matches the existing hash exactly, it won't send that chunk to replace the current one. That's it really.
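The skip-if-unchanged idea described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the function name `needs_upload`, the plain-text hash store, and the use of `sha256sum` are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of hash-based change detection for indexed chunks.
# Not taken from algolia-pro; names and storage format are assumptions.

# needs_upload CHUNK_TEXT HASH_STORE
# Returns 0 when the chunk should be sent, 1 when an identical chunk
# (same hash) has already been recorded in HASH_STORE.
needs_upload() {
    local chunk="$1" store="$2" hash
    hash=$(printf '%s' "$chunk" | sha256sum | cut -d' ' -f1)
    if grep -qxF "$hash" "$store" 2>/dev/null; then
        return 1   # hash already known: skip the API call
    fi
    echo "$hash" >> "$store"   # remember the hash for the next run
    return 0
}
```

Note that a scheme like this only decides whether to send a chunk. Nothing in it notices that a page has disappeared, which fits the point that smart indexing and stale entries are separate concerns.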
Now deleting pages is another issue. When you remove a page in the admin, algolia-pro knows it's a delete and sends a call to Algolia to remove all indexed chunks of that page. If you don't use the admin and simply delete a page, Algolia has no clue and continues to assume the page exists. Even if you reindex, it won't remove that page, because reindexing only adds to the index. The way to handle that is to "flush" the index with the -f option:
➜ bin/plugin algolia-pro help index
Description:
  Algolia Pro Indexer

Usage:
  index [options]

Options:
  -f, --flush              optionally flush the existing search indexes rather than updating
  -r, --raw                Raw unformatted results
  -q, --quiet              Do not output any message
  -u, --url=URL            Optional URL of JSON sitemap (CrawlPageSearch only)
      --route=ROUTE        Optional route of a single specific page to index (GravPageSearch only)
  -x, --indexes=INDEXES    Optional comma-separated list of enabled index configurations to use
  -h, --help               Display this help message
  -V, --version            Display this application version
      --ansi               Force ANSI output
      --no-ansi            Disable ANSI output
  -n, --no-interaction     Do not ask any interactive question
      --env[=ENV]          Use environment configuration (defaults to localhost)
      --lang[=LANG]        Language to be used (defaults to en)
  -v|vv|vvv, --verbose     Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Help:
  The index command re-indexes the Algolia search engine
So adding -f should flush/remove the existing index before re-indexing. This will ensure a 'fresh' copy.
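For the weekly run asked about in this thread, one option is a small wrapper script called from cron. The sketch below is hypothetical: the script name, the `GRAV_ROOT` default, the `DRY_RUN` switch, and the crontab path are assumptions for illustration. Only the `bin/plugin algolia-pro index --flush --url=...` command itself comes from this thread.

```shell
#!/bin/sh
# weekly-reindex.sh (hypothetical wrapper, not shipped with the plugin)
# Example crontab entry, Sundays at 03:00 (path is an assumption):
#   0 3 * * 0 /path/to/weekly-reindex.sh run

GRAV_ROOT="${GRAV_ROOT:-/var/www/grav}"   # assumed Grav install path
SITEMAP_URL="${SITEMAP_URL:-https://mysite.com/sitemap.json}"

fresh_index() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it
        echo "bin/plugin algolia-pro index --flush --url=$SITEMAP_URL"
        return 0
    fi
    # Subshell so the caller's working directory is left alone
    ( cd "$GRAV_ROOT" && bin/plugin algolia-pro index --flush --url="$SITEMAP_URL" )
}

# Only index when explicitly asked to, so the file can also be sourced
if [ "${1:-}" = "run" ]; then
    fresh_index
fi
```

Running it once by hand with `DRY_RUN=1 ./weekly-reindex.sh run` shows the exact command before it goes into cron.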
One thing to mention is that sometimes things take a little while to show up in Algolia. I think this is what you are seeing. You are indexing and thinking it's not there, but it has to process everything. There's somewhere in the Algolia dashboard that shows the state of the indexing.
At first I thought perhaps you had production_mode: false set, because that will not send anything to Algolia and does all the processing locally on Grav only. But as it did show up, I'm sure it was related to the delay.
Thanks. So my understanding is:
1 - The bin/plugin algolia-pro index --url="https://mysite.com/sitemap.json" command will add new pages to Algolia, but will not remove unpublished or deleted pages.
2 - The bin/plugin algolia-pro index --flush --url="https://mysite.com/sitemap.json" command will clear the Algolia index and then add all pages in sitemap.json to Algolia.
3 - [...] bin/plugin algolia-pro index --flush --url="https://mysite.com/sitemap.json".
Is that correct?
Yes, there's a delay after any indexing. The data is on the Algolia side, but it takes some time to actually show up in their systems, mainly because it's a highly distributed search engine and the data has to trickle through to all their nodes.
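Because of that delay, it can help to check the index from the command line rather than eyeballing the site search. The sketch below queries Algolia's search REST endpoint with curl and reads `nbHits` from the response. The function names, the environment variables (`ALGOLIA_APP_ID`, `ALGOLIA_SEARCH_KEY`, `ALGOLIA_INDEX`), and the retry count and interval are assumptions for illustration. The query string is passed through without URL-encoding, so this is only suitable for simple strings such as the hash mentioned earlier.

```shell
#!/bin/sh
# Hypothetical check that a stale search term has left the index.
# ALGOLIA_APP_ID, ALGOLIA_SEARCH_KEY and ALGOLIA_INDEX must be set by you.

# Read a search response on stdin and print its nbHits value.
nb_hits() {
    grep -o '"nbHits": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Print how many records match the query string given as $1.
count_hits() {
    curl -s -X POST \
        -H "X-Algolia-Application-Id: ${ALGOLIA_APP_ID}" \
        -H "X-Algolia-API-Key: ${ALGOLIA_SEARCH_KEY}" \
        --data "{\"params\": \"query=$1\"}" \
        "https://${ALGOLIA_APP_ID}-dsn.algolia.net/1/indexes/${ALGOLIA_INDEX}/query" | nb_hits
}

# Poll up to 10 times, 30 seconds apart (arbitrary values), until no hits remain.
# Returns 0 once the term is gone, 1 if it is still present after the last try.
wait_until_gone() {
    tries=0
    while [ "$tries" -lt 10 ]; do
        [ "$(count_hits "$1")" = "0" ] && return 0
        tries=$((tries + 1))
        sleep 30
    done
    return 1
}
```

For example, `wait_until_gone "5a241608b220bf0afc28e8a1ce0907b6"` after a flush-and-reindex would help tell a slow propagation apart from a record that was never removed.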
When I delete a record, Algolia indexing doesn't take that into account and still lists the deleted record when I search for it. Clicking "reindex now" or "reset index" doesn't seem to do anything. The only way to get rid of deleted records appearing in the search right now is to clear the index directly in the Algolia UI. Is there something I am missing?
Currently on:
Grav v1.7.39.4
Admin v1.10.39
Algolia v1.0.8
Steps to reproduce:
1 - Create a new document
2 - See that the new document appears on search
3 - Delete the created document
4 - Reindex Algolia
5 - See that the deleted document still appears on search results