This package aims to index all fields the portal_catalog indexes and allows you to delete the Title, Description and SearchableText indexes, which can significantly improve performance and reduce RAM usage.
Elasticsearch queries are then used ONLY when Title, Description and SearchableText are in the query. Otherwise, Plone's default catalog is used, because it is faster than Elasticsearch on normal queries.
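The routing rule above can be sketched in a few lines of Python. This is only an illustration: the helper name uses_elasticsearch and the constant ES_ONLY_INDEXES are hypothetical, and the sketch assumes that the presence of any one of the three text indexes is enough to route a query to Elasticsearch, which the text does not spell out.

```python
# Hypothetical names, for illustration only (not the package's API).
ES_ONLY_INDEXES = {"Title", "Description", "SearchableText"}


def uses_elasticsearch(query):
    """Return True if the catalog query touches one of the text indexes.

    Assumption: one matching index is enough to send the query to
    Elasticsearch; everything else stays in Plone's default catalog.
    """
    return any(name in query for name in ES_ONLY_INDEXES)


# A full text search would go to Elasticsearch ...
print(uses_elasticsearch({"SearchableText": "plone", "portal_type": "Document"}))
# ... while a plain metadata query would stay in the default catalog.
print(uses_elasticsearch({"portal_type": "Document"}))
```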
For comprehensive information about the different options for installing Elasticsearch, please read their documentation.
A quick start using Docker would be:
docker run \
-e "discovery.type=single-node" \
-e "cluster.name=docker-cluster" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-p 9200:9200 \
elasticsearch:7.7.0
Run in your shell:
curl http://localhost:9200/
You should see the Hudsucker Proxy reference: "You Know, for Search"
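If you would rather script this check, here is a minimal Python sketch. The function name is_elasticsearch_banner is hypothetical and the sample body is an abbreviated, made-up response; the only thing relied on is the "tagline" field that the root endpoint returns.

```python
import json


def is_elasticsearch_banner(body):
    """Return True if a response body looks like the Elasticsearch root document."""
    try:
        info = json.loads(body)
    except ValueError:
        # Not JSON at all, so this is not an Elasticsearch node answering.
        return False
    return isinstance(info, dict) and info.get("tagline") == "You Know, for Search"


# Abbreviated, made-up sample of the JSON returned by the root endpoint.
sample = '{"name": "node-1", "cluster_name": "docker-cluster", "tagline": "You Know, for Search"}'
print(is_elasticsearch_banner(sample))
```

Against a running container, the body could be fetched with urllib.request.urlopen("http://localhost:9200/").read() and passed to the same function.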
First, add collective.elasticsearch to your package dependencies, or install it with pip (the same one used by your Plone installation):
pip install collective.elasticsearch
Restart Plone, go to the Control Panel, click Add-ons, and select Elastic Search.
Now, go to Add-on Configuration and:
You now have an insanely scalable modern search engine. Now live the life of the Mind!
docker-compose -f docker-compose.dev.yaml up -d
Your Plone site should be up and running: http://localhost:8080/Plone
Add-on Configuration
A queue that runs heavy, time-consuming jobs asynchronously improves the responsiveness of the website and lowers the risk of database conflicts. This implementation aims to have almost zero performance impact on any given Plone installation, including installations that already use collective.elasticsearch.
Workflow:
There are two queues: one for normal indexing jobs and one for the heavy lifting of indexing binaries. Jobs from the second queue are only pulled if the normal indexing queue is empty.
Trade-off: instead of a fully indexed document in Elasticsearch, we get at least a document there pretty fast.
There are a couple of things that need to be done manually if you want Redis queue support.
Install the redis extra of collective.elasticsearch
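The priority rule between the two queues can be pictured with a small in-memory sketch. It uses plain Python deques rather than Redis, and the function names next_job and drain as well as the job descriptions are made up for illustration; the real work is done by the python-rq worker.

```python
from collections import deque


def next_job(normal, low):
    """Pick the next job: binaries are only pulled when the normal queue is empty."""
    if normal:
        return normal.popleft()
    if low:
        return low.popleft()
    return None


def drain(normal, low):
    """Process both queues until they are empty and return the processing order."""
    order = []
    job = next_job(normal, low)
    while job is not None:
        order.append(job)
        job = next_job(normal, low)
    return order


# Made-up jobs: metadata indexing is cheap, blob indexing is the heavy lifting.
normal = deque(["index metadata of /doc-1", "index metadata of /doc-2"])
low = deque(["index blob of /doc-1"])
print(drain(normal, low))
```

Both metadata jobs are processed before the blob job, which is the behaviour described above: the document shows up in the index quickly and its binary content follows later.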
pip install collective.elasticsearch[redis]
Install the ingest-attachment plugin for Elasticsearch. By default, the elasticsearch image does not have any plugins installed.
docker exec CONTAINER_NAME /bin/sh -c "bin/elasticsearch-plugin install ingest-attachment -b"; \
docker restart CONTAINER_NAME
The container needs to be restarted; otherwise the plugin is not available.
export PLONE_REDIS_DSN=redis://localhost:6379/0
export PLONE_BACKEND=http://localhost:8080/Plone
export PLONE_USERNAME=admin
export PLONE_PASSWORD=admin
This is an example configuration for local development only.
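Because both the Plone instance and the worker depend on these four variables, it can help to verify them before starting either process. The following Python sketch is one way to do that; the helper name missing_settings is hypothetical and not part of collective.elasticsearch.

```python
import os

# The four variables exported above.
REQUIRED = ("PLONE_REDIS_DSN", "PLONE_BACKEND", "PLONE_USERNAME", "PLONE_PASSWORD")


def missing_settings(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


# Example with an explicit mapping instead of the real process environment.
example = {
    "PLONE_REDIS_DSN": "redis://localhost:6379/0",
    "PLONE_BACKEND": "http://localhost:8080/Plone",
}
print(missing_settings(example))
```

Calling missing_settings() without arguments checks the environment of the current process.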
You can use the start-redis-support command to spin up a Plone instance with the environment variables already set:
make start-redis-support
Start your own Redis server or use the start-redis command:
make redis
The redis worker does the "job" and indexes everything via two queues:
The priority is handled by the python-rq worker.
The rq worker needs to be started with the same environment variables present as described in 3.
./bin/rq worker normal low --with-scheduler
--with-scheduler is needed in order to retry failed jobs after a certain time period.
Or use the worker command:
make worker
If you hit convert in the control panel and you meet all the requirements to index blobs as well, collective.elasticsearch installs a default pipeline for the plone-index. This pipeline converts the binary data to text (if possible) and extends the searchableText index with the extracted data. The setup uses multiple nested processors in order to extract all binary data from all fields (blob fields).
The binary data is not stored in the index permanently. As a last step, the pipeline removes the binary itself.
The ingest-attachment plugin is used to extract text data with Tika from any binary.
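To give an idea of what such a pipeline can look like, here is a rough Python sketch that builds a pipeline definition from the standard foreach, attachment and remove ingest processors. It is not the pipeline that collective.elasticsearch installs: the function name build_blob_pipeline and the field name "attachments" are assumptions made for this example.

```python
import json


def build_blob_pipeline(blob_field="attachments"):
    """Build an illustrative ingest pipeline definition.

    Assumption: each document carries a list under `blob_field` whose items
    hold base64 encoded binary data in a "data" key.
    """
    return {
        "description": "Extract text from binaries, then drop the binary",
        "processors": [
            {
                # Run the attachment processor (Tika) on every blob in the list.
                "foreach": {
                    "field": blob_field,
                    "processor": {
                        "attachment": {
                            "field": "_ingest._value.data",
                            "target_field": "_ingest._value.attachment",
                        }
                    },
                }
            },
            {
                # Last step: remove the binary so it is not stored in the index.
                "foreach": {
                    "field": blob_field,
                    "processor": {"remove": {"field": "_ingest._value.data"}},
                }
            },
        ],
    }


print(json.dumps(build_blob_pipeline(), indent=2))
```

The extraction step comes first and the removal step last, mirroring the order described above.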
Putting all the jobs into a queue is much faster than actually calculating all index values and sending them to Elasticsearch. This feature aims to have a minimal impact on the responsiveness of the Plone site.
All index column types are supported EXCEPT the DateRecurringIndex index column type. If you are doing a full text search along with a query that contains a DateRecurringIndex column, it will not work.
If you want to make use of the Elasticsearch highlight feature you can enable it in the control panel.
When enabled, it will replace the description of search results with the highlighted fragments from Elasticsearch.
This is the number of characters to show in the description. Fragments will be added until this threshold is met.
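One way to read the threshold rule is sketched below in Python. The function name build_description and the " ... " separator are hypothetical, and the exact cut-off behaviour of the add-on may differ; the sketch simply keeps adding fragments until the collected length reaches the threshold.

```python
def build_description(fragments, threshold):
    """Join highlight fragments until the character threshold is met."""
    parts = []
    length = 0
    for fragment in fragments:
        if length >= threshold:
            # Threshold already met, so no further fragments are added.
            break
        parts.append(fragment)
        length += len(fragment)
    return " ... ".join(parts)


# Made-up fragments of five characters each, with a threshold of eight.
print(build_description(["abcde", "fghij", "klmno"], 8))
```

With these inputs the first two fragments are used and the third is dropped, because the threshold is reached after the second one.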
Highlighted terms can be wrapped in HTML, which can be used to enhance the results further, such as by adding a custom background color. Note that the default Plone search results will not render HTML, so to use this feature you will need to create a custom search result view.
Create the virtual environment and install all dependencies:
make build
Start Plone in foreground:
make start
make tests
make format
make lint
The project is licensed under the GPLv2.