Allow setting a comma-separated list of Elasticsearch hosts.
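As a rough illustration of the host list idea, the setting could be split into a list before it reaches the client. This is only a sketch: `parse_hosts` and the example host names are hypothetical and do not come from the current code.

```python
def parse_hosts(value):
    """Split a comma-separated host string into a clean list of hosts."""
    # Strip whitespace around each entry and skip empty ones (e.g. a trailing comma).
    return [host.strip() for host in value.split(",") if host.strip()]


# Hypothetical setting value, for illustration only.
hosts = parse_hosts("es1.example.org:9200, es2.example.org:9200,")
print(hosts)
```

The resulting list could then be handed to the client constructor, which in elasticsearch-py accepts a list of hosts.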
Convert elasticSearchFunctions to a class:
This would avoid using global variables and passing the client around. The class could be instantiated in wsgi.py, much like the client is being set up now.
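A minimal sketch of what such a class might look like, assuming the client is injected at construction time. `SearchClient` and `search_aips` are hypothetical names, not the existing API, and the `search(index=..., body=...)` call follows the elasticsearch-py 6.x/7.x style.

```python
class SearchClient:
    """Hypothetical wrapper that owns the ES client instead of a module-level global."""

    AIPS_INDEX = "aips"

    def __init__(self, client):
        # The client is injected, so the caller (e.g. wsgi.py) decides how it
        # is built, and tests can pass in a stub.
        self._client = client

    def search_aips(self, query):
        # Methods reach the client through self; nothing is passed around.
        return self._client.search(index=self.AIPS_INDEX, body=query)
```

With this shape, a single instance could be created at startup and shared, instead of each function reading a global.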
Improve mapping declaration:
We have dynamic indexes and a wild mapping in the AIP and AIP files indexes, with multiple elements from the METS file. Most of those fields are never used and probably shouldn't be indexed. Also, having strict indexes gives more control over what can be found in the documents.
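For reference, a strict mapping could look roughly like the sketch below. The field names are placeholders rather than the real AIP fields, and the surrounding request body differs between Elasticsearch versions (older versions nest the mapping under a document type). The helper only mimics, on the Python side, which top-level fields a strict mapping would reject.

```python
# Hypothetical strict mapping: undeclared fields are rejected instead of being
# added to the mapping dynamically.
AIPS_MAPPING = {
    "dynamic": "strict",
    "properties": {
        "uuid": {"type": "keyword"},
        "name": {"type": "text"},
        "created": {"type": "date"},
    },
}


def unknown_fields(document, mapping):
    """Return the top-level document fields that the mapping does not declare."""
    return sorted(set(document) - set(mapping["properties"]))


print(unknown_fields({"uuid": "abc", "mets": {}}, AIPS_MAPPING))
```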
Normalize field names to snake_case in all indexes:
Field names are quite a mess. I didn't want to change this as external tools may be using the indexes but, since the indexes have changed anyway to be single document indexes, it may be a good moment to do so.
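One possible way to derive the new names, as a sketch: `to_snake_case` is a hypothetical helper, and the sample field names are only illustrative of the mixed styles. Names written entirely in capitals have no word boundary to detect, so they would still need a manual rename table.

```python
import re


def to_snake_case(name):
    """Convert a camelCase or mixed-case field name to snake_case."""
    # Underscore between a lowercase letter or digit and a following capital.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Underscore between a run of capitals and a following capitalized word.
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    return name.replace("-", "_").lower()


for field in ("fileExtension", "isPartOf", "AIPUuid", "file_uuid"):
    print(field, "->", to_snake_case(field))
```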
Remove the ES dependency in the dashboard:
It is only used to wrap some ES errors, but I think we could find a way to do that via archivematicaCommon.elasticSearchFunctions or remove it entirely.
Avoid index check and creation on the fly:
We could require running the rebuild indexes commands during the install/upgrade process to avoid checking and creating the indexes on the fly. The check is a HEAD request, so it's not a huge deal, but I think we should not do it.
Remove cluster health and index retry systems:
Trust the timeout setting and let it fail.
Use fixed ids in documents:
Use the UUID of each resource as the ES document _id to avoid making queries to get the document ids for deletes and updates.
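To make the idea concrete, here is a sketch that builds actions in the dictionary format accepted by `elasticsearch.helpers.bulk`, using the UUID as `_id`. The function names and the `uuid` document key are assumptions, and older Elasticsearch versions would also need a `_type` entry.

```python
def build_index_action(index, document):
    """Build a bulk-style index action whose _id is the resource UUID."""
    return {
        "_op_type": "index",
        "_index": index,
        "_id": document["uuid"],  # deterministic id, so no lookup query is needed later
        "_source": document,
    }


def build_delete_action(index, uuid):
    """Build a bulk-style delete action addressed directly by UUID."""
    return {"_op_type": "delete", "_index": index, "_id": uuid}
```

Because the `_id` is known in advance, re-indexing the same resource overwrites the existing document, and deletes no longer need a search to find the document id first.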
Improve backlog and archival storage searches:
Multiple requests are made in the views. This will require several changes and may not be doable in the time remaining, but we should try to avoid both the extra count request (required by the pager system) and the pattern of querying the files indexes and then querying the aips and transfers indexes, which happens when the results show the packages instead of the files but the query is made over the files.
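On the count request specifically: a search response already carries the total number of hits, so the pager could in principle read it from the same response. The sketch below assumes the standard response layout; `page_from_response` is a hypothetical helper, not part of the current views.

```python
def page_from_response(response):
    """Extract the total hit count and the hits from a single search response."""
    total = response["hits"]["total"]
    # Newer Elasticsearch versions return an object such as
    # {"value": N, "relation": "eq"}; older ones return a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    return total, response["hits"]["hits"]
```

Note that in newer Elasticsearch versions the reported total can be a lower bound unless `track_total_hits` is enabled in the request, which the pager would need to take into account.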
Improve boolean query generation:
To properly represent the user input in the search form. Explained in here.
Allow configuring the index names:
This would allow a single ES cluster to be used by more than one Archivematica instance.
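One simple way this might work is a per-instance prefix applied to the base index names. The sketch below is only an assumption about the approach: the `index_names` helper and the `ES_INDEX_PREFIX` environment variable are hypothetical, and only two of the indexes are listed for brevity.

```python
import os

# Base names as mentioned in this issue; the files indexes would be added the same way.
BASE_INDEXES = ("aips", "transfers")


def index_names(prefix=""):
    """Map each base index name to its prefixed, per-instance name."""
    return {name: f"{prefix}{name}" for name in BASE_INDEXES}


# Hypothetical setting: each Archivematica instance would use its own prefix.
names = index_names(os.environ.get("ES_INDEX_PREFIX", ""))
print(names)
```

Two instances with prefixes such as `am1_` and `am2_` would then read and write separate indexes on the same cluster.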
Describe the solution you'd like to see implemented.
Some of them are included above but this is open to discussion.
Describe alternatives you've considered.
Additional context
For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:
All PRs related to this issue are properly linked 👍
All PRs related to this issue have been merged 👍
Test plan for this issue has been implemented and passed 👍
Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
Please describe the problem you'd like to be solved.
A few enhancements came out of the Elasticsearch support upgrade that could not be addressed in that period.