NASA-IMPACT / COSMOS

COSMOS is a web application designed to manage collections indexed in NASA's Science Discovery Engine (SDE), facilitating precise content selection and allowing metadata modification before indexing.
https://sde-indexing-helper.nasa-impact.net/

Add Deletion of Existing Content into Workflow #353

Open CarsonDavis opened 1 year ago

CarsonDavis commented 1 year ago

Description

We often need to re-index a collection with new mappings and excludes. It seems that Sinequa doesn't always remove the old data when sources are updated and re-run, so we need to find a new approach.

Implementation Considerations

This issue is related to the Collection Deletion Wiki card. I'm unsure what the best way to do this is. Do we:

### Acceptance Criteria
- [ ] Figure out a new method
- [ ] Add it to the webapp process

justin-john-sinequa commented 1 year ago

Documents in collections are updated with incremental indexing according to the following rules: https://doc.sinequa.com/en.sinequa-es.v11/Content/en.sinequa-es.connector.getting-started.html#IncrementalIndexing
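To make the stale-data problem concrete, here is a generic sketch of a single incremental sync pass. It is not Sinequa's implementation; the linked page defines the actual rules. It assumes each document has a stable ID and a version string (for example a modification timestamp), and the names `SyncPlan` and `plan_incremental_sync` are hypothetical. Documents that a re-run with new excludes no longer produces end up in `stale`, which is the set that would have to be deleted explicitly if the indexer does not remove it on its own.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    to_add: list[str] = field(default_factory=list)     # IDs not yet in the index
    to_update: list[str] = field(default_factory=list)  # IDs whose version changed
    stale: list[str] = field(default_factory=list)      # indexed IDs the source no longer yields


def plan_incremental_sync(indexed: dict[str, str], crawled: dict[str, str]) -> SyncPlan:
    """Compare the index's view (ID -> version) with a fresh crawl (ID -> version).

    Hypothetical helper for illustration only; versions are opaque strings.
    """
    plan = SyncPlan()
    for doc_id, version in sorted(crawled.items()):
        if doc_id not in indexed:
            plan.to_add.append(doc_id)
        elif indexed[doc_id] != version:
            plan.to_update.append(doc_id)
    # Anything still indexed but absent from the crawl (e.g. now excluded) is stale.
    plan.stale = sorted(set(indexed) - set(crawled))
    return plan


indexed = {"a": "v1", "b": "v1", "c": "v1"}
crawled = {"a": "v1", "b": "v2", "d": "v1"}  # "c" is now excluded by the new rules
plan = plan_incremental_sync(indexed, crawled)
print(plan)  # SyncPlan(to_add=['d'], to_update=['b'], stale=['c'])
```

Computing the stale set as a plain set difference keeps the deletion step explicit and auditable, which is one possible shape for the "new method" in the acceptance criteria.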

justin-john-sinequa commented 1 year ago

NOTE: Deleting documents from an index doesn't actually remove them completely; it only marks them as deleted. These documents are called "ghosts". Although ghosts are excluded from search and relevance calculations, they artificially increase the on-disk size of your indexes and can degrade search performance.

Ghosts can be cleaned out of an index with a command called "reorganize index". It's recommended to configure a job that runs this command automatically and periodically to keep your indexes clean. If you're going to be deleting large volumes of documents frequently, your ghost count will grow much faster than normal, and you may want to run the job more often (that is, shorten its period).

https://doc.sinequa.com/en.sinequa-es.v11/Content/en.sinequa-es.syntax.sql-index-management.html#REORGANIZE
https://doc.sinequa.com/en.sinequa-es.v11/Content/en.sinequa-es.managingSolution.bestPractices.html#IndexReorganizationCommand
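The ghost behaviour described above can be illustrated with a toy in-memory index. This is a conceptual model only, not Sinequa's storage format or API; the class `ToyIndex` and its methods are hypothetical. `delete` only sets a tombstone flag, so searches skip the document while its bytes still count toward the on-disk size until `reorganize` compacts the storage, loosely mirroring what the "reorganize index" command is described as doing.

```python
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    text: str
    deleted: bool = False  # tombstone flag: the record is a "ghost" when True


class ToyIndex:
    """Hypothetical soft-delete index used only to illustrate ghosts."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def add(self, doc_id: str, text: str) -> None:
        self.records.append(Record(doc_id, text))

    def delete(self, doc_id: str) -> None:
        # Mark as deleted instead of removing: the record becomes a ghost.
        for record in self.records:
            if record.doc_id == doc_id and not record.deleted:
                record.deleted = True

    def search(self, term: str) -> list[str]:
        # Ghosts are excluded from results.
        return [r.doc_id for r in self.records if not r.deleted and term in r.text]

    def ghost_count(self) -> int:
        return sum(r.deleted for r in self.records)

    def size_on_disk(self) -> int:
        # Ghosts still occupy space until the index is reorganized.
        return sum(len(r.text.encode("utf-8")) for r in self.records)

    def reorganize(self) -> None:
        # Compact the storage by physically dropping every ghost.
        self.records = [r for r in self.records if not r.deleted]


index = ToyIndex()
index.add("a", "mars rover")
index.add("b", "mars orbiter")
index.delete("b")
print(index.search("mars"), index.ghost_count(), index.size_on_disk())  # ['a'] 1 22
index.reorganize()
print(index.search("mars"), index.ghost_count(), index.size_on_disk())  # ['a'] 0 10
```

The search results are identical before and after compaction; only the ghost count and the size change, which is why a periodic reorganize job matters most when deletions are frequent.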