BlueBrain / nexus

Blue Brain Nexus - A knowledge graph for data-driven science
https://bluebrainnexus.io/
Apache License 2.0

Design for plugin-based Elasticsearch Transformations #2703

Closed samuel-kerrien closed 2 years ago

samuel-kerrien commented 3 years ago

Follow-up to this ticket: Transformation before ElasticSearch indexing #2549

Acceptance Criteria

Next

imsdu commented 2 years ago

Draft PR that gives some insight into what should be done here: https://github.com/BlueBrain/nexus/pull/2886

NB: We would rather use the term Pipe instead of Transformation

Properties of pipes:

Internal changes in Delta

Changes in the ES views endpoint

Creates/updates

A new "pipeline" property will appear in the payload when creating or updating a view.

Example:

{
  ...,
  "pipeline": [
    {
      "name": "excludeMetadata"
    },
    {
      "name": "filterTypes",
      "description": "We wish only to keep persons here",
      "context": {
        "types": ["schema:Person"]
      }
    },
    {
      "name": "[pipeName]",
      "description": "[Optional description]",
      "context": {
        "property": "[Optional context]"
      }
    }
  ],
  ...
}
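To make the intended semantics concrete, here is a minimal Python sketch of how a pipeline like the one above might be applied to a single resource. It is an illustration under assumptions, not Delta's implementation: the function names, the `PIPES` registry, the rule that metadata fields are those starting with `_`, and the convention that a pipe returns `None` to drop a resource are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]
# A pipe receives a document and its optional context.
# Assumption: returning None means "do not index this document".
Pipe = Callable[[Doc, Dict[str, Any]], Optional[Doc]]


def exclude_metadata(doc: Doc, context: Dict[str, Any]) -> Optional[Doc]:
    # Assumption: metadata fields are the ones whose key starts with "_".
    return {k: v for k, v in doc.items() if not k.startswith("_")}


def filter_types(doc: Doc, context: Dict[str, Any]) -> Optional[Doc]:
    wanted = set(context.get("types", []))
    types = doc.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # Keep the document only if it has at least one of the wanted types.
    return doc if wanted & set(types) else None


# Hypothetical registry mapping pipe names to their implementations.
PIPES: Dict[str, Pipe] = {
    "excludeMetadata": exclude_metadata,
    "filterTypes": filter_types,
}


def run_pipeline(pipeline: List[Dict[str, Any]], doc: Doc) -> Optional[Doc]:
    """Apply each pipe in order; stop as soon as one drops the document."""
    current: Optional[Doc] = doc
    for step in pipeline:
        pipe = PIPES[step["name"]]
        current = pipe(current, step.get("context", {}))
        if current is None:
            return None
    return current


pipeline = [
    {"name": "excludeMetadata"},
    {"name": "filterTypes", "context": {"types": ["schema:Person"]}},
]
person = {"@id": "ex:alice", "@type": "schema:Person", "_rev": 3}
dataset = {"@id": "ex:d1", "@type": "schema:Dataset", "_rev": 1}

print(run_pipeline(pipeline, person))
print(run_pipeline(pipeline, dataset))
```

In this sketch the person is kept with its `_rev` field stripped, while the dataset is dropped by `filterTypes`. Order matters: each pipe only sees the output of the previous one.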

Reads

For retro-compatibility purposes when getting a view:

Documentation

Checking on views

After a new Delta version, a new version of a Pipe, or the removal of a Pipe, a pipeline can become invalid. Indexing should not be started for the affected views, and these views should be advertised (through an endpoint?) so that users can be alerted and fix them.
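One way such a check could work is sketched below in Python. Everything here is an assumption made for illustration: the `AVAILABLE_PIPES` registry, the idea that each pipe declares the context keys it requires, and the shape of the report are hypothetical and not taken from Delta.

```python
from typing import Any, Dict, List, Set

# Hypothetical registry: pipe name -> context keys that the pipe requires.
AVAILABLE_PIPES: Dict[str, Set[str]] = {
    "excludeMetadata": set(),
    "filterTypes": {"types"},
}


def pipeline_errors(pipeline: List[Dict[str, Any]]) -> List[str]:
    """Return the reasons why a pipeline is invalid (empty list if valid)."""
    errors: List[str] = []
    for step in pipeline:
        name = step.get("name")
        if name not in AVAILABLE_PIPES:
            errors.append(f"pipe '{name}' does not exist")
            continue
        missing = AVAILABLE_PIPES[name] - set(step.get("context", {}))
        if missing:
            errors.append(
                f"pipe '{name}' is missing context keys: {sorted(missing)}"
            )
    return errors


def invalid_views(views: Dict[str, List[Dict[str, Any]]]) -> Dict[str, List[str]]:
    """Map each invalid view id to its reasons; valid views are omitted."""
    report: Dict[str, List[str]] = {}
    for view_id, pipeline in views.items():
        errors = pipeline_errors(pipeline)
        if errors:
            report[view_id] = errors
    return report


views = {
    "view-ok": [{"name": "excludeMetadata"}],
    "view-removed-pipe": [{"name": "legacyPipe"}],
    "view-bad-config": [{"name": "filterTypes", "context": {}}],
}
report = invalid_views(views)
for view_id, reasons in report.items():
    print(view_id, reasons)
```

Indexing would then be started only for views that are absent from the report, and the report itself is the kind of payload an endpoint could return so that users see which views to fix and why.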

Testing a pipeline

Create an endpoint where people can test a pipeline against a set of resources without actually creating the view or indexing anything

Extend Pipes to other types of views

Generalize Pipes to Blazegraph and Composite Views

Plan to redesign views endpoints

Break retro-compatibility to make endpoints more understandable/maintainable from both the user and developer points of view

Tasks

1.7:

  1. Update Elasticsearch view model
  2. Migrate Elasticsearch views to new model (blocked by 1.)
  3. Update Elasticsearch indexation to integrate the pipeline (blocked by 1.)
  4. Update and keep Elasticsearch endpoints retro-compatible to handle pipelines (blocked by 1.)
  5. Documentation

After 1.7:

samuel-kerrien commented 2 years ago

I see that you are considering adding an endpoint to discover broken views. Would it not be simpler to describe the status of each view when listing them (ref: https://bluebrainnexus.io/docs/delta/api/views/index.html#list-views)?

samuel-kerrien commented 2 years ago

Are we able to fit @kaij's use case into this proposal?

imsdu commented 2 years ago

I was thinking of another endpoint which would act as a health check, listing the failing views and the reason (retro-compatibility has been broken, the pipe does not exist anymore, ...).

An endpoint that could be used by a probe to trigger an alert if at least one view is failing.

I am sharing it with the team for now so we can validate it among ourselves first. For example, it is very Delta-focused at the moment; there should be some impact on Fusion to discuss too.