BlueBrain / nexus

Blue Brain Nexus - A knowledge graph for data-driven science
https://bluebrainnexus.io/
Apache License 2.0

Removal of inactive projects #2524

Closed bogdanromanx closed 3 years ago

bogdanromanx commented 3 years ago

For deployments that are meant to showcase the Nexus features, such as one that supports an edX course, a lot of projects are created for test purposes. The ability to archive and remove inactive projects would free resources without the need for human intervention.

The system should allow an administrator to delete existing projects through:

Expected behavior:

Other mentions:

Not in scope:

imsdu commented 3 years ago

If the main idea is to free up resources for the environment, could we imagine having disposable instances? We could have N instances of Delta pointing to N namespaces (so the same instance of Cassandra/ES/BG but different keyspaces/prefixes).

For example: a user who starts their course in a given month lands on a given instance for the duration of the course, and a user who starts the course in a different month lands on a different instance. When an instance hits its time-to-live, we just back it up / trash it / wipe it / recycle it. We would need some kind of table linking, for example, the hash of the username to an instance. Entries in this table would have the same time-to-live, so a user who is no longer in the table hits the newest instance (and can start again).

As we have nginx, could we use a Lua script to do this? The Lua module in nginx can be used for this type of thing (https://github.com/openresty/lua-nginx-module#typical-uses). Lua is not difficult, and what we want to do here would just be extracting the username from the JWT and maintaining a table.
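To make the table idea concrete, here is a minimal sketch of the routing logic only, written in Python rather than Lua for readability. The class name `InstanceRouter`, the instance names and the in-memory dictionary are all assumptions for illustration; a real setup would extract the username from the JWT in nginx and keep the table in some shared store.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class InstanceRouter:
    """Maps a hashed username to an instance until the entry expires (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # hash of username -> (instance name, expiry timestamp)
        self.table: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def user_key(username: str) -> str:
        # Store only a hash of the username, as suggested in the comment above.
        return hashlib.sha256(username.encode("utf-8")).hexdigest()

    def route(self, username: str, newest_instance: str) -> str:
        key = self.user_key(username)
        now = self.clock()
        entry = self.table.get(key)
        if entry is not None and entry[1] > now:
            # Known user whose entry is still alive: keep them on their instance.
            return entry[0]
        # Unknown or expired user: assign the newest instance and restart the TTL.
        self.table[key] = (newest_instance, now + self.ttl_seconds)
        return newest_instance
```

With a TTL equal to the course duration, a returning user stays on the same instance until the entry expires, after which the next request lands on whatever instance is newest at that time.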

I hope I am clear and it does not sound too crazy :)

imsdu commented 3 years ago

If we go the archiving way, we must not forget to kill persistence actors related to the project

bogdanromanx commented 3 years ago

In the first paragraph, it is written that it will not need human intervention but it appears that it does as it needs actions in fusion

human intervention == manually removing the data from the database + indices

Will the archives be of some use at some point? If they are "test projects" done through the edX course, they don't have much value, no?

The discussion led to the proposal to delete projects completely instead of archiving them, because it is simpler to implement (archiving = deletion + saving data into an archive); additionally, we can rely on backups if we want to perform a restoration. @samuel-kerrien, would you be comfortable with this?

samuel-kerrien commented 3 years ago

Yep, comfortable with this. It seems to me that archiving is not strictly necessary for the MOOC, so I would support doing deletion first and scheduling archiving for a later time, when it is really needed.

umbreak commented 3 years ago

The following are open questions regarding removal of projects:

umbreak commented 3 years ago

  1. Create a project event DeleteProject. Any subsequent writes on resources will be blocked.

  2. Deprecate all the views (this will delete their indices/namespaces).

  3. Delete all files:

    • Delete the DiskStorage project folder. Postpone the implementation for RemoteDiskStorage and S3Storage.
    • Go through each file (eventsByTag) and delete it.

  4. Delete the caches for views, storages, schemas, resolvers, projects.

  5. Delete the project entry from the ProjectsCounts cache.

  6. Run currentEventsByTag(project) and, for each persistenceId (this can be done in batch and in parallel):

    1. Stop the persistence actor.
    2. Call cleanup (delete the events from the primary store).
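The ordering in the steps above can be sketched as follows. This is only an illustration in Python, not Delta code: `delete_project`, its parameters and the callables passed in are hypothetical stand-ins for the real operations (emitting DeleteProject, deprecating views, deleting files, clearing caches, stopping actors, cleaning up events).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def delete_project(
    project: str,
    steps: List[Callable[[str], None]],
    persistence_ids: Iterable[str],
    stop_actor: Callable[[str], None],
    cleanup_events: Callable[[str], None],
    parallelism: int = 4,
) -> None:
    """Run steps 1-5 in order, then purge every persistence id in parallel."""
    # Steps 1-5 run strictly in sequence; the first one is expected to block
    # further writes so that nothing new appears while the rest is removed.
    for step in steps:
        step(project)

    def purge(pid: str) -> None:
        stop_actor(pid)      # step 6.1: stop the persistence actor
        cleanup_events(pid)  # step 6.2: delete its events from the primary store

    # Step 6: ids are independent of each other, so they can be processed
    # concurrently; within one id, the actor is stopped before cleanup.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # list() waits for all workers and re-raises the first worker exception.
        list(pool.map(purge, persistence_ids))
```

Keeping the event cleanup last means the event log is still available while the earlier steps run, for instance when walking the files by tag in step 3.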