zeeshanakram3 commented 1 year ago

Problem

Currently, storage nodes don't delete assets that are no longer assigned to them in the runtime, so dead assets keep accumulating on the nodes. As a result, the nodes' disk usage keeps growing unless operators manually clean up the space, which is very risky. Not pruning assets causes a further complication: DataObjects/Bags can't be effectively moved between different buckets/operators, because moving them does not physically free up space on the original node, so the Storage Lead can't use the globally available disk space effectively.

zeeshanakram3 commented 1 year ago

Proposal

TBD

yasiryagi commented 1 year ago

There should be a means for the storage server operator to run a command that shows the diff between what is assigned to the bucket (i.e. what the QN reports) and what is actually stored on the server.
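The diff described above amounts to two set differences between the data object IDs the Query Node reports as assigned to the bucket and the IDs found on disk. A minimal sketch, assuming both lists have already been fetched as plain string IDs; `diffBucket` and `BucketDiff` are hypothetical names, not part of the storage node's actual code:

```typescript
// Hypothetical result shape for the bucket vs. disk comparison.
interface BucketDiff {
  // On disk but no longer assigned to this bucket: pruning candidates.
  orphaned: string[];
  // Assigned to this bucket but absent from disk: need to be synced.
  missing: string[];
}

// Compare the object IDs assigned to the bucket (e.g. as reported by the
// Query Node) with the IDs of files actually present in local storage.
function diffBucket(assigned: Iterable<string>, local: Iterable<string>): BucketDiff {
  const assignedSet = new Set<string>(Array.from(assigned));
  const localSet = new Set<string>(Array.from(local));
  const orphaned = Array.from(localSet).filter((id) => !assignedSet.has(id)).sort();
  const missing = Array.from(assignedSet).filter((id) => !localSet.has(id)).sort();
  return { orphaned, missing };
}

// "View" mode would only print this report and never touch the disk.
const report = diffBucket(["1", "2", "3"], ["2", "3", "4", "5"]);
console.log(JSON.stringify(report)); // {"orphaned":["4","5"],"missing":["1"]}
```

Using sets keeps the comparison linear in the number of objects; sorting only makes the report stable for display.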
The command should have the following modes:
- View
- Action (delete), which should include data-loss mitigation:
  - Provide a warning before execution.
  - Check the availability of the object where it should be stored, if anywhere.
  - Force a replication if the object does not exist where it should be.
- Auto-pruning: set the storage server to prune automatically (this mode should also be available as an option of the storage server command). It should include data-loss mitigation:
  - A back-off period between detection (logs and metrics should be generated at this stage) and actually pruning.
  - Check the availability of the object where it should be stored, if anywhere.
  - Force a replication if the object does not exist where it should be.
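The data-loss mitigations listed above can be folded into a single per-object decision. A minimal sketch, assuming the node records when it first detected each orphaned object and has some way to classify where the object stands elsewhere; `decide`, `Action`, `RemoteStatus`, and the status callback are hypothetical, and how availability is actually verified (e.g. by querying the bucket now responsible) is left open:

```typescript
// Possible outcomes for one orphaned object (hypothetical names).
type Action = "wait" | "delete" | "replicate";

// Where the object stands outside this node (hypothetical classification):
//  - "unassigned": no bucket is obliged to store it any more (e.g. deleted on-chain)
//  - "available":  it is assigned elsewhere and a copy was confirmed there
//  - "missing":    it is assigned elsewhere but no copy could be found
type RemoteStatus = "unassigned" | "available" | "missing";

interface OrphanState {
  id: string;
  firstDetectedMs: number; // when the orphan was first seen (logs/metrics emitted then)
}

function decide(
  obj: OrphanState,
  nowMs: number,
  backoffMs: number,
  remoteStatus: (id: string) => RemoteStatus,
): Action {
  // 1. Back-off: do nothing until the object has been orphaned long enough.
  if (nowMs - obj.firstDetectedMs < backoffMs) return "wait";
  // 2. If it should exist elsewhere but doesn't, keep it and replicate it there.
  if (remoteStatus(obj.id) === "missing") return "replicate";
  // 3. Otherwise (copy confirmed, or no longer needed anywhere) it is safe to delete.
  return "delete";
}

const HOUR = 60 * 60 * 1000;
const now = 10 * HOUR;
// Pretend lookup: "7" was confirmed on its new bucket, "9" was deleted on-chain.
const statusOf = (id: string): RemoteStatus =>
  id === "7" ? "available" : id === "9" ? "unassigned" : "missing";

console.log(decide({ id: "7", firstDetectedMs: now - HOUR }, now, 6 * HOUR, statusOf)); // wait
console.log(decide({ id: "7", firstDetectedMs: 0 }, now, 6 * HOUR, statusOf)); // delete
console.log(decide({ id: "8", firstDetectedMs: 0 }, now, 6 * HOUR, statusOf)); // replicate
console.log(decide({ id: "9", firstDetectedMs: 0 }, now, 6 * HOUR, statusOf)); // delete
```

Checking the back-off first is one way to ensure that a temporarily stale view of bucket assignments cannot trigger an immediate deletion; the same function could serve both the interactive Action mode (after the warning is confirmed) and the auto-pruning mode.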

bedeho commented 1 year ago

Shouldn't this happen automatically?

yasiryagi commented 1 year ago

That is the ideal, with two more considerations:

- Data-loss mitigation.
- The ability of the operator to force it through a command.

kdembler commented 8 months ago

@zeeshanakram3 Should we close this already or there's more work to be done?

Joystream/joystream #4813: Pruning of assets from storage nodes that they are no longer obliged to store