Currently, storage nodes don't delete assets that are no longer assigned to them in the runtime, so dead assets keep accumulating on the nodes. Disk usage on the nodes therefore continues to grow unless operators clean up the space manually, which is very risky.
Not pruning assets also means that DataObjects/Bags can't be moved effectively between buckets or operators: the move does not physically free up space on the old node, so the Storage Lead can't use the globally available disk space effectively.
There should be a means for the storage server operator to run a command that shows the diff between what is assigned to the bucket (i.e. per the QN) and what is on the server.
The command should have the following modes:
- View: show the diff only.
- Action (delete): delete the objects that are no longer assigned. This mode should have data loss mitigation:
  - Provide a warning before execution.
  - Check that the object is available where it should be, if anywhere.
  - Force a replication if the object does not exist where it should be.
- Auto: set the storage server to do the pruning automatically. This mode should also be available as an option of the storage server command, and should have data loss mitigation:
  - A back-off period between detection (logs and metrics should be generated at this stage) and actioning the pruning.
  - Check that the object is available where it should be, if anywhere.
  - Force a replication if the object does not exist where it should be.
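The modes above could share one planning step. The following is a minimal sketch in Python, not the storage node's actual implementation: `diff_objects`, `plan_pruning`, `PrunePlan` and all parameter names are hypothetical, and the inputs (the assigned set from the QN, the local set from disk, and which other buckets hold each object) are assumed to be collected elsewhere. It takes the conservative reading that an object is deleted only once every bucket that should hold it has confirmed it.

```python
from dataclasses import dataclass, field


@dataclass
class PrunePlan:
    """Outcome of one planning pass (hypothetical structure)."""
    to_delete: list[str] = field(default_factory=list)          # safe to remove locally
    needs_replication: list[str] = field(default_factory=list)  # missing where it should be
    waiting: list[str] = field(default_factory=list)            # still inside the back-off period


def diff_objects(assigned: set[str], local: set[str]) -> tuple[set[str], set[str]]:
    """View mode: return (obsolete, missing).

    obsolete: stored locally but no longer assigned to this bucket.
    missing:  assigned to this bucket but not stored locally.
    """
    return local - assigned, assigned - local


def plan_pruning(
    assigned: set[str],
    local: set[str],
    expected_holders: dict[str, set[str]],   # object id -> buckets that should hold it
    confirmed_holders: dict[str, set[str]],  # object id -> buckets verified to hold it
    first_seen: dict[str, float],            # object id -> time it was first detected as obsolete
    now: float,
    backoff_seconds: float,
) -> PrunePlan:
    """Classify obsolete objects; nothing is deleted here.

    Note: first_seen is updated in place so the back-off survives across passes.
    """
    obsolete, _missing = diff_objects(assigned, local)

    # Forget objects that were re-assigned or already removed.
    for stale in set(first_seen) - obsolete:
        del first_seen[stale]

    plan = PrunePlan()
    for object_id in sorted(obsolete):
        detected_at = first_seen.setdefault(object_id, now)
        if now - detected_at < backoff_seconds:
            # Detection stage: this is where logs and metrics would be emitted.
            plan.waiting.append(object_id)
            continue

        expected = expected_holders.get(object_id, set())
        confirmed = confirmed_holders.get(object_id, set())
        if expected - confirmed:
            # At least one bucket that should hold the object does not: replicate first.
            plan.needs_replication.append(object_id)
        else:
            # Either every expected holder has it, or the object has no
            # expected holder at all (it was removed from the runtime).
            plan.to_delete.append(object_id)
    return plan
```

Under these assumptions, the manual Action mode would call `plan_pruning` with `backoff_seconds=0` and print the plan as the warning before execution, while the Auto mode would call it periodically with a non-zero back-off and act only on `to_delete`.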