discoproject / disco

a Map/Reduce framework for distributed computing
http://discoproject.org
BSD 3-Clause "New" or "Revised" License
1.63k stars 241 forks

Feature: blob size in DDFS #408

Open tspurway opened 10 years ago

tspurway commented 10 years ago

It would be useful to be able to track how much space blobs occupy in DDFS. We could modify some of the ddfs subcommands to report and/or aggregate this data, which would help diagnose cluster free-space issues.
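As a rough illustration of the kind of aggregation a subcommand could report, here is a minimal sketch. It assumes the blob sizes have already been collected as hypothetical (tag, blob_name, size_in_bytes) records; how those records are fetched from DDFS is not shown. It distinguishes per-tag totals (a blob shared by several tags counts toward each) from cluster usage (each blob counted once), and the replicas parameter is an assumption standing in for the cluster's configured blob replication factor.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (tag, blob_name, size_in_bytes)
BlobRecord = Tuple[str, str, int]


def aggregate_tag_sizes(blobs: Iterable[BlobRecord]) -> Dict[str, int]:
    """Sum blob sizes per tag; a blob referenced by several tags counts toward each."""
    totals: Dict[str, int] = defaultdict(int)
    for tag, _blob, size in blobs:
        totals[tag] += size
    return dict(totals)


def cluster_usage(blobs: Iterable[BlobRecord], replicas: int = 1) -> int:
    """Estimate physical space used: each distinct blob once, times its replica count."""
    unique: Dict[str, int] = {}
    for _tag, blob, size in blobs:
        unique[blob] = size  # the same blob seen under another tag is not double-counted
    return sum(unique.values()) * replicas
```

Reporting both numbers matters for free-space diagnosis: summing per-tag totals overstates disk usage whenever tags share blobs.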

oldmantaiter commented 9 years ago

Was thinking about this today; it would be interesting to add something to the extended attributes of the tag for the following metrics:

Then we could issue something along the lines of a ddfs stat command for tags and get output similar to stat on a local filesystem.
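A minimal sketch of what such per-tag metrics and a stat-like report might look like. Everything here is hypothetical: the TagStats class, the "stat:*" attribute names, and the choice of blob count, total bytes and last-modified time as metrics are illustrative only. It is not an existing DDFS feature. The record_blob method stands for the extra work each tagging operation would have to do.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass
class TagStats:
    """Hypothetical per-tag metrics that could be kept in a tag's extended attributes."""
    tag: str
    blob_count: int = 0
    total_bytes: int = 0
    modified: datetime = field(
        default_factory=lambda: datetime.fromtimestamp(0, tz=timezone.utc))

    def record_blob(self, size: int, when: datetime) -> None:
        """Update the counters; this is the per-tagging-operation overhead."""
        self.blob_count += 1
        self.total_bytes += size
        self.modified = max(self.modified, when)

    def to_attrs(self) -> Dict[str, str]:
        """Serialize to string key/value pairs (hypothetical attribute names)."""
        return {
            "stat:blobs": str(self.blob_count),
            "stat:bytes": str(self.total_bytes),
            "stat:modified": self.modified.isoformat(),
        }

    def format_stat(self) -> str:
        """Render a report loosely modeled on the output of stat."""
        return (f"Tag: {self.tag}\n"
                f"Blobs: {self.blob_count}\n"
                f"Size: {self.total_bytes}\n"
                f"Modify: {self.modified.isoformat()}")
```

Storing the values as strings keeps them compatible with simple key/value attribute storage; the report reads them back without touching any blobs.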

This would add some overhead to each tagging operation. Alternatively, we could run this as a daily internal job that scrapes every tag/blob and reads the sizes from the filesystem. Depending on the number of blobs and tags, though, this might take more than 24 hours, and the numbers would not be real-time.
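The filesystem side of such a periodic job could be as simple as walking a node's blob directory and summing file sizes. This is a minimal sketch: the idea that each node keeps its blobs as plain files under one root directory is an assumption, and the root passed in is hypothetical. Files that disappear mid-scan (for example, removed by GC) are skipped instead of aborting the run, which matters for a job that may take many hours.

```python
import os
import tempfile
from typing import Tuple


def scan_blob_dir(root: str) -> Tuple[int, int]:
    """Walk a (hypothetical) local blob directory and return (file_count, total_bytes)."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # blob vanished or is unreadable; skip it
            count += 1
    return count, total


if __name__ == "__main__":
    # Usage demo on a throwaway directory standing in for a node's blob store.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "ab"))
        with open(os.path.join(root, "ab", "blob1"), "wb") as f:
            f.write(b"x" * 10)
        print(scan_blob_dir(root))  # (1, 10)
```

Running one scan per node in parallel and summing the results would shorten the wall-clock time compared with a single crawler, but the figures would still only be as fresh as the last run.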

oldmantaiter commented 9 years ago

We could also add something like retention: a "janitor" job could iterate over the tags and check whether each tag is set to expire, so that expired tags are cleaned up automatically before the next GC. Currently we use a script for this, which is not very intuitive because we have to add tags to it whenever new types of data are pushed to the cluster.
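A minimal sketch of the janitor's selection step, assuming each tag can carry a hypothetical "retention:expires" attribute holding an ISO 8601 timestamp. The tag-to-attributes mapping is assumed to be fetched elsewhere, and the actual deletion call is deliberately left out. Because the policy lives on the tag itself, new types of data need no script changes; tags without the attribute, or with a malformed value, are kept rather than deleted.

```python
from datetime import datetime, timezone
from typing import List, Mapping

EXPIRES_ATTR = "retention:expires"  # hypothetical attribute name


def expired_tags(tag_attrs: Mapping[str, Mapping[str, str]], now: datetime) -> List[str]:
    """Return the tags whose expiry time is at or before `now` (an aware datetime)."""
    expired = []
    for tag, attrs in tag_attrs.items():
        raw = attrs.get(EXPIRES_ATTR)
        if raw is None:
            continue  # no retention policy: keep the tag
        try:
            expires = datetime.fromisoformat(raw)
        except ValueError:
            continue  # malformed value: err on the side of keeping data
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)  # treat naive times as UTC
        if expires <= now:
            expired.append(tag)
    return sorted(expired)
```

The returned list would then be handed to whatever deletes tags, so that the next GC can reclaim the blobs they referenced.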