dkirkby / bossdata

Tools for accessing SDSS BOSS data
MIT License

Tool to monitor and manage local disk space #55

Open dkirkby opened 9 years ago

dkirkby commented 9 years ago

This issue is to discuss and then implement a new script (called bosslocal?) that will scan your $DATA_LOCAL_ROOT and report things like:

Ideally, there would be options to include only files that have (or have not) been accessed in the last X days, similar to the Unix find command.

The script should also have options to do some cleanup of the largest files that have not been accessed recently, etc.
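
As a rough starting point for discussion, the scan could look something like the Python sketch below. It walks a root directory, totals the bytes on disk, and lists the files that have not been accessed in at least a given number of days, largest first. The function name `scan_local_root` and its parameters are made up for illustration, and the sketch assumes access times are reliable, which is not the case on filesystems mounted with `noatime`.

```python
import os
import time


def scan_local_root(root, min_days_since_access=0.0, now=None):
    """Return (total_bytes, stale) for all files under root.

    stale is a list of (size_bytes, days_since_access, path) tuples for
    files not accessed in at least min_days_since_access days, largest first.
    Hypothetical helper, not part of bossdata.
    """
    now = time.time() if now is None else now
    total_bytes = 0
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += info.st_size
            days = (now - info.st_atime) / 86400.0
            if days >= min_days_since_access:
                stale.append((info.st_size, days, path))
    stale.sort(reverse=True)  # largest files first
    return total_bytes, stale
```

For example, `scan_local_root(os.environ['DATA_LOCAL_ROOT'], min_days_since_access=30)` would return the total size and the candidates for cleanup without deleting anything.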

dcunning11235 commented 9 years ago

I think https://github.com/dkirkby/bossdata/issues/52 has bearing on this. Reversing part of my comment there, I'm thinking bosslocal could be expanded into a general utility (e.g. bossmgr) that handles management of the local 'raw' files, the DBs (addition, removal, listing of indexes), archiving/deletion, etc. from one command-line tool.

dkirkby commented 9 years ago

I agree that the new command-line tool should know about the raw files that back up each sqlite db and be able to manage them intelligently. For example, a --prune option might delete a raw file only if its db file is present. Perhaps bosslocalmgr for the name?
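
To make the --prune idea concrete, here is a minimal Python sketch. It rests on an assumption that is not established anywhere in this thread: that a raw file named `<stem><raw_suffix>` sits in the same directory as the `<stem>.db` file built from it. The function name, the suffix defaults and the `dry_run` flag are all hypothetical.

```python
import os


def prune_raw_files(root, raw_suffix='.dat', db_suffix='.db', dry_run=True):
    """Remove raw files whose matching db file exists; return pruned paths.

    ASSUMPTION: raw file "<stem><raw_suffix>" backs "<stem><db_suffix>" in
    the same directory. The real bossdata layout may differ.
    """
    if not raw_suffix or raw_suffix == db_suffix:
        raise ValueError('raw_suffix must be non-empty and differ from db_suffix')
    pruned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        present = set(filenames)
        for name in sorted(filenames):
            if not name.endswith(raw_suffix):
                continue
            db_name = name[:-len(raw_suffix)] + db_suffix
            if db_name in present:  # only prune when the db is there
                path = os.path.join(dirpath, name)
                if not dry_run:
                    os.remove(path)
                pruned.append(path)
    return pruned
```

Defaulting to `dry_run=True` means the tool only reports what it would delete unless asked otherwise, which seems like the safer default for a cleanup command.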

dcunning11235 commented 9 years ago

Waiting on my virtual machine drive to back up so I can resize it is giving me ample free time; here is a rough rundown of the functionality I can think of (part plan, part question, part wishlist):

dcunning11235 commented 9 years ago

Features as actually implemented:

Next Iteration

Thrown out

dkirkby commented 9 years ago

A few quick comments based on your notes (but I haven't looked at any code yet):

dcunning11235 commented 9 years ago
dcunning11235 commented 9 years ago

Creation date is a filesystem-dependent feature, which makes sense but isn't something I'd considered; I had always assumed it would be there.

On the ext4 filesystem this is available, but not on ext2 or ext3; for those, modification date is the closest we can get (the date from the server is not preserved, at least as far as HTTP downloads go; not sure about globus). FAT?? and NTFS support this as well. The short answer is that -mtime and -mmin options are going to be needed.

Points at which a file has its dates set:

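| Event        | ctime* | mtime | atime | Note                              |
|--------------|--------|-------|-------|-----------------------------------|
| Download     | 1      | 1     | 1     |                                   |
| Access       | 0      | 0     | 1     | Only with latest db_manage branch |
| Index Update | 0      | 1     | 1     | If we keep this functionality     |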

ctime*: If supported, obviously

The table makes a good argument for just throwing out -ctime and -cmin and replacing them with -mtime, etc. None of this takes into account how OS X works, however.
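
For reference, the dates discussed above can be read from Python with `os.stat`, as in the sketch below. One caveat: on Linux, `st_ctime` is the time of the last inode (metadata) change rather than a creation time, while a true creation time is exposed as `st_birthtime` only on platforms that provide it (macOS and some BSDs). The helper names are hypothetical, and falling back to mtime as a stand-in for the download date is an assumption that fails once an index update rewrites the file.

```python
import os


def file_times(path):
    """Return the timestamps os.stat exposes for path, in epoch seconds."""
    info = os.stat(path)
    return {
        'atime': info.st_atime,  # last access
        'mtime': info.st_mtime,  # last content modification
        'ctime': info.st_ctime,  # inode change on Unix, not creation
        # Creation time, only where the platform exposes it; else None.
        'birthtime': getattr(info, 'st_birthtime', None),
    }


def download_time(path):
    """Best-effort download date: birth time if known, otherwise mtime."""
    times = file_times(path)
    birth = times['birthtime']
    return birth if birth is not None else times['mtime']
```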