dmwm / PHEDEX

CMS data-placement suite

Improvement requests for FileDeleteTMDB #596

Open ericvaandering opened 10 years ago

ericvaandering commented 10 years ago

Original Savannah ticket 59732 reported by None on Fri Nov 27 11:54:32 2009.

Hello,

following a recent incident at T1_FR_CCIN2P3, where I had to delete hundreds of thousands of file replicas at the site and invalidate the tens of thousands of files that had no other replicas, I propose the following improvements for FileDeleteTMDB to make life easier for the operators:

1) -listonly option. Print out a preview of the list of files/blocks/datasets that would be replica-deleted/invalidated by the command, but do not actually execute the deletion.

2) Support for bulk deletion also for block replicas at a node, in case the target block replica is inactive. Note however that currently the block replicas at other sites are not reactivated for retransfer when doing a bulk deletion of inactive block replicas:

https://savannah.cern.ch/bugs/?59403

3) Improved support for wildcards in replica deletion of LFNs. Currently, when an LFN with wildcards is supplied, the script works like this: a) the full list of LFNs matching the wildcard is retrieved from TMDB; b) replica deletion is attempted at the target nodes for every entry in the list, printing a warning if the node does not have that replica. This makes the script extremely slow. For example, at T1_FR_CCIN2P3 the entire /store/mc/Summer08 directory tree was lost, among others. This namespace contains 1M files in PhEDEx, but only 25k of them were actually at T1_FR_CCIN2P3. Going through the list of 1M LFNs took about 8 hours, so for the other namespaces I took the explicit list of LFNs from TMDB instead, which makes the wildcard option useless.

4) Enable (not by default, but with a command line option) automatic global invalidation of the LFNs/blocks/datasets deleted, in case the replica that was deleted was the only one left in TMDB. The command should of course print the explicit list of what was invalidated globally and what was only replica-deleted. Currently, when the last replica of a file is deleted with FileDeleteTMDB -node, but the operator forgets to invalidate the file with FileDeleteTMDB -invalidate, the transfer requests for this file stay in the system forever, filling the request queues with 'dark data' that can never be transferred.
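To illustrate how points 1, 3 and 4 could fit together, here is a minimal Python sketch. It is not FileDeleteTMDB code and does not use the TMDB schema: the `replicas` dictionary (LFN to set of node names), the function names, and the `listonly` / `auto_invalidate` parameters are all hypothetical stand-ins, and shell-style `*` wildcards are assumed. The idea is to apply the wildcard only to LFNs that have a replica at the target node, to separate last-replica files from the rest, and to allow a preview without modifying anything.

```python
from fnmatch import fnmatchcase


def plan_deletion(replicas, pattern, node):
    """Split LFNs matching `pattern` at `node` into (has_other_replicas, last_replica).

    `replicas` is a hypothetical in-memory stand-in for TMDB: LFN -> set of nodes.
    """
    has_other, last = [], []
    # Consider only LFNs with a replica at `node`. In a real database this
    # would be a filter on the node; here a dictionary scan stands in for it.
    at_node = sorted(lfn for lfn, nodes in replicas.items() if node in nodes)
    for lfn in at_node:
        if not fnmatchcase(lfn, pattern):
            continue
        if replicas[lfn] == {node}:
            last.append(lfn)       # deleting this replica leaves no copy anywhere
        else:
            has_other.append(lfn)  # other replicas remain elsewhere
    return has_other, last


def delete_replicas(replicas, pattern, node, listonly=False, auto_invalidate=False):
    """Delete matching replicas at `node` and report what was (or would be) done."""
    has_other, last = plan_deletion(replicas, pattern, node)
    report = {
        "replica_deleted": has_other + ([] if auto_invalidate else last),
        "invalidated": list(last) if auto_invalidate else [],
    }
    if listonly:
        return report  # preview only: `replicas` is left untouched
    for lfn in has_other + last:
        replicas[lfn].discard(node)
    if auto_invalidate:
        for lfn in last:
            del replicas[lfn]  # global invalidation of files with no replica left
    return report
```

Calling `delete_replicas(replicas, "/store/mc/Summer08/*", "T1_FR_CCIN2P3", listonly=True, auto_invalidate=True)` would return the two lists for review without changing `replicas`. Without `auto_invalidate`, a last-replica file is left with an empty replica set, which corresponds to the 'dark data' case described in point 4.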

Thanks! Nicolo'

ericvaandering commented 10 years ago

Comment by egeland on Fri Feb 26 10:40:33 2010

Adding dependency on #59403 for fast bulk deletion of inactive blocks.