DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

remove replicas not registered in FC #261

Closed elisal closed 9 years ago

elisal commented 12 years ago

Hi, I modified the dirac-dms-remove-lfn-replica script, adding an option: FCCheck. By default it is YES, so the script checks that the replica exists in the LFC and behaves exactly like the current version. If the option is set to NOLFC, the script does the following:

- it checks in any case whether the replica is registered; if it is, it doesn't remove anything
- if the replica is NOT registered, it calls StorageElement.getPfnForLfn() to get the SURL, and then calls ReplicaManager.removeStorageFile( surl, seName )
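The NOLFC behaviour described above can be sketched roughly as below. This is only an illustration under stated assumptions: the catalog and the storage are plain in-memory dictionaries, and `remove_if_unregistered` is a hypothetical helper, not a function from the actual script or from DIRAC.

```python
# Hypothetical stand-ins: a real script would query the file catalog and the
# storage element through DIRAC instead of these in-memory structures.

def remove_if_unregistered(lfn, se_name, catalog, storage):
    """Remove the physical file only when the catalog has no replica entry.

    catalog: dict mapping LFN -> {SE name: SURL} (registered replicas)
    storage: dict mapping SE name -> set of LFNs physically present
    Returns 'skipped-registered', 'not-on-storage' or 'removed'.
    """
    if se_name in catalog.get(lfn, {}):
        # Registered replica: refuse, so the catalog never points at a missing file.
        return "skipped-registered"
    files = storage.get(se_name, set())
    if lfn not in files:
        return "not-on-storage"
    files.remove(lfn)
    return "removed"


registered_lfn = "/lhcb/user/l/lanciott/apiEx.py"
orphan_lfn = "/lhcb/test/roberto/temp/SARA_5.13778"

catalog = {
    registered_lfn: {
        "CERN-USER": "srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/user/l/lanciott/apiEx.py"
    }
}
storage = {"CERN-USER": {registered_lfn}, "RAL-DST": {orphan_lfn}}

orphan_status = remove_if_unregistered(orphan_lfn, "RAL-DST", catalog, storage)
registered_status = remove_if_unregistered(registered_lfn, "CERN-USER", catalog, storage)
print(orphan_status, registered_status)
```

The check on the catalog comes first on purpose: a registered replica is never touched, so this path cannot leave a catalog entry pointing at a deleted file.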

Examples:

[lxplus423] > srmls srm://srm-lhcb.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/lhcb/test/roberto/temp/SARA_5.13778
5908 /castor/ads.rl.ac.uk/prod/lhcb/test/roberto/temp/SARA_5.13778

This file exists on storage and it is not registered in the LFC:

dirac-dms-lfn-replicas /lhcb/test/roberto/temp/SARA_5.13778
{'Failed': {'/lhcb/test/roberto/temp/SARA_5.13778': 'No such file or directory'}, 'Successful': {}}

Then the script with the NOLFC option will remove it:

[hpdesk] > dirac-dms-remove-lfn-replica /lhcb/test/roberto/temp/SARA_5.13778 RAL-DST NOLFC
WARNING: removing physical replica from storage, without removing entry in the FC
ReplicaManager.executeReplicaStorageElementOperation: Failed to get replicas for file. /lhcb/test/roberto/temp/SARA_5.13778 No such file or directory
ReplicaManager._executeStorageElementFunction: No pfns supplied.
ReplicaManager.executeReplicaStorageElementOperation: Failed to execute isFile StorageElement operation.
ReplicaManager._executeStorageElementFunction: No pfns supplied.
Summary:
Successfully removed: ['srm://srm-lhcb.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/lhcb/test/roberto/temp/SARA_5.13778']
Failed to remove: []

And in fact the file has been removed:

srmls srm://srm-lhcb.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/lhcb/test/roberto/temp/SARA_5.13778
Fri Nov 04 10:20:04 CET 2011: Return status:

  • Status code: SRM_FAILURE
  • Explanation: Failed for all paths SRM_INVALID_PATH File/directory 0 /castor/ads.rl.ac.uk/prod/lhcb/test/roberto/temp/SARA_5.13778 does not exist.

On the other hand, if I execute the script with the NOLFC option for a replica that IS REGISTERED in the LFC, the script will refuse to remove it. E.g.

dirac-dms-lfn-replicas /lhcb/user/l/lanciott/apiEx.py
{'Failed': {}, 'Successful': {'/lhcb/user/l/lanciott/apiEx.py': {'CERN-USER': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/user/l/lanciott/apiEx.py', 'SARA-USER': 'srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/user/l/lanciott/apiEx.py'}}}

I try to remove it with the NOLFC option, but the script refuses: a registered replica cannot be removed from storage only.

[hpdesk] /home/elisal/dev > dirac-dms-remove-lfn-replica /lhcb/user/l/lanciott/apiEx.py CERN-USER NOLFC
WARNING: removing physical replica from storage, without removing entry in the FC
WARNING: file is registered in FC! it will NOT be removed from storage!
{'OK': True, 'Value': {'Successful': {'/lhcb/user/l/lanciott/apiEx.py': True}, 'Failed': {}}}
Summary:
Successfully removed: []
Failed to remove: []

And in fact the replica at CERN-USER is still there:

dirac-dms-lfn-replicas /lhcb/user/l/lanciott/apiEx.py
{'Failed': {}, 'Successful': {'/lhcb/user/l/lanciott/apiEx.py': {'CERN-USER': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/user/l/lanciott/apiEx.py', 'SARA-USER': 'srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lhcb/user/l/lanciott/apiEx.py'}}}

I think the functionality is fine. I will do some more tests and then, if there are no objections, I will issue a pull request. The code is here: https://github.com/elisal/DIRAC/blob/removeUnregisteredRepl/Interfaces/scripts/dirac-dms-remove-lfn-replica.py

Cheers

graciani commented 12 years ago

Hi Elisa,

NoLFC should not be an argument but a switch.

Ricardo

elisal commented 12 years ago

Hi, you are right. Now it's a switch. https://github.com/elisal/DIRAC/blob/removeUnregisteredRepl/Interfaces/scripts/dirac-dms-remove-lfn-replica.py cheers
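To illustrate the difference between a positional argument and a switch, here is a minimal sketch using Python's standard `argparse` as a stand-in. The actual script would register the switch through DIRAC's own command-line machinery, so the parser below is an assumption made for illustration only.

```python
import argparse


def build_parser():
    """Build a stand-in parser: LFN and SE are positional, NoLFC is a boolean switch."""
    parser = argparse.ArgumentParser(prog="dirac-dms-remove-lfn-replica")
    parser.add_argument("lfn", help="logical file name of the replica to remove")
    parser.add_argument("se", help="storage element holding the replica")
    # A switch carries no value: it is False unless given on the command line.
    parser.add_argument(
        "--NoLFC",
        action="store_true",
        help="remove the file from storage only if it is not registered in the catalog",
    )
    return parser


with_switch = build_parser().parse_args(
    ["/lhcb/test/roberto/temp/SARA_5.13778", "RAL-DST", "--NoLFC"]
)
without_switch = build_parser().parse_args(
    ["/lhcb/test/roberto/temp/SARA_5.13778", "RAL-DST"]
)
print(with_switch.NoLFC, without_switch.NoLFC)
```

With a switch, the default invocation keeps its two positional arguments unchanged, and a mistyped third word can no longer be silently taken as an option value.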

graciani commented 12 years ago

Hi Elisa,

I think that in both cases (with or without the switch) you should first try to remove using the RM.removeReplica method (or the dirac API equivalent). If the NoLFC option is set, and the removal has failed with 'No such file or directory', then you have to use the RM.getPfnForLfn and then RM.removeStorageFile.
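The order of operations suggested in the paragraph above could be sketched as follows. The `FakeReplicaManager` class, its method names, and the `/demo/...` file names are hypothetical stand-ins and not the real ReplicaManager API; only the `{'OK': ..., 'Value'/'Message': ...}` result shape is meant to mimic DIRAC-style return values.

```python
NO_SUCH_FILE = "No such file or directory"


class FakeReplicaManager:
    """Hypothetical in-memory stand-in; method names are illustrative only."""

    def __init__(self, catalog, storage):
        self.catalog = catalog  # LFN -> {SE name: SURL}
        self.storage = storage  # SE name -> set of LFNs physically present

    def remove_replica(self, lfn, se_name):
        """Catalog-aware removal: drops the file and its catalog entry."""
        if se_name not in self.catalog.get(lfn, {}):
            return {"OK": False, "Message": NO_SUCH_FILE}
        self.storage.get(se_name, set()).discard(lfn)
        del self.catalog[lfn][se_name]
        return {"OK": True, "Value": lfn}

    def remove_storage_file(self, lfn, se_name):
        """Direct removal from storage, without consulting the catalog."""
        if lfn not in self.storage.get(se_name, set()):
            return {"OK": False, "Message": NO_SUCH_FILE}
        self.storage[se_name].remove(lfn)
        return {"OK": True, "Value": lfn}


def remove_replica_with_fallback(rm, lfn, se_name, no_lfc=False):
    """Try the catalog-aware removal first; go to storage only as a fallback."""
    res = rm.remove_replica(lfn, se_name)
    if res["OK"] or not no_lfc:
        return res
    if NO_SUCH_FILE not in res["Message"]:
        return res  # some other failure: do not touch the storage
    return rm.remove_storage_file(lfn, se_name)


catalog = {"/demo/registered.dat": {"DEMO-SE": "srm://demo.example.org/demo/registered.dat"}}
storage = {"DEMO-SE": {"/demo/registered.dat", "/demo/orphan.dat"}}
rm = FakeReplicaManager(catalog, storage)

registered_res = remove_replica_with_fallback(rm, "/demo/registered.dat", "DEMO-SE", no_lfc=True)
orphan_plain_res = remove_replica_with_fallback(rm, "/demo/orphan.dat", "DEMO-SE")
orphan_switch_res = remove_replica_with_fallback(rm, "/demo/orphan.dat", "DEMO-SE", no_lfc=True)
```

In this sketch a registered replica is always removed together with its catalog entry, and the direct storage call is reached only when the catalog reports that the file is unknown.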

The logic you have implemented allows a replica to be removed from the Storage while leaving the Replica info in the LFC. This is clearly not what we want.

Have we agreed to keep the Dirac or the ReplicaManager based dms scripts?

graciani commented 12 years ago

Sorry, was too fast:

elisal commented 12 years ago

OK, if you prefer, I will change the script to call ReplicaManager.removeReplica() first and then consider the switch. However, with the current logic no inconsistency can be generated, with or without the NoLFC switch. BTW, another possibility is to merge the 2 DMS 'removal' scripts (dirac-dms-remove-lfn-replica and dirac-dms-remove-replicas), as said in a mail thread some time ago. The idea was to enhance the Dirac API so that it can remove replicas from multiple SEs (and now also remove a replica from storage, to cover the use case of replicas that are not registered), and then get rid of the script directly based on ReplicaManager.

graciani commented 12 years ago

The logic in your code, with the "NoLFC" switch, will remove the replica from the Storage without checking/removing anything from the LFC. So if the Replica is properly registered in the LFC, the file will be removed from the storage, but not from the LFC. The inconsistency is only avoided if you first run the script without the switch and then again with the switch (but this cannot be our working assumption).

We need to be consistent. If for the moment you do not want to touch the Dirac API, the new script has to start from the one using the ReplicaManager and add the extra code needed. If you want to start from the script based on the Dirac API, then we should add a new method to the API that does what we have described: try to remove first as if the replica is properly registered, and then go directly to the Storage and remove again.

elisal commented 12 years ago

Hi Ricardo, please see the example below (the same one I posted before):

[hpdesk] > dirac-dms-lfn-replicas /lhcb/user/l/lanciott/apiEx.py
{'Failed': {}, 'Successful': {'/lhcb/user/l/lanciott/apiEx.py': {'CERN-USER': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/user/l/lanciott/apiEx.py', 'RAL-USER': 'srm://srm-lhcb.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/lhcb/user/l/lanciott/apiEx.py'}}}
[hpdesk] > dirac-dms-remove-lfn-replica /lhcb/user/l/lanciott/apiEx.py CERN-USER --NoLFC
WARNING: removing physical replica from storage, without removing entry in the FC
WARNING: file is registered in FC! it will NOT be removed from storage!
{'OK': True, 'Value': {'Successful': {'/lhcb/user/l/lanciott/apiEx.py': True}, 'Failed': {}}}
[hpdesk] > dirac-dms-lfn-replicas /lhcb/user/l/lanciott/apiEx.py
{'Failed': {}, 'Successful': {'/lhcb/user/l/lanciott/apiEx.py': {'CERN-USER': 'srm://srm-lhcb.cern.ch/castor/cern.ch/grid/lhcb/user/l/lanciott/apiEx.py', 'RAL-USER': 'srm://srm-lhcb.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/lhcb/user/l/lanciott/apiEx.py'}}}

The script checks whether the replica is registered, and if it is, it does not remove the file. Why do you say that it removes the replica without checking/removing anything from the LFC?

graciani commented 12 years ago

Sorry, I had missed:

    res = rm.getReplicaIsFile( lfn, seName )
    if res['OK']:
      print 'WARNING: file is registered in FC! it will NOT be removed from storage! ', res
      continue

You are right.

Still, this means that if you have a bunch of files that are "problematic" you have to run the command twice to solve the situation. I think that the option "DoNotTrustFC" should allow to remove the Replica(s) with a single command no matter if they are or they are not registered in the FC (not LFC).

graciani commented 12 years ago

Maybe we need another flag to remove a replica from Storage only if it is not registered in the FC.
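One way to picture the two flags discussed in the last two comments is a single batch routine with an explicit mode. Everything here is a hypothetical sketch: the `Mode` names, the `remove_replicas` helper, the in-memory catalog and storage, and the `/demo/...` file names are assumptions, not existing DIRAC options or calls.

```python
from enum import Enum


class Mode(Enum):
    CATALOG_ONLY = "default"            # only registered replicas are removed
    ONLY_IF_UNREGISTERED = "no-fc"      # storage-only removal of unregistered files
    DO_NOT_TRUST_FC = "do-not-trust"    # remove in one pass, registered or not


def remove_replicas(lfns, se_name, catalog, storage, mode):
    """Process a batch of LFNs and return a Successful/Failed summary."""
    successful, failed = {}, {}
    for lfn in lfns:
        registered = se_name in catalog.get(lfn, {})
        on_storage = lfn in storage.get(se_name, set())
        if not registered and not on_storage:
            failed[lfn] = "No such file or directory"
        elif registered and mode is Mode.ONLY_IF_UNREGISTERED:
            failed[lfn] = "registered in FC, not removed from storage"
        elif not registered and mode is Mode.CATALOG_ONLY:
            failed[lfn] = "No such file or directory"
        else:
            if on_storage:
                storage[se_name].remove(lfn)
            if registered:
                # Always drop the catalog entry together with the file.
                del catalog[lfn][se_name]
            successful[lfn] = True
    return {"OK": True, "Value": {"Successful": successful, "Failed": failed}}


catalog = {"/demo/registered.dat": {"DEMO-SE": "srm://demo.example.org/demo/registered.dat"}}
storage = {"DEMO-SE": {"/demo/registered.dat", "/demo/orphan.dat"}}
lfns = ["/demo/registered.dat", "/demo/orphan.dat"]

single_pass = remove_replicas(lfns, "DEMO-SE", catalog, storage, Mode.DO_NOT_TRUST_FC)
print(single_pass["Value"])
```

With `DO_NOT_TRUST_FC` a mixed set of registered and unregistered files is cleaned in one command, while `ONLY_IF_UNREGISTERED` keeps the more conservative behaviour of the original switch.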

fstagni commented 9 years ago

Chris, I'm assigning this to you for the moment. Please close it if you think it's done.

chaen commented 9 years ago

All this (and more!) is already implemented in the DMScript of LHCbDIRAC. There have been plenty of discussions about porting at least part of it into DIRAC, if I understood correctly. Anyway, most of what is said here is (or will be very soon) obsolete, so I think it can be closed, but I don't have the karma for it.

fstagni commented 9 years ago

Closing it. Moving this to DIRAC could be a good idea.