DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0
DFC: DirectoryUsage issues #2382

Open andresailer opened 9 years ago

andresailer commented 9 years ago

Two Issues with the DirectoryUsage for the DFC:

a) I noticed some errors in the DirectoryUsage overview in the FileCatalogCLI

FC:/> size -l
directory: /
Logical Size: 2,533,639,186,906,667 Files: 11108048 Directories: 213751
    StorageElement                        Size                   Replicas 
=========================================================================
[...]
  8 PNNL-SRM                              -2,474,098,350,160     -118993 
[...]

Notice the negative size and number of replicas. This was probably caused by me unregistering all the replicas on PNNL-SRM: in some cases I unregistered replicas that were no longer present at PNNL-SRM, and those were still subtracted from the usage.

b) When I tried to rebuild the directory usage, this happened:

FC:/> rebuild catalog
Error: Socket read timeout exceeded

At first the DirectoryUsage was basically empty, but it has since filled up again. Is the DirectoryUsage table slowly being rebuilt? I can see some discrepancies between what I get directly from the FC_Files or FC_Replicas tables (number of replicas per SE, total size) and what "size -l" shows.

atsareg commented 9 years ago

In your case the DirectoryUsage somehow got out of sync with the actual contents of the catalog. Rebuilding the table was the right thing to do. This is a lengthy operation, so it is fairly normal to get a timeout (I will increase the timeout for this command). The rebuild recreates the DirectoryUsage tables completely. In the normal course of operations, the table is updated each time a file or replica is added to or removed from the catalog, so it should be in perfect sync with what you get from the database tables. If you see discrepancies, the table can be rebuilt. If you see discrepancies systematically, then there is a problem that we have to identify and address appropriately.
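
One way to check for such discrepancies is to recompute the per-directory, per-SE totals from the file and replica tables and compare them with the stored usage rows. The sketch below is only an illustration in Python 3 with an in-memory SQLite database. The table name FC_DirectoryUsage and all column names (DirID, SEID, Size, SESize, SEFiles, and so on) are assumptions made for this example and are much simpler than the real DFC schema.

```python
import sqlite3

# Hypothetical, simplified schema. Column names are assumptions for this sketch.
SCHEMA = """
CREATE TABLE FC_Files (FileID INTEGER PRIMARY KEY, DirID INTEGER, Size INTEGER);
CREATE TABLE FC_Replicas (RepID INTEGER PRIMARY KEY, FileID INTEGER, SEID INTEGER);
CREATE TABLE FC_DirectoryUsage (DirID INTEGER, SEID INTEGER, SESize INTEGER, SEFiles INTEGER);
"""

# Recompute size and replica count per (directory, SE) from the base tables.
RECOMPUTE = """
SELECT f.DirID, r.SEID, SUM(f.Size), COUNT(*)
FROM FC_Replicas r JOIN FC_Files f ON f.FileID = r.FileID
GROUP BY f.DirID, r.SEID
"""


def find_discrepancies(conn):
    """Return {(DirID, SEID): (stored, recomputed)} for rows that disagree."""
    stored = {
        (dir_id, se_id): (size, count)
        for dir_id, se_id, size, count in conn.execute(
            "SELECT DirID, SEID, SESize, SEFiles FROM FC_DirectoryUsage"
        )
    }
    actual = {
        (dir_id, se_id): (size, count)
        for dir_id, se_id, size, count in conn.execute(RECOMPUTE)
    }
    diffs = {}
    # A key missing on either side is treated as (0, 0).
    for key in set(stored) | set(actual):
        old, new = stored.get(key, (0, 0)), actual.get(key, (0, 0))
        if old != new:
            diffs[key] = (old, new)
    return diffs


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO FC_Files VALUES (?, ?, ?)", [(1, 10, 100), (2, 10, 250)])
    conn.executemany("INSERT INTO FC_Replicas VALUES (?, ?, ?)", [(1, 1, 8), (2, 2, 8)])
    # A deliberately wrong usage row with negative values, as in the report above.
    conn.execute("INSERT INTO FC_DirectoryUsage VALUES (10, 8, -350, -2)")
    return find_discrepancies(conn)


if __name__ == "__main__":
    print(demo())
```

If such a comparison keeps reporting differences after a rebuild, that would point to a systematic problem rather than a one-off desynchronisation.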

atsareg commented 9 years ago

This is rather a topic for discussion in the DIRAC forum. Closing it here.

andresailer commented 9 years ago

As I said, I am pretty sure the negative size and number of files are caused by calling "unregister replica LFN SE" for replicas no longer present at the given SE, so this is a bug that needs to be fixed.
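
To illustrate the kind of guard that would avoid this, here is a minimal Python 3 sketch. It is not the DFC implementation: the UsageTable class and its methods are hypothetical stand-ins that keep the usage counters in memory. The point is only that the counters are decremented when the replica was actually registered, and left alone otherwise.

```python
class UsageTable:
    """Hypothetical in-memory stand-in for the per-SE usage counters."""

    def __init__(self):
        self.replicas = set()  # (lfn, se) pairs that are actually registered
        self.usage = {}        # se -> [total_size, replica_count]

    def register_replica(self, lfn, se, size):
        """Add a replica and update the counters; ignore duplicates."""
        if (lfn, se) in self.replicas:
            return False
        self.replicas.add((lfn, se))
        entry = self.usage.setdefault(se, [0, 0])
        entry[0] += size
        entry[1] += 1
        return True

    def unregister_replica(self, lfn, se, size):
        """Remove a replica; touch the counters only if it was registered."""
        if (lfn, se) not in self.replicas:
            # Nothing was removed, so nothing may be subtracted.
            return False
        self.replicas.remove((lfn, se))
        entry = self.usage[se]
        entry[0] -= size
        entry[1] -= 1
        return True


if __name__ == "__main__":
    table = UsageTable()
    table.register_replica("/some/lfn", "PNNL-SRM", 100)
    table.unregister_replica("/some/lfn", "PNNL-SRM", 100)
    # The second call is a no-op, so the counters stay at zero instead of going negative.
    table.unregister_replica("/some/lfn", "PNNL-SRM", 100)
    print(table.usage)
```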

I think the rebuild command does not work properly when there are other file additions in the catalog while the command is running.

If it is normal to get a timeout, why not print a message telling the user about possible timeouts?
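
Something along these lines would already help. This is a sketch in Python 3, assuming the client receives a result dictionary with "OK" and "Message" keys; the function name describe_rebuild_result and the wording of the hint are made up for this example.

```python
def describe_rebuild_result(result):
    """Turn a rebuild result dictionary into a message for the user.

    Assumes a dictionary with an "OK" flag and, on failure, a "Message" string.
    """
    if result.get("OK"):
        return "Rebuild finished."
    message = result.get("Message", "unknown error")
    if "timeout" in message.lower():
        # A client-side timeout is expected for long rebuilds, so say so.
        return (
            "Error: %s\n"
            "Note: rebuilding is a lengthy operation and a timeout is expected; "
            "the rebuild may still be running on the server." % message
        )
    return "Error: %s" % message


if __name__ == "__main__":
    print(describe_rebuild_result({"OK": False, "Message": "Socket read timeout exceeded"}))
```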

PS: The rebuild command needs an argument and crashes otherwise:

FC:/> rebuild
Traceback (most recent call last):
  File "/home/sailer/software/DIRAC/DiracDevV6r12/DIRAC/DataManagementSystem/scripts/dirac-dms-filecatalog-cli.py", line 57, in <module>
    cli.cmdloop()
  File "/home/sailer/software/DIRAC/DiracDevV6r12/Linux_x86_64_glibc-2.12/lib/python2.6/cmd.py", line 142, in cmdloop
    stop = self.onecmd(line)
  File "/home/sailer/software/DIRAC/DiracDevV6r12/Linux_x86_64_glibc-2.12/lib/python2.6/cmd.py", line 219, in onecmd
    return func(arg)
  File "/home/sailer/software/DIRAC/DiracDevV6r12/DIRAC/DataManagementSystem/Client/FileCatalogClientCLI.py", line 2447, in do_rebuild
    _option = argss[0]
IndexError: list index out of range
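
The crash comes from indexing an empty argument list. A minimal sketch of the missing check, written in Python 3 with the standard cmd module, is shown below. The CatalogCLI class is a hypothetical stand-in for the real FileCatalogClientCLI, and the usage text is an assumption, since the only option shown in this thread is "catalog".

```python
import cmd


class CatalogCLI(cmd.Cmd):
    """Hypothetical stand-in for the file catalog CLI, reduced to one command."""

    def __init__(self):
        super().__init__()
        self.messages = []  # kept so the behaviour can be inspected

    def _say(self, text):
        self.messages.append(text)
        print(text)

    def do_rebuild(self, args):
        """rebuild <option>"""
        argss = args.split()
        if not argss:
            # Without this check, argss[0] raises IndexError as in the traceback above.
            self._say("Usage: rebuild <option>")
            return
        option = argss[0]
        self._say("Rebuilding: %s" % option)


if __name__ == "__main__":
    cli = CatalogCLI()
    cli.onecmd("rebuild")          # prints the usage line instead of crashing
    cli.onecmd("rebuild catalog")  # proceeds with the given option
```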

andresailer commented 8 years ago

https://groups.google.com/forum/#!topic/diracgrid-forum/bXH45l-9ofw Bump

chaen commented 6 years ago

Still the case?

andresailer commented 6 years ago

I think so...

chaen commented 5 years ago

@atsareg ping

chaen commented 4 years ago

pong