DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

Creating datasets breaks File Catalog listing #3317

Open brentvan opened 7 years ago

brentvan commented 7 years ago

We are having an issue where creating datasets seems to break our ability to see the contents of our file catalog. We create the datasets with dataset add in the file catalog CLI, and we are using DIRAC client version v6r17p11.

The exchange below shows what happens in the file catalog CLI. One of our users, alessandro, has created a test dataset named after himself, as dataset show -l confirms. I then try to ls in the home directory of the catalog and get an error. The File Catalog Application on our web interface also shows only the / directory and nothing below it whenever any datasets are defined. To be clear, the rest of the system behaves normally and the files are clearly in the catalog where we expect them; only the display of the catalog contents seems to be broken.

FC:/> dataset show -l

alessandro:

     Key               Value
======================================================
  1  Status            Dynamic
  2  DatasetHash       33EA4DC497564BDE7182A872C48A5C0B
  3  TotalSize         92883429473
  4  MetaQuery         {'run_id': 1035}
  5  NumberOfFiles     1061
  6  DirID             1
  7  OwnerGroup        project8_voms_user
  8  Owner             banducci
  9  Mode              509
 10  ModificationDate  2017-04-18 19:21:59
 11  CreationDate      2017-04-18 19:21:59
 12  DatasetID         12

FC:/> ls
Error: Server error while serving listDirectory: 'alessandro'

andresailer commented 7 years ago

Can you check the logs of the FileCatalog service for any warnings, errors, or exceptions?

brentvan commented 7 years ago

Yes, and sorry for the delay; I had to ask my administrators how to find the logs. Here are what I think are the relevant lines from the log. They cover two different users experimenting with creating datasets and seeing that the existence of datasets breaks the CLI ls command until the datasets are removed.

CLI_logs_dataset_ls_error.txt

guiguem commented 7 years ago

@atsareg @andresailer A question we have is why each dataset is assigned an LFN.

According to https://github.com/DIRACGrid/DIRAC/wiki/RFC-%232:-Replication-Service the datasets are supposed to be objects that are associated with a metaquery. Having them as LFNs allows them, when placed in a folder of the FC, to inherit that folder's metadata, but this seems to be the only reason. Since the dataset objects point to the files via the metaquery, and we could reconstruct the information of a dataset from the files' metadata, this does not seem useful. Is there a case where we absolutely need the LFN feature? @andresailer said he has never used it; has anyone else used this feature? Also, as pointed out in https://github.com/DIRACGrid/DIRAC/issues/3324, the ls command breaks when a dataset is located in a folder.
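To make the point concrete, here is a minimal Python sketch of a dataset treated purely as a stored metaquery. It is a toy in-memory model, not DIRAC code: the FILES list, the matches and resolve_dataset helpers, and the sample LFNs and sizes are all hypothetical. It only illustrates that values such as NumberOfFiles and TotalSize could be recomputed from the files' metadata without the dataset having an LFN of its own.

```python
# Toy model only: none of these names or values come from DIRAC.
FILES = [
    {"lfn": "/data/run1035/a.root", "size": 100, "meta": {"run_id": 1035}},
    {"lfn": "/data/run1035/b.root", "size": 250, "meta": {"run_id": 1035}},
    {"lfn": "/data/run1036/c.root", "size": 400, "meta": {"run_id": 1036}},
]


def matches(file_meta, metaquery):
    """Return True if every key/value pair of the metaquery is present in the file metadata."""
    return all(file_meta.get(key) == value for key, value in metaquery.items())


def resolve_dataset(metaquery, files=FILES):
    """Rebuild a dataset summary from file metadata alone (hypothetical helper)."""
    selected = [f for f in files if matches(f["meta"], metaquery)]
    return {
        "MetaQuery": dict(metaquery),
        "NumberOfFiles": len(selected),
        "TotalSize": sum(f["size"] for f in selected),
        "Files": [f["lfn"] for f in selected],
    }


summary = resolve_dataset({"run_id": 1035})
print(summary["NumberOfFiles"], summary["TotalSize"])
```

In this model the dataset is just a name plus a metaquery; nothing about it needs to appear in the directory tree that ls walks.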

A possible new scheme could be to remove the LFN feature of datasets (as it breaks ls), expand the Annotations field/table of the dataset DB to support any type of field (by adding more tables to the DB), and provide a dedicated web interface. This would remove access to datasets from the FC (which seems kind of broken currently), and a set of scripts/commands would allow admins to create new datasets, add more fields if needed, and list all the existing datasets. It would also allow storing information specific to the dataset that doesn't/shouldn't exist for the files...
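As a rough illustration of that scheme, the sketch below keeps datasets in their own tables, with a free-form key/value table for extra fields. Everything here is an assumption made for the example: the table names (datasets, dataset_fields), the helper functions, and the use of an in-memory SQLite database. It does not reflect the actual DIRAC dataset DB schema.

```python
import json
import sqlite3

# Hypothetical schema: datasets live outside the file catalog namespace.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        dataset_id INTEGER PRIMARY KEY,
        name       TEXT UNIQUE NOT NULL,
        metaquery  TEXT NOT NULL
    );
    CREATE TABLE dataset_fields (
        dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
        field      TEXT NOT NULL,
        value      TEXT,
        PRIMARY KEY (dataset_id, field)
    );
    """
)


def add_dataset(name, metaquery):
    """Register a dataset by name and metaquery; return its new ID."""
    cur = conn.execute(
        "INSERT INTO datasets (name, metaquery) VALUES (?, ?)",
        (name, json.dumps(metaquery)),
    )
    return cur.lastrowid


def set_field(dataset_id, field, value):
    """Attach or overwrite an arbitrary dataset-level field."""
    conn.execute(
        "INSERT OR REPLACE INTO dataset_fields VALUES (?, ?, ?)",
        (dataset_id, field, value),
    )


def list_datasets():
    """Return all dataset names in alphabetical order."""
    return [row[0] for row in conn.execute("SELECT name FROM datasets ORDER BY name")]


ds_id = add_dataset("alessandro", {"run_id": 1035})
set_field(ds_id, "comment", "test dataset")
print(list_datasets())
```

Because the datasets are stored and listed separately, a directory listing never has to interpret a dataset entry, and dataset-only fields do not leak into file metadata.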

Thoughts?