ORNL / DataFed

A Federated Scientific Data Management System
https://ornl.github.io/DataFed/

Repo Server Health #933

Open JoshuaSBrown opened 5 months ago

JoshuaSBrown commented 5 months ago

Description

This issue is for discussing problems related to the health of the DataFed repository service.

It was discovered that the repo service may occasionally fail. When it does, tasks accumulate on the core server and back up the queuing system. For transfers this produces a few symptoms, e.g. Globus transfers complete, but DataFed shows them as incomplete because they rely on the repository server to perform a few final steps. Currently, no message is sent back to users to indicate what the problem is. It needs to be made clear from the user's perspective that the repository is offline, particularly when it comes to the steps in a task, i.e. tasks that have multiple steps should show which steps they have completed.
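To make the request concrete, here is a minimal sketch of per-step status reporting. It is not DataFed code: `Task`, `Step`, `StepState`, `advance`, `status_message`, and the `repo_online` flag are all hypothetical names chosen for illustration, and how the core server would actually detect that the repo server is down is left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StepState(Enum):
    PENDING = "pending"
    DONE = "done"
    BLOCKED = "blocked"


@dataclass
class Step:
    name: str
    needs_repo: bool = False  # hypothetical flag: step requires the repo server
    state: StepState = StepState.PENDING


@dataclass
class Task:
    steps: List[Step] = field(default_factory=list)

    def advance(self, repo_online: bool) -> None:
        """Run steps in order; stop at the first step that needs an offline repo."""
        for step in self.steps:
            if step.state is StepState.DONE:
                continue
            if step.needs_repo and not repo_online:
                step.state = StepState.BLOCKED
                return
            step.state = StepState.DONE

    def status_message(self) -> str:
        """Summarize progress and name the blocking step, if there is one."""
        done = sum(s.state is StepState.DONE for s in self.steps)
        total = len(self.steps)
        blocked = next((s for s in self.steps if s.state is StepState.BLOCKED), None)
        if blocked is not None:
            return (f"{done}/{total} steps complete; waiting on step "
                    f"'{blocked.name}': repository server is offline")
        if done == total:
            return f"{done}/{total} steps complete"
        return f"{done}/{total} steps complete; in progress"
```

With a two-step task (a Globus transfer followed by a finalize step that needs the repo), calling `advance(repo_online=False)` would leave the first step done and the second blocked, so the user sees which step is stuck and why instead of a generic "incomplete" state.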

Another option that could be used to indicate a problem with the repo server is a yellow icon next to files that have allocations on the repository.