ORNL / DataFed2

A federated scientific data management system (redesigned)

Account for the faulty ACTIVE state that can occur in Globus transfers #9

Open JoshuaSBrown opened 2 years ago

JoshuaSBrown commented 2 years ago

In DataFed 1.0, the core server task workers are set up to sleep in 5-second intervals until a task is considered successful. The actual check is implemented in the core server's GlobusAPI::checkTransferStatus function, which sends a request to the Globus server to find out whether a task has completed. The status in the returned JSON packet is used to determine success or failure. The current implementation checks three statuses: "SUCCEEDED", "FAILED", and "INACTIVE". However, one other status can be returned: "ACTIVE". It is returned while the transfer is still ongoing, yet checkTransferStatus does not handle it at all. In DataFed 2.0, we should address this and return information about the current condition of a task in the "ACTIVE" state so that further action can be taken if necessary.
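To make the gap concrete, here is a minimal sketch of a status check that handles all four statuses, including "ACTIVE". It is not the existing checkTransferStatus code. `TaskSnapshot`, `PollResult`, `classifyTransfer`, and the `has_fault` and `detail` fields are hypothetical names introduced for illustration, and how they would be filled in from the Globus JSON response is deliberately left out.

```cpp
#include <string>

// Hypothetical result of one status poll; not an existing DataFed type.
enum class PollResult { Succeeded, Failed, Inactive, Running, Faulty, Unknown };

// Hypothetical snapshot of the values read from the returned JSON packet.
struct TaskSnapshot {
    std::string status;  // "SUCCEEDED", "FAILED", "INACTIVE", or "ACTIVE"
    bool has_fault;      // assumption: true if the task reports an error while ACTIVE
    std::string detail;  // assumption: human-readable error text, may be empty
};

// Classify a task and fill err_msg whenever the caller may need to act.
PollResult classifyTransfer(const TaskSnapshot& task, std::string& err_msg) {
    err_msg.clear();

    if (task.status == "SUCCEEDED") {
        return PollResult::Succeeded;
    }
    if (task.status == "FAILED") {
        err_msg = task.detail.empty() ? std::string("Transfer failed") : task.detail;
        return PollResult::Failed;
    }
    if (task.status == "INACTIVE") {
        err_msg = task.detail.empty() ? std::string("Transfer is inactive") : task.detail;
        return PollResult::Inactive;
    }
    if (task.status == "ACTIVE") {
        // The case this issue is about: the transfer is ongoing, but it may
        // be stuck on an error that would otherwise never be surfaced.
        if (!task.has_fault) {
            return PollResult::Running;
        }
        err_msg = task.detail.empty()
                      ? std::string("Transfer is ACTIVE but reporting an error")
                      : task.detail;
        return PollResult::Faulty;
    }

    // Anything else is reported instead of being silently ignored.
    err_msg = "Unrecognized transfer status: " + task.status;
    return PollResult::Unknown;
}
```

With a split like this, a task worker could keep sleeping on `Running` but log the message or abort the task on `Faulty`, rather than polling indefinitely.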

JoshuaSBrown commented 2 years ago

A proof-of-concept branch for DataFed 1.0 is available here: https://github.com/ORNL/DataFed/blob/JoshuaSBrown-increase-transparency-of-gridftp-errors/core/server/GlobusAPI.cpp#L370-L400

JoshuaSBrown commented 2 years ago

Before accounting for the ACTIVE state and addressing error messages stemming from the GridFTP modules, this is what we would see in DataFed: Screenshot (247). Afterwards, we get a much more descriptive error message: Screenshot (248).