JasonAlt / GridFTP-DSI-for-HPSS

GridFTP module that allows the Globus server to work with HPSS

HPSS: Support resumption of interrupted transfers #44

Closed: JasonAlt closed this issue 4 years ago

JasonAlt commented 5 years ago

For standard DSIs, when files are transferred (STOR) in extended block mode, the receiving end sends restart markers at points where (1) the data received resides safely on persistent media and (2) the underlying storage technology can resume writes at that offset. On a POSIX filesystem, the receiving end could (hypothetically) send a restart marker after every buffer. If an error occurs, the client can resume the transfer from the most recently received restart marker.
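As a rough illustration of the receiver side, here is a minimal sketch. It assumes the receiver keeps a list of `(offset, length)` blocks it knows to be on persistent media. The names `merge_ranges` and `resume_offset` are hypothetical and are not part of the GridFTP server or DSI API. The sketch shows how out-of-order extended-block-mode writes collapse into the disjoint ranges a restart marker could report, and how to find the contiguous prefix from which a simple client could resume.

```python
def merge_ranges(ranges):
    """Merge (offset, length) pairs into sorted, disjoint half-open (start, end) ranges."""
    merged = []
    for offset, length in sorted(ranges):
        end = offset + length
        if merged and offset <= merged[-1][1]:
            # Overlaps or abuts the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [tuple(r) for r in merged]


def resume_offset(merged):
    """Bytes that are contiguous from offset 0; everything before this need not be resent."""
    if merged and merged[0][0] == 0:
        return merged[0][1]
    return 0


# Hypothetical blocks that arrived out of order and are known to be persisted.
committed = [(200, 100), (0, 100), (100, 100), (400, 50)]
ranges = merge_ranges(committed)
print(ranges)                 # [(0, 300), (400, 450)]
print(resume_offset(ranges))  # 300
```

Only the gap between 300 and 400 and anything past 450 would still need to be sent. A client that only understands a single resume offset would restart at 300.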

In HPSS, simply writing a buffer with PIO is not sufficient to satisfy the two conditions above. The restart point within a transfer is determined by the complex internal workings of HPSS, and the only reliable restart point is the one returned by hpss_PIOExecute(). However, HPSS 7.4.3 is affected by bug BZ4719, "PIOExecute() returns the wrong value for bytesmoved on error in pre 7.5":

We need a small fix to hpss_PIOExecute() in order to support transfer restarts on error while writing to HPSS. When hpss_PIOExecute() exits with an error, we need BytesMoved to be set. Currently, the value of bytes_moved (a local variable) is computed on both error and success, but it is not returned to the caller on error. If we had this value, GridFTP could send a restart marker and transfers could resume where they left off.
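The behavior requested in the bug report can be modeled in a few lines. This is a hypothetical sketch, not the HPSS API: `execute`, `TransferError`, and `restart_point` are invented names, and a Python exception stands in for the error return of hpss_PIOExecute(). The point it illustrates is that `bytes_moved` is already tracked on every path, so handing it back on error is enough for the caller to place a restart marker.

```python
class TransferError(Exception):
    """Hypothetical error that carries how many bytes were moved before the failure."""

    def __init__(self, message, bytes_moved):
        super().__init__(message)
        self.bytes_moved = bytes_moved


def execute(chunk_sizes, fail_at=None):
    """Hypothetical stand-in for a PIO execute call (not the real HPSS function).

    Moves chunks in order; the chunk at index fail_at (if any) fails.
    bytes_moved is tracked on every path and, per the requested fix,
    is reported to the caller on error as well as on success.
    """
    bytes_moved = 0
    for i, size in enumerate(chunk_sizes):
        if i == fail_at:
            raise TransferError(f"write failed at chunk {i}", bytes_moved)
        bytes_moved += size
    return bytes_moved


def restart_point(chunk_sizes, fail_at=None):
    """Offset a DSI could report in a restart marker once execute() returns."""
    try:
        return execute(chunk_sizes, fail_at)
    except TransferError as err:
        return err.bytes_moved


print(restart_point([4, 4, 4]))             # 12: everything was moved
print(restart_point([4, 4, 4], fail_at=2))  # 8: resume at byte 8
```

Without the fix, the error path would lose `bytes_moved`, and the caller would have no trustworthy offset to report.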

Currently, sites must disable REST in their GridFTP configuration files to avoid using a restart marker that could be erroneous. Resuming from an erroneous marker could produce a file with a 'gap'. If post-transfer checksums are not enabled, the gap could go undetected. With checksums enabled, the entire file would be transferred, its checksum computed, the corruption detected, and the transfer restarted. For large files, this is quite painful.
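The trade-off can be put in numbers with a small sketch. It assumes, from the title of BZ4719, that HPSS releases from 7.5 onward report the correct value on error; the helper names are hypothetical and the 10 TiB file is purely illustrative.

```python
TIB = 2**40


def restart_markers_trustworthy(hpss_version):
    """Assumption: BZ4719 affects releases before 7.5, as its title says."""
    return tuple(hpss_version[:2]) >= (7, 5)


def bytes_to_resend(file_size, restart_offset, rest_enabled):
    """Bytes sent again after an interrupted transfer.

    With REST disabled the transfer starts over at byte 0; with trustworthy
    restart markers only the bytes after restart_offset are resent.
    """
    if rest_enabled:
        return file_size - restart_offset
    return file_size


rest = restart_markers_trustworthy((7, 4, 3))
print(rest)                                             # False
print(bytes_to_resend(10 * TIB, 9 * TIB, rest) // TIB)  # 10
print(bytes_to_resend(10 * TIB, 9 * TIB, True) // TIB)  # 1
```

A failure 9 TiB into a 10 TiB file costs a full 10 TiB retransfer with REST disabled, versus 1 TiB with a reliable restart marker.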

This work is part of globusonline/product-management#388

JasonAlt commented 5 years ago

Tracking requirements necessary for providing restart functionality:

JasonAlt commented 4 years ago

Fixed in PR #68