tonyhutter closed 3 years ago
Even in post-stage, we need to call the BBAPI to determine whether the transfer succeeded. Ideally, our normal Create/Resume/Wait logic should handle that.
Where do things break down if we don't do this?
You're right, I'll move this block:
    rc = axl_check_file_sizes(id);
    if (rc == AXL_SUCCESS) {
        /* Destination files are already the correct size, we're done */
        kvtree_util_set_int(file_list, AXL_KEY_STATUS, AXL_STATUS_DEST);
        goto end;
    }
...to both axl_sync_resume() and axl_pthread_resume(), since in those cases the file size will always tell us if the transfer is complete or not.
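For illustration, a size-only completeness check along these lines can be sketched as below. This is not the actual axl_check_file_sizes() code; the helper name dest_sizes_match and the idea of passing the expected sizes as an array (for example, sizes recorded before the transfer started) are assumptions made for the sketch.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of AXL): returns 1 if every destination
 * file exists and has exactly the expected size, 0 otherwise.
 * The expected sizes are assumed to have been recorded earlier. */
int dest_sizes_match(const char **dst, const off_t *expected, int n)
{
    for (int i = 0; i < n; i++) {
        struct stat st;
        if (stat(dst[i], &st) != 0) {
            return 0;   /* destination file is missing or unreadable */
        }
        if (st.st_size != expected[i]) {
            return 0;   /* size mismatch: transfer is not complete */
        }
    }
    return 1;
}
```

A check like this only compares sizes; it says nothing about whether the bytes in the destination file are correct.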
Note that for SCR post_stage, we're finalizing the transfer from the login node, which isn't running the BB client, and thus can't check the BB transfer status. To get around this, poststage will do an AXL_XFER_SYNC resume on the old BB transfer, which just does the simple file size check and finalization. This could be problematic if the file size was correct, but the BB status wasn't "BBFULLSUCCESS". I'm not sure how you'd get around this though, short of running the BB client on the login node, or doing some ugly pdsh call to a BB node to get the transfer status, or doing a CRC check on the file.
A couple of things.
> This could be problematic if the file size was correct, but the BB status wasn't "BBFULLSUCCESS".
Yeah, that's exactly the point I'm worried about. There is no guarantee that a correct file size implies the file contents are correct. We have no control over how the BB software actually moves the bytes of a file. They may truncate to the correct size and then fill in the contents, or they may write the file in arbitrary order. Even if they tell us we could use the size in their implementation today, they might drop a software release tomorrow that breaks it. The agreement with IBM was that we'd check the BB transfer status.
Also, the poststage script won't technically run on a login node -- it runs on the job script node. I don't remember if the BBAPI is valid from the job script node, but I think the bbcmd is meant to work from the job script node. Somewhere in the IBM reports, I think they have example poststage scripts that wait on and check the status of a transfer. We'll need to copy what they did. We might have tested that in Ben's work, too.
In a quick search, here is the best example phase 2 stageout script I've seen so far:
https://github.com/IBM/CAST/blob/master/bb/scripts/stageout_user_phase2_bscfs.pl
This queries for the list of transfer handles and checks the status of each one.
Ah good, if the BBAPI is available in post-stage then we probably don't even need this PR. I'll do a quick sanity test.
@tonyhutter , I haven't tried the above bscfs script, but it should be a close fit to what we need. Does a modified version of that work as far as getting the set of transfer handles and the status for each one?
@adammoody ideally I'd rather not do that, as you'd have AXL spawning off a perl script to test the transfer status, which is ugly.
If it turns out we're unable to check the transfer status in poststage, then we could checksum the source files, add the checksums to the state_file, and then verify the checksums in post_stage. Hopefully it doesn't come to that...
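As a rough sketch of that checksum fallback, the code below computes a CRC-32 over a file's contents so that a value recorded for the source file could later be compared against the destination file. The function names crc32_update and file_crc32 are hypothetical and not part of AXL, and how the checksum would be stored in the state_file is not shown.

```c
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Pass 0 as the initial
 * value, then feed the previous result back in for each further chunk. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical helper: checksums the whole file at `path`.
 * Returns 0 and stores the CRC in *out on success, -1 on any I/O error. */
int file_crc32(const char *path, uint32_t *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    unsigned char buf[4096];
    uint32_t crc = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
        crc = crc32_update(crc, buf, n);
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) {
        return -1;
    }
    *out = crc;
    return 0;
}
```

Unlike the size check, this catches the case where a file was truncated to the right length but its contents were never filled in, at the cost of reading every byte of every file twice.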
I'm finding that bbcmd may be doing some things outside of the BBAPI to make it work on the post-stage node. I've opened:
Can't connect to bbproxy when calling BB_InitLibrary() in post-stage https://github.com/IBM/CAST/issues/1002
Yeah, I seem to remember that it's not valid to call the BBAPI from the launch nodes. I think the BBAPI is only valid from compute nodes. One has to use bbcmd from the launch nodes. Though I forget why.
@adammoody yea, Tom Gooding just confirmed that in the bug I opened. I asked him if it was sufficient to just check the file sizes and am waiting on an answer. That is, does the destination file size reported by the BBAPI/GPFS always represent valid data, or can the data be wrong if there was an error in the transfer or something?
Let's use bbcmd to check the transfer status. That was the plan we worked up with IBM when we co-designed this all with them.
@adammoody ok, we'll spawn bbcmd and scrape the output.
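A minimal sketch of the spawn-and-scrape approach, assuming a POSIX system with popen(): the helper below runs a command and looks for a status token in its output. The helper name command_output_contains is hypothetical, and the exact bbcmd command line and output format are not shown in this thread, so the command string is left entirely to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs `cmd` through the shell and scans its stdout.
 * Returns 1 if any output line contains `token`, 0 if none does, and -1 if
 * the command could not be started or exited with a non-zero status.
 * Lines longer than the buffer are read in pieces, so a token that
 * straddles a buffer boundary would be missed. */
int command_output_contains(const char *cmd, const char *token)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL) {
        return -1;
    }
    char line[4096];
    int found = 0;
    while (fgets(line, sizeof(line), p) != NULL) {
        if (strstr(line, token) != NULL) {
            found = 1;
        }
    }
    if (pclose(p) != 0) {
        return -1;
    }
    return found;
}
```

The caller would pass whatever bbcmd invocation reports the transfer status, with "BBFULLSUCCESS" as the token. A substring match is fragile; parsing the structured output would be more robust if its format is known.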
We can probably close this PR, since we're now spawning off bbcmd to check the final transfer status when resuming BBAPI transfers.
This changes AXL_Dispatch() to first check the destination file sizes before resuming the copy. That way, if the files were already fully transferred at the time of resume, and the source files are gone (as would be the case with a SCR post-stage), then simply finalize the destination files (remove their ._AXL extension, apply metadata, etc).
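The rename part of that finalization step might look roughly like the sketch below. The helper name finalize_dest is hypothetical and this is not the AXL implementation; the "._AXL" suffix is taken from the description above, and applying metadata is left out.

```c
#include <stdio.h>
#include <string.h>

#define TMP_EXT "._AXL"

/* Hypothetical helper: renames "<name>._AXL" to "<name>".
 * Returns 0 on success, -1 if the path lacks the temporary extension,
 * is too long, or the rename fails. */
int finalize_dest(const char *tmp_path)
{
    size_t len = strlen(tmp_path);
    size_t ext = strlen(TMP_EXT);

    /* The path must be longer than the extension and end with it. */
    if (len <= ext || strcmp(tmp_path + len - ext, TMP_EXT) != 0) {
        return -1;
    }

    char final_path[4096];
    if (len - ext >= sizeof(final_path)) {
        return -1;
    }
    memcpy(final_path, tmp_path, len - ext);
    final_path[len - ext] = '\0';

    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}
```

For example, finalize_dest("ckpt.dat._AXL") would leave the file at "ckpt.dat", while a path without the suffix is rejected.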