Description
Various fixes and improvements to our support for serial and parallel stage-in/out. Primarily, this reimplements the unifyfs-stage helper program to use the library API rather than wrapped POSIX I/O. A new unifyfs_transfer_result structure has been added that provides additional information, including the transferred file size in bytes and the transfer time in seconds. For stage-in, there is initial support for specifying the data distribution to use: 'balanced' placement evenly divides the file data across servers in 16 MiB transfer chunks, while 'skewed' would allow for uneven data placement. Currently, only 'balanced' placement is supported for stage-in.
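To make the 'balanced' placement concrete, here is a minimal sketch of how a file could be cut into 16 MiB chunks and spread evenly across servers. The round-robin mapping and all names (STAGE_CHUNK_SIZE, stage_chunk_count, stage_chunk_server, stage_chunk_length) are assumptions made for illustration; they are not taken from the UnifyFS sources, and the actual assignment of chunks to servers may differ.

```c
#include <assert.h>
#include <stdint.h>

/* 16 MiB transfer chunk size, as stated in the description above. */
#define STAGE_CHUNK_SIZE ((uint64_t)16 * 1024 * 1024)

/* Number of chunks needed to cover file_size bytes.
 * The last chunk may be shorter than STAGE_CHUNK_SIZE. */
static uint64_t stage_chunk_count(uint64_t file_size)
{
    return (file_size + STAGE_CHUNK_SIZE - 1) / STAGE_CHUNK_SIZE;
}

/* Hypothetical round-robin mapping of a chunk to a server rank.
 * This is one simple way to divide chunks evenly (an assumption). */
static int stage_chunk_server(uint64_t chunk_index, int num_servers)
{
    assert(num_servers > 0);
    return (int)(chunk_index % (uint64_t)num_servers);
}

/* Number of bytes in the given chunk, or 0 if the chunk lies past EOF. */
static uint64_t stage_chunk_length(uint64_t file_size, uint64_t chunk_index)
{
    uint64_t offset = chunk_index * STAGE_CHUNK_SIZE;
    if (offset >= file_size) {
        return 0;
    }
    uint64_t remaining = file_size - offset;
    return (remaining < STAGE_CHUNK_SIZE) ? remaining : STAGE_CHUNK_SIZE;
}
```

With this mapping, a 40 MiB file staged across 2 servers would produce three chunks (16, 16, and 8 MiB), with chunks 0 and 2 on server 0 and chunk 1 on server 1.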
A summary of changes to various components follows:
Client library:
POSIX wrappers should set errno=0 on success
Library API:
rename unifyfs_status to unifyfs_file_status
rename unifyfs_ioreq_state to unifyfs_req_state
add transfer-specific result structure unifyfs_transfer_result that includes size of transferred file and elapsed transfer time
expose is_unifyfs_path() utility function
Examples & Tests:
updates for API changes
unifyfsd Server:
fix bug when API transfer initiated from non-owner
ENOENT is ok for non-owner sm_transfer/truncate()
support new transfer API result fields (size, time)
enable real ULT concurrency for broadcast progress
fix case where transfer broadcast collective progressed redundantly
unifyfs-stage helper program:
fix path to executable in unifyfs utility, and use separate status files for stage-in/out
use -S|--status-file for status file path rather than passing share directory
new implementation that uses API instead of I/O interposition
initial support for specifying stage-in data placement policy (balanced | skewed)
improve verbose messages
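As an illustration of what the new transfer result fields (size, time) make possible, the sketch below derives an effective bandwidth from them. The struct shown is a hypothetical stand-in, not the real unifyfs_transfer_result definition; its name, field names, and field types are assumptions, so consult the library API header for the actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the transfer result (assumed field names). */
typedef struct {
    size_t file_size_bytes;       /* size of the transferred file in bytes */
    double transfer_time_seconds; /* elapsed transfer time in seconds */
} example_transfer_result;

/* Effective bandwidth in MiB/s, or 0.0 when no elapsed time was recorded. */
static double example_transfer_mib_per_sec(const example_transfer_result* r)
{
    assert(r != NULL);
    if (r->transfer_time_seconds <= 0.0) {
        return 0.0;
    }
    double mib = (double)r->file_size_bytes / (1024.0 * 1024.0);
    return mib / r->transfer_time_seconds;
}
```

A helper along these lines could feed the verbose output of a staging tool, for example to report a per-file transfer rate next to each manifest entry.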
Motivation and Context
Addresses issue #686 and other user-reported problems with file staging support.
How Has This Been Tested?
Tested in serial and parallel transfer modes using OLCF Summit on up to 64 nodes, with a wide range of manifest files for stage-in/out. The manifest files contained up to 32 files and a wide variety of file sizes.
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[x] Performance enhancement (non-breaking change which improves efficiency)
[ ] Code cleanup (non-breaking change which makes code smaller or more readable)
[x] Breaking change (fix or feature that would cause existing functionality to change)
[x] Testing (addition of new tests or update to current tests)
[x] Documentation (a change to man pages or other documentation)
Checklist:
[x] My code follows the UnifyFS code style requirements.