LLNL / UnifyFS

UnifyFS: A file system for burst buffers

unlink: optional sleep after calling client-to-server unlink rpc #745

Closed adammoody closed 1 year ago

adammoody commented 1 year ago

In testing PnetCDF, some tests fail when creating a file after deleting a file of the same name (see https://github.com/LLNL/UnifyFS/issues/744). As a workaround, this adds an optional sleep immediately after a client calls the client-to-server unlink RPC, giving the unlink operation more time to complete before the client returns from its call to unlink().

To enable this option, one can set a new config parameter:

export UNIFYFS_CLIENT_UNLINK_USECS=1000000

For the first test case that was failing, which was a serial program (single-process MPI job), a value of 1000000 microseconds (1 second) was sufficient. Longer sleep times may be required for parallel jobs.
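For illustration, here is a minimal sketch of how an optional post-unlink sleep could be wired up. It is not the code in this PR: it reads the environment variable directly with getenv(), and the helper names (unlink_usecs_from_env, sleep_usecs, fake_unlink_rpc, unlink_with_optional_sleep) are hypothetical. Only the variable name UNIFYFS_CLIENT_UNLINK_USECS comes from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Parse a microsecond count from the named environment variable.
 * Returns 0 (meaning "do not sleep") when the variable is unset,
 * empty, negative, or not a plain decimal integer. */
long unlink_usecs_from_env(const char *name)
{
    const char *val = getenv(name);
    if (val == NULL || *val == '\0') {
        return 0;
    }
    errno = 0;
    char *end = NULL;
    long usecs = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || usecs < 0) {
        return 0;
    }
    return usecs;
}

/* Sleep for the given number of microseconds, resuming if a signal
 * interrupts the sleep early. */
void sleep_usecs(long usecs)
{
    if (usecs <= 0) {
        return;
    }
    struct timespec ts;
    ts.tv_sec  = usecs / 1000000L;
    ts.tv_nsec = (usecs % 1000000L) * 1000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR) {
        /* ts now holds the remaining time; keep sleeping */
    }
}

/* Hypothetical stand-in for the client-to-server unlink RPC. */
int fake_unlink_rpc(const char *path)
{
    (void)path;
    return 0;
}

/* Issue the unlink RPC, then optionally sleep before returning so the
 * servers have more time to finish processing the unlink. */
int unlink_with_optional_sleep(const char *path)
{
    int rc = fake_unlink_rpc(path);
    if (rc == 0) {
        sleep_usecs(unlink_usecs_from_env("UNIFYFS_CLIENT_UNLINK_USECS"));
    }
    return rc;
}
```

When the variable is unset, the sketch returns right after the RPC, so the sleep is optional.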

This is a hack, but it helps for now.

A better fix would be to implement a mode where the unlink() wrapper blocks at the calling client until all servers have indicated that the unlink operation has completed. That may require a round trip between each server and each of its clients, since each client has to do some work to support unlink. That change will be a more substantial effort, so it is saved for future work. Once it is added, this workaround could be removed.
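As a rough sketch of the bookkeeping such a blocking mode might need, the following tracks which servers have acknowledged an unlink. The type and function names (unlink_tracker and friends) are hypothetical and do not exist in UnifyFS. A real implementation would also need the RPC plumbing and a way to block (for example, a condition variable or a progress loop) until unlink_tracker_done() returns true.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Tracks acknowledgements from servers for one unlink operation. */
typedef struct {
    int   num_servers; /* total servers that must acknowledge */
    int   num_acked;   /* distinct servers that have acknowledged */
    bool *acked;       /* acked[i] is true once server i has acknowledged */
} unlink_tracker;

/* Initialize the tracker; returns 0 on success, -1 on error. */
int unlink_tracker_init(unlink_tracker *t, int num_servers)
{
    if (t == NULL || num_servers <= 0) {
        return -1;
    }
    t->acked = calloc((size_t)num_servers, sizeof(bool));
    if (t->acked == NULL) {
        return -1;
    }
    t->num_servers = num_servers;
    t->num_acked   = 0;
    return 0;
}

/* Record an acknowledgement from one server. Duplicate acks are
 * ignored; an out-of-range rank returns -1. */
int unlink_tracker_ack(unlink_tracker *t, int server_rank)
{
    if (t == NULL || server_rank < 0 || server_rank >= t->num_servers) {
        return -1;
    }
    if (!t->acked[server_rank]) {
        t->acked[server_rank] = true;
        t->num_acked++;
    }
    return 0;
}

/* True once every server has acknowledged the unlink. */
bool unlink_tracker_done(const unlink_tracker *t)
{
    return t->num_acked == t->num_servers;
}

/* Release the memory held by the tracker. */
void unlink_tracker_free(unlink_tracker *t)
{
    free(t->acked);
    t->acked = NULL;
}
```

The unlink() wrapper would return to the caller only once unlink_tracker_done() is true, which would replace the fixed sleep with an actual completion signal.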

adammoody commented 1 year ago

Thanks for taking a look, @CamStan!