flux-framework / dyad

DYAD: DYnamic and Asynchronous Data Streamliner
GNU Lesser General Public License v3.0

All fix #108

Closed JaeseungYeom closed 7 months ago

JaeseungYeom commented 7 months ago

merge PR #93 and #107

Fixing the problems listed below one at a time does not make the CI tests pass. CI passes only when all of them are fixed together.

• PR #93
  o When built for UCX_RMA, the initialization does not find UCX at all. The CMake variable that enables UCX_RMA enabled neither the UCX dependency nor the UCX-related initialization.

• PR #100 (depends on PR #89)
  o This lets users state that a relative path should be interpreted as relative to either of the managed paths. If a user sets the environment variable DYAD_PATH_RELATIVE and uses a relative path, the canonical prefix check is bypassed.

• PR #89
  o Cache the result of the realpath call in ctx.
  o Use the cached realpath when calling cmp_canonical_path_prefix().
  o Reorder the comparisons between the managed path prefix and the path.

• PR #102 (depends on PR #100 and #88)
  o A producer locks a file at open so that a consumer with access to the same storage cannot read it before the producer finishes writing. This was added as an optimization to bypass KVS-based synchronization. However, there was no check that the file was under the managed directory. As a result, network-related device files were also locked, possibly causing performance degradation or errors.

• Note: The C API that interfaces with the Python API, dyad_get_metadata(file), returns DYAD_RC_UNTRACKED when the file is not under a managed path. The C interface, on the other hand, returns DYAD_RC_OK so that open()/fopen() can proceed as usual when the file is not supposed to be managed.

• PR #104 (needs further fixes as in PR #107)
  o We do not want DYAD interception in the middle of an interception, as it is not needed and only adds overhead.
  o If we block re-entrance into the DYAD region too early, a legitimate access to a file under a managed directory may not be intercepted. If we block too late, we incur needless cost for irrelevant accesses.
  o The right position to start blocking re-entrance is the point where we know the file is under a managed directory.
  o This is a little tricky for the Python API because dyad_get_metadata is separate from the consume call.

• PR #106 (depends on PR #100, #102 and #104)
  o Fixes a C++ API bug. As @hariharan-devarajan pointed out, the c'tor and d'tor of the C++ dyad stream initialize and finalize the ctx. As a result, the ep cache becomes stale. In this PR, a persistent singleton ctx object is initialized only once and reused unless explicitly reinitialized.
  o We might need a module callback to invalidate certain ep cache entries.

• PR #107 (depends on PR #106)
  o This PR checks the managed path prefix before locking any file. When a match is found, it sets the reenter flag to false so that DYAD does not go into recursive interception.
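The ordering described for PR #89, #102, #104 and #107 can be sketched roughly as below. This is only an illustration under stated assumptions, not DYAD's actual code: `sketch_ctx`, `sketch_rc`, `sketch_ctx_init`, `sketch_is_managed` and `sketch_open_hook` are hypothetical names, the lock is replaced by a counter, and the return values only loosely mirror DYAD_RC_OK and DYAD_RC_UNTRACKED. The managed path is resolved with realpath() once and cached, the prefix is checked before anything is locked, and re-entrance is blocked only after the path is known to be managed.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for realpath() */
#endif
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes; not DYAD's actual enum. */
typedef enum {
    SKETCH_RC_ERR = -1,
    SKETCH_RC_OK = 0,
    SKETCH_RC_UNTRACKED = 1
} sketch_rc;

/* Hypothetical context; DYAD's real ctx has different fields. */
typedef struct {
    char managed_real[PATH_MAX]; /* canonical managed path, resolved once */
    size_t managed_len;
    bool reenter;                /* true: interception may be entered */
    int locks_taken;             /* stand-in for real file locking */
} sketch_ctx;

/* Resolve the managed directory once and cache the result in ctx. */
static sketch_rc sketch_ctx_init(sketch_ctx *ctx, const char *managed_dir)
{
    assert(ctx != NULL && managed_dir != NULL);
    memset(ctx, 0, sizeof *ctx);
    if (realpath(managed_dir, ctx->managed_real) == NULL)
        return SKETCH_RC_ERR;
    ctx->managed_len = strlen(ctx->managed_real);
    ctx->reenter = true;
    return SKETCH_RC_OK;
}

/* Prefix check against the cached canonical path. The match must end on a
 * path-component boundary, so "/tmpfoo" is not treated as under "/tmp". */
static bool sketch_is_managed(const sketch_ctx *ctx, const char *path)
{
    char resolved[PATH_MAX];
    /* Assumption: if the path cannot be resolved (for example, it does not
     * exist yet), fall back to comparing the path as given. */
    const char *p = realpath(path, resolved) ? resolved : path;
    size_t n = ctx->managed_len;

    if (n == 0 || strncmp(p, ctx->managed_real, n) != 0)
        return false;
    return ctx->managed_real[n - 1] == '/' || p[n] == '/' || p[n] == '\0';
}

/* Hypothetical open() hook showing the order of the checks. */
static sketch_rc sketch_open_hook(sketch_ctx *ctx, const char *path)
{
    assert(ctx != NULL && path != NULL);
    if (!ctx->reenter)
        return SKETCH_RC_OK;        /* already inside: pass through */
    if (!sketch_is_managed(ctx, path))
        return SKETCH_RC_UNTRACKED; /* not managed: never lock it */

    ctx->reenter = false;           /* block nested interception from here */
    ctx->locks_taken++;             /* a real hook would lock the file here */
    /* ... synchronization work that may itself call open() ... */
    ctx->reenter = true;
    return SKETCH_RC_OK;
}
```

With `/tmp` as the managed directory in this sketch, a call on `/dev/null` returns `SKETCH_RC_UNTRACKED` and takes no lock, while a call made when `reenter` is already false passes straight through without repeating the prefix check.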

hariharan-devarajan commented 7 months ago

@JaeseungYeom Can u explain what were the main changes?