LLNL / UnifyFS

UnifyFS: A file system for burst buffers

ECP Annual Meeting Feedback #462

Open MichaelBrim opened 4 years ago

MichaelBrim commented 4 years ago

Please add your notes/feedback from the tutorial and other meetings.

MichaelBrim commented 4 years ago

Tutorial:
Q1: Does job/RM integration place the server on reserved cores?
Q2: Can a client test whether a file is laminated yet?
Q3: Are you using mpiFileUtils for parallel transfers?
Q4: Is there a way to control the block size and the number of processes doing I/O during parallel transfers?
Q5: For the unifyfs command-line tool's stage-in/stage-out options, can we support a manifest file rather than just a single source/destination directory?

HDF5:
Q1: Any interest in a threading worker pool library for async tasks?
Q2: What's the status of MPI-IO support?

adammoody commented 4 years ago

Tutorial:

HDF5:

craigsteffen commented 4 years ago

Adding to Mike's tutorial Q1 above: if the Unify server gets started on the OS's behalf on all the compute nodes, does it fall into the OS-reserved pool of cores, or does it land on a user-allocated core? Is that the right place? If not, what do we do about it?

My note RE Mike's tutorial Q4 is that we DO want to make the transfer API block size configurable. As for the number of cores participating, we can currently choose "parallel", which uses all of them, or "serial", which uses one of them; I take this request to mean we should also add an option to select any number between 1 and all. This will become increasingly important as we start to run Unify in larger jobs.
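To make the Q4 request concrete, here is a rough sketch of how a transfer could be split when both the block size and the number of participating processes are configurable. Everything in it is hypothetical: `copy_range` and `plan_copy_range` are illustrative names and are not part of the UnifyFS transfer API. It assumes each participant copies one contiguous run of whole blocks.

```c
#include <stdint.h>

/* Hypothetical helper, NOT part of the UnifyFS API. */
typedef struct {
    uint64_t offset;  /* first byte this participant copies */
    uint64_t length;  /* bytes it copies (0 means no work assigned) */
} copy_range;

/*
 * Split a file of file_size bytes into blocks of block_size bytes and give
 * participant `index` (0-based, out of `participants`) a contiguous run of
 * blocks. Block counts differ by at most one between participants.
 */
copy_range plan_copy_range(uint64_t file_size, uint64_t block_size,
                           int participants, int index)
{
    copy_range r = {0, 0};

    if (block_size == 0 || participants <= 0 ||
        index < 0 || index >= participants) {
        return r;  /* invalid request: assign no work */
    }

    /* Number of blocks, rounding up for a partial final block. */
    uint64_t nblocks = file_size / block_size
                     + (file_size % block_size != 0 ? 1 : 0);

    uint64_t p     = (uint64_t)participants;
    uint64_t i     = (uint64_t)index;
    uint64_t base  = nblocks / p;  /* blocks every participant gets */
    uint64_t extra = nblocks % p;  /* first `extra` participants get one more */

    uint64_t first = i * base + (i < extra ? i : extra);
    uint64_t count = base + (i < extra ? 1 : 0);

    if (count == 0) {
        return r;  /* more participants than blocks */
    }

    uint64_t end = (first + count) * block_size;
    if (end > file_size) {
        end = file_size;  /* clamp the partial final block */
    }

    r.offset = first * block_size;
    r.length = end - r.offset;
    return r;
}
```

With a 10-byte file, a 4-byte block size, and 2 participants, participant 0 would get bytes 0 through 7 and participant 1 would get bytes 8 and 9. "Serial" and "parallel" then become the special cases of 1 participant and all participants.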

As for the "do we remove files" question: I believe we do not. The user will have to remove them using "rm" (or from the C side, perhaps "unlink()"?). Should we provide an mv-like transfer (or an option to the transfer API) that removes the source file once it's verified at the destination?
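For discussion, a minimal sketch of what "mv-like" semantics could look like using plain POSIX calls. `move_after_verify` is a hypothetical name, not an existing UnifyFS function, and this does not go through the UnifyFS transfer API. Verification here is only a size comparison, which is an assumption on my part; a checksum would be a stronger check.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT part of UnifyFS: copy src to dst, confirm the
 * destination has the same size as the source, and only then unlink src.
 * Returns 0 on success and -1 on any failure (src is kept on failure).
 */
int move_after_verify(const char *src, const char *dst)
{
    char buf[65536];
    struct stat st_src, st_dst;
    int ok = 1;
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0) {
        return -1;
    }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* Copy loop that also handles short writes. */
    while (ok && (n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w <= 0) {
                ok = 0;
                break;
            }
            done += w;
        }
    }
    if (ok && n < 0) {
        ok = 0;  /* read error */
    }

    /* Flush to the destination before trusting its size. */
    if (fsync(out) != 0) {
        ok = 0;
    }
    if (close(out) != 0) {
        ok = 0;
    }
    close(in);
    if (!ok) {
        return -1;
    }

    /* Verify by size only (assumption: sufficient for this sketch). */
    if (stat(src, &st_src) != 0 || stat(dst, &st_dst) != 0) {
        return -1;
    }
    if (st_src.st_size != st_dst.st_size) {
        return -1;
    }

    /* Remove the source only after the destination checks out. */
    return unlink(src);
}
```

The key design point is the ordering: the source is removed only after the destination has been flushed and checked, so a failed transfer never loses data.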

clmendes commented 4 years ago

These are some of my notes about the contents of the slides from the Unify Tutorial on 2/5/2020:

Slide-17: I believe the correct build commands are "cd UnifyFS" and "./bootstrap.sh"

Slide-18: Don't we need to run "./autogen.sh" before "configure"? That's what is indicated by the output from the bootstrap script.

Slide-20: Don't we need to mount /unifyfs? e.g. using ret = unifyfs_mount("/unifyfs", rank, total_ranks, 0); and unmount at the clean-up, with unifyfs_unmount();

Slide-22: For dynamic linking, at least on Catalyst, just using the -lunifyfs_gotcha option was not sufficient for me. It was necessary to add another library from lib64: mpicc -o hello hello.c -L/lib -lunifyfs_gotcha /lib64/libgotcha.so