LLNL / UnifyFS

UnifyFS: A file system for burst buffers

operating on previously deleted file #692

Open roblatham00 opened 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Operating System | Ubuntu |
| OS Version | 21.04 |
| Architecture | x86-64 |
| UnifyFS Version | -dev |

Describe the problem you're observing

Consider a two-process scenario.

  1. Rank 0 deletes a file
  2. Rank 0 creates a file with the same name
  3. Rank 0 and 1 both open the now-existing file
  4. Rank 0 and 1 write some data
  5. Rank 0 and 1 issue OP_SYNC_META
  6. Rank 1's operation fails.
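
For readers without an MPI setup, the sequence above can be sketched with plain POSIX calls and `fork()`. This is not the code from the test case: the two "ranks" are an ordinary parent and child process, `os.fsync` is only an assumed stand-in for OP_SYNC_META, and `run_scenario` and `write_and_sync` are hypothetical helper names. On a local file system both ranks should succeed, which is the behavior one would expect from the file system under test as well.

```python
import os


def write_and_sync(path, payload):
    """Open an existing file, write payload at offset 0, then fsync it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # assumed stand-in for the sync step (OP_SYNC_META)
    finally:
        os.close(fd)


def run_scenario(path):
    """Replay steps 1-5 with two processes; return (rank0_status, rank1_status)."""
    # Step 1: "rank 0" deletes the file if it is there.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass

    # Step 2: "rank 0" creates a file with the same name.
    os.close(os.open(path, os.O_CREAT | os.O_WRONLY, 0o644))

    # Steps 3-5: both "ranks" open the file, write 4 bytes, and sync.
    pid = os.fork()
    if pid == 0:
        # Child process plays rank 1 and must never return to the caller.
        status = 1
        try:
            write_and_sync(path, b"rnk1")
            status = 0
        except OSError:
            pass
        finally:
            os._exit(status)

    # Parent process plays rank 0.
    rank0_status = 0
    try:
        write_and_sync(path, b"rnk0")
    except OSError:
        rank0_status = 1

    _, wait_status = os.waitpid(pid, 0)
    rank1_status = os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else 1
    return rank0_status, rank1_status
```

A nonzero second element of the returned tuple would correspond to the rank 1 failure described in step 6. Pointing `path` at a UnifyFS mount is only meaningful if the process's I/O calls are actually routed through the UnifyFS client library, which this sketch does not set up.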

Describe how to reproduce the problem

Here's an MPI test case: https://gist.github.com/roblatham00/f4e71fe7c4da2ae3fef3390c1dd44b0b

If you run this test case as a singleton, all the routines succeed.

If you run this test case with two MPI processes, the log from rank 1 complains about invalid arguments and "processing pending global unlink".

I guess that "pending" is what you meant by client callback? How do I complete the "pending" unlink?

Include any warning or errors or relevant debugging data

Here are the client-side logs for the above two-process test case:


```
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_fid_create_file() [unifyfs_fid.c:220] Filename /unifyfs/nonconting11 got unifyfs fid 1
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_set_global_file_meta_from_fid() [unifyfs_fid.c:180] setting global file metadata for fid:1 gfid:2013224502 path:/unifyfs/nonconting11
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update() [unifyfs_meta.h:161] updating attributes for gfid=2013224502
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update() [unifyfs_meta.h:170] setting mode to 100644
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:191] setting attr.size to 0
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:197] setting attr.atime to 1631629317.764174058
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:207] setting attr.mtime to 1631629317.764174058
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:219] setting attr.ctime to 1631629317.764174058
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:227] setting attr.is_laminated to 0
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:233] setting attr.is_shared to 1
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_file_attr_update()[unifyfs_meta.h:238] setting attr.filename to /unifyfs/nonconting11
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_set_global_file_meta_from_fid() [unifyfs_fid.c:185] using following attributes
[0] 2021-09-14T09:21:57 tid=25140 @ debug_print_file_attr() [unifyfs_meta.h:123] fileattr(0x7ffe24812810) - gfid=2013224502 filename=/unifyfs/nonconting11
[0] 2021-09-14T09:21:57 tid=25140 @ debug_print_file_attr() [unifyfs_meta.h:125]              - sz=0 mode=100644 uid=1000 gid=1000
[0] 2021-09-14T09:21:57 tid=25140 @ debug_print_file_attr() [unifyfs_meta.h:127]              - shared=1 laminated=0
[0] 2021-09-14T09:21:57 tid=25140 @ debug_print_file_attr() [unifyfs_meta.h:129]              - atime=1631629317.764174058 ctime=1631629317.764174058 mtime=1631629317.764174058
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_metaset_rpc() [margo_client.c:426] invoking the metaset rpc function in client - gfid:2013224502 file:/unifyfs/nonconting11
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_metaset_rpc() [margo_client.c:440] Got response ret=17
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_metaget_rpc() [margo_client.c:474] invoking the metaget rpc function in client
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_metaget_rpc() [margo_client.c:487] Got response ret=0
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_fid_from_path() [unifyfs_fid.c:799] File found: unifyfs_filelist[1].filename = /unifyfs/nonconting11
[1] 2021-09-14T09:21:57 tid=25141 @ unifyfs_fid_from_path() [unifyfs_fid.c:799] File found: unifyfs_filelist[1].filename = /unifyfs/nonconting11
[1] 2021-09-14T09:21:57 tid=25141 @ unifyfs_fid_open() [unifyfs_fid.c:402] unifyfs_fid_from_path() gave 1 (gfid = 2013224502)
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_fid_open() [unifyfs_fid.c:402] unifyfs_fid_from_path() gave 1 (gfid = 2013224502)
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_metaget_rpc() [margo_client.c:474] invoking the metaget rpc function in client
[1] 2021-09-14T09:21:57 tid=25141 @ invoke_client_metaget_rpc() [margo_client.c:474] invoking the metaget rpc function in client
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_metaget_rpc() [margo_client.c:487] Got response ret=0
[1] 2021-09-14T09:21:57 tid=25141 @ invoke_client_metaget_rpc() [margo_client.c:487] Got response ret=0
[1] 2021-09-14T09:21:57 tid=25141 @ unifyfs_get_meta_from_fid() [unifyfs_fid.c:667] processing pending global unlink
[0] 2021-09-14T09:21:57 tid=25140 @ unifyfs_logio_write() [../../common/src/unifyfs_logio.c:888] log_off=0, nbytes=4 : mem_sz=0 spill_sz=4 spill_off=0
[0] 2021-09-14T09:21:57 tid=25140 @ fid_logio_write() [unifyfs_fid.c:1164] fid=1 gfid=2013224502 pos=0 - successful logio_write() @ log offset=0 (4 bytes)
[0] 2021-09-14T09:21:57 tid=25140 @ invoke_client_sync_rpc() [margo_client.c:770] invoking the sync rpc function in client
[1] unifyfs_get_meta_from_fid 128 -1
[1] 2021-09-14T09:21:57 tid=25141 @ unifyfs_fid_sync_extents() [unifyfs_fid.c:1076] no filemeta for fid=-1
```
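
Since the messages from the two ranks are interleaved, splitting the log by its leading `[rank]` tag makes rank 1's path easier to follow. The helper below is a small sketch, not part of UnifyFS: `split_by_rank` is a hypothetical name, and it assumes every log entry starts with `[N] ` and that any line without that prefix is a wrapped continuation of the previous entry.

```python
import re

# Matches the "[0] " / "[1] " rank tag at the start of a log entry.
RANK_PREFIX = re.compile(r"^\[(\d+)\]\s")


def split_by_rank(log_text):
    """Group log entries by rank; untagged lines are joined to the previous entry."""
    by_rank = {}
    current = None
    for line in log_text.splitlines():
        match = RANK_PREFIX.match(line)
        if match:
            current = int(match.group(1))
            by_rank.setdefault(current, []).append(line)
        elif current is not None and line.strip():
            # Wrapped continuation: append to the most recent entry of that rank.
            by_rank[current][-1] += " " + line.strip()
    return by_rank
```

Calling `split_by_rank(log_text)[1]` on the log above would list only rank 1's entries, from the cached `unifyfs_fid_from_path()` hit through the pending global unlink to the final "no filemeta for fid=-1" message.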