LLNL / UnifyFS

UnifyFS: A file system for burst buffers

gotcha failing to wrap stat() #318

Open adammoody opened 5 years ago

adammoody commented 5 years ago

Need to investigate further, but gotcha seems to fail when wrapping stat() on TOSS and LSF. This can lead to a segfault when the user application calls stat.

CamStan commented 5 years ago

David P. helped me debug something a while ago, and I still have one of his emails. Before running the example you're trying to debug, set the environment variable GOTCHA_DEBUG=3.

A quick glance at the output with this set shows there is definitely an issue with stat:

$ export GOTCHA_DEBUG=3
$ srun -N1 -n1 sysio-write-gotcha 2>&1
[gotcha_utils.c:58] - Gotcha debug initialized at level 3
⋮
7: stat will map to 0x2aaaaacee6e4
⋮
[gotcha.c:171] - Returning GOTCHA_FUNCTION_NOT_FOUND from gotcha_wrapbecause of entry 7  <---
[gotcha.c:176] - Returning code 1 from gotcha_wrap
⋮

There's probably more useful info in there, but that's what I'm seeing at first glance.

tonyhutter commented 5 years ago

I don't know if this is relevant, but stat() is unusual: /usr/include/sys/stat.h may either declare it extern as stat() or inline it as a call to __xstat():

#ifndef __USE_FILE_OFFSET64
/* Get file attributes for FILE and put them in BUF.  */
extern int stat (const char *__restrict __file,
         struct stat *__restrict __buf) __THROW __nonnull ((1, 2));
#if defined __GNUC__ && __GNUC__ >= 2 && defined __USE_EXTERN_INLINES
/* Inlined versions of the real stat and mknod functions.  */

__extern_inline int
__NTH (stat (const char *__path, struct stat *__statbuf))
{
  return __xstat (_STAT_VER, __path, __statbuf);
}
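To check which of these names are actually visible as dynamic symbols on a given node, you could ask the dynamic linker directly. This is a minimal sketch under a few assumptions. As far as we understand, glibc releases before 2.33 did not export stat() from libc.so.6 at all. It was linked into the program from libc_nonshared.a as a small wrapper around __xstat(), which would leave a GOTCHA binding keyed on the name "stat" with nothing to resolve. GOTCHA walks the loaded objects' ELF symbol tables itself, so dlsym(RTLD_DEFAULT, ...) is only a rough stand-in for its lookup. The helper names symbol_is_exported() and report_stat_symbols() are hypothetical.

```c
#define _GNU_SOURCE            /* needed for RTLD_DEFAULT in <dlfcn.h> */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if `name` resolves through the dynamic
 * linker's global lookup scope (the executable's exported symbols plus
 * loaded shared libraries), else 0. Only an approximation of how GOTCHA
 * searches for a wrappee. */
int symbol_is_exported(const char *name)
{
    assert(name != NULL);
    return dlsym(RTLD_DEFAULT, name) != NULL;
}

/* Hypothetical helper: print whether each stat-family name is visible
 * as a dynamic symbol in this process. */
void report_stat_symbols(void)
{
    const char *names[] = { "stat", "__xstat", "fstat", "__fxstat" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        printf("%-10s %s\n", names[i],
               symbol_is_exported(names[i]) ? "exported" : "not found");
}
```

Call report_stat_symbols() from a small main() and link with -ldl (only required on glibc older than 2.34, where dlsym lived in libdl). If "stat" shows as not found while "__xstat" is exported, that would be consistent with the GOTCHA_FUNCTION_NOT_FOUND result in the debug log.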

clmendes commented 5 years ago

I am probably hitting this issue when running "transfer-gotcha" on Catalyst: I get a segfault. When the application starts, the first messages I see on stderr are:

@ unifyfs_init() [unifyfs.c:1797] gotcha_wrap returned 1
@ unifyfs_init() [unifyfs.c:1804] This function name failed to be wrapped: stat

KoyamaSohei commented 1 year ago

I got around this problem by replacing stat/fstat with xstat/fxstat in the application code.