LLNL / UnifyFS

UnifyFS: A file system for burst buffers

slow client can make unifyfsd assert #668

Open roblatham00 opened 3 years ago

roblatham00 commented 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Operating System | Linux |
| OS Version | Ubuntu 20.04.2 |
| Architecture | x86_64 |
| UnifyFS Version | git (115e5be54b5) |

Describe the problem you're observing

unifyfsd can hit a margo assertion:

[critical] unexpected return code (11: HG_INVALID_ARG) from HG_Progress()
unifyfsd: ../src/margo-core.c:1488: __margo_hg_progress_fn: Assertion `0' failed.
2021-08-12T15:32:17 tid=1387996 @ unifyfs_heartbeat_rpc() [margo_client.c:895] responding

Describe how to reproduce the problem

Attach a debugger to a UnifyFS client and step through the code slowly.

Include any warnings or errors or relevant debugging data

roblatham00 commented 3 years ago

Oh, I guess this is due to commit ea2f5f5e1bb?

MichaelBrim commented 3 years ago

The core issue seems like a Margo problem to me, but we could make it less likely to occur in Unify by creating a "debugging mode" that either completely disables the client heartbeat rpc or uses a really big timeout.
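
A rough sketch of what such a "debugging mode" could look like, only to make the two options concrete (disable the heartbeat, or use a very large timeout). Everything named here is hypothetical and not taken from the UnifyFS source: the `heartbeat_policy` struct, the `heartbeat_policy_from_mode()` helper, the `UNIFYFS_HEARTBEAT_DEBUG` environment variable, and both timeout values are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical values; not taken from UnifyFS. */
#define HEARTBEAT_TIMEOUT_DEFAULT_MS 5000.0     /* assumed normal timeout */
#define HEARTBEAT_TIMEOUT_DEBUG_MS   3600000.0  /* one hour, for debugger sessions */

/* Hypothetical policy consulted before issuing a heartbeat RPC. */
typedef struct {
    int    enabled;     /* 0 = do not send heartbeat RPCs at all */
    double timeout_ms;  /* timeout to pass to the RPC call when enabled */
} heartbeat_policy;

/* Map a debug-mode string to a policy:
 *   NULL or unrecognized -> normal heartbeat with the default timeout
 *   "disable"            -> heartbeat turned off
 *   "long"               -> heartbeat kept, but with a very large timeout */
heartbeat_policy heartbeat_policy_from_mode(const char* mode)
{
    heartbeat_policy p = { 1, HEARTBEAT_TIMEOUT_DEFAULT_MS };
    if (mode == NULL) {
        return p;
    }
    if (strcmp(mode, "disable") == 0) {
        p.enabled = 0;
    } else if (strcmp(mode, "long") == 0) {
        p.timeout_ms = HEARTBEAT_TIMEOUT_DEBUG_MS;
    }
    return p;
}

/* Read the mode from a hypothetical environment variable. */
heartbeat_policy heartbeat_policy_from_env(void)
{
    return heartbeat_policy_from_mode(getenv("UNIFYFS_HEARTBEAT_DEBUG"));
}
```

The code that issues the heartbeat would then check `enabled` before sending, and otherwise pass `timeout_ms` to its timed RPC call. This only makes the assertion less likely while a client is paused in a debugger; it does not address the underlying Margo behavior.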