Closed GoogleCodeExporter closed 9 years ago
} 3. The child process hangs since the lock is copied over.
Can you clarify what lock you're talking about? I take it it's the lock called
by
malloc()?
I don't know much about the semantics of lock inheritence around forking. My
understanding of how to make fork safe to use is that between fork and exec you
can't
a) call any non-reentrant library functions, b) call a function that may
acquire a
lock. That's pretty restrictive. I'm not surprised doing something like
opendir()
there may cause problems. In general, I do know that forking is super-hard to
get
right, so it's not surprising to me that there should be problems like this.
I'm not sure I understand your proposed solution though. It's the client app
that's
calling fork(). How is it supposed to acquire the proper locks, and then
release
them? Are you suggesting we provide an API to do that? It seems pretty
complicated,
if so. Or maybe you had another proposal?
Original comment by csilv...@gmail.com
on 13 Oct 2009 at 9:20
When tcmalloc tries to get a memory chunk from system memory, it acquires a
(global)
lock so that it can exclusively access system memory.
Let's assume there are thread A and B in the parent process, and thread A calls
java_lang_UNIXProcess_forkAndExec while thread B was holding this lock. Inside
of
java_lang_UNIXProcess_forkAndExec, the child process will get a clone of the
parent
thread (A). Since thread B was holding the lock, this state is also inherited
to the
child process. When opendir() is called inside of
java_lang_UNIXProcess_forkAndExec(), the child process gets stuck since there is
nobody that will release the lock in the child process. This is not a problem
for the
parent process since thread B will release it eventually for the parent process.
In the pthread, there is an api called pthread_atfork which provides handles
before/after fork(). My solution was to call this function with proper handlers
when
tcmalloc is initialized. It seems to be working for linux system but I am not
sure if
it can be generalized for other systems.
Original comment by hankyup...@gmail.com
on 14 Oct 2009 at 6:16
Ah, I see. I'm worried, though, that this lock isn't the only one we need to
keep
track of. tcmalloc has a lot of locks, including a pageheap lock, a new-handler
lock, heap-checker locks, and at least one profiler lock. The debug allocator
adds a
few more, and more may be added in the future. It seems like each one of these
could
cause a problem in the situation you describe: thread A holds the lock while
thread B
forks. I guess we'll need to keep track of all of them?
I haven't thought a lot about the fork-safety of tcmalloc. I wonder if there is
anything other than spinlocks that we need to worry about.
It sounds like you have a patch that solves the problem for you? Do you mind
attaching it to this bug report?
Original comment by csilv...@gmail.com
on 14 Oct 2009 at 2:41
I talked to one of the experts here on forking, and he pointed out that it's
actually
correct for both parent and child to have the fork: the system memory is a
shared
resource, and they can't both be accessing it at the same time. The solution
you use
with pthread_atfork doesn't seem safe.
He had this to say in general:
---
Traditionally, POSIX only ever allowed exec() to be called after fork().
Calling any other function after fork() but before exec() was undefined.
Over time, users discovered that on most systems there were a small number
of functions that they could still call in between fork() and exec() despite
POSIX warning them not to do so. I don't think this list of functions has
ever been formally specified, but it is generally believed that any
async-signal-safe function can be called at this time. And I expect most
runtime environments (including glibc) to try hard to actually guarantee
this behavior.
Please note that opendir() and malloc() are most definitely not
async-signal-safe and still cannot be called safely at this time.
For single-threaded applications, you can sometimes get away with calling
more complex functions after fork(), but this has always been error prone
and would often lead to subtle bugs.
A long time later, POSIX grew a threading API. And it is very clearly an
afterthought. All sorts of things don't quite work correctly once more than
one thread exists in the program. Most notably, calling fork() in a
multi-threaded application is poorly defined and surprisingly difficult to
get right.
Glibc tries hard to make this work. And POSIX made an attempt to fix it by
defining pthread_at_fork() handlers. But all of these are bandaid solutions
that don't work well practice. Apart from the problem with locks, there are
a whole slew of other subtle issues with shared resources that a very
difficult to fix and cause all sorts of very subtle bugs. A particularly
nasty problem are all the resources that are shared between threads and
inherited by child processes (e.g. guaranteeing that file descriptors get
passed correctly in all cases is quite tricky). The upshot is:
- don't fork() after creating threads. It doesn't work well. If you know
that your application needs to fork(), then create a helper process as the
very first thing in main(). This helper process should stay single-threaded
and can thus fork() more easily.
- if you cannot use a helper process, then stick to letter of the POSIX
specification. Do not make any function calls other than exec() after
calling fork(). In a multi-threaded application, even the async-signal-safe
functions cannot reliably be called after fork()'ing.
- alternatively, have a look at posix_spawn(). It's API is somewhat
restricted, but it is often a thread-safe alternative to fork() and exec().
I have no idea how good the implementation in glibc is, though.
---
I don't know if you can do any of these for this java_lang_* call, but I think
your
code will be safer in general if you can rewrite it so it doesn't do any work
between
the fork and exec.
I'm going to close this WillNotFix. I think it would be dangerous to do any
unlocking across forks.
Original comment by csilv...@gmail.com
on 14 Oct 2009 at 6:44
What I was dealing with was a (global) lock from tcmalloc, not a
system-provided lock
to get a memory from system. I think that the latter is already handled by OS.
The problem is that I don't have a control over java_lang_* methods which is a
part
of JDK. And this method is called indirectly from my application via an
underneath
library. So I don't have any control on timing, either.
I am attaching my patch, anyway.
diff -crBN ../google-perftools-1.2/Makefile.am google-perftools-1.2/Makefile.am
*** ../google-perftools-1.2/Makefile.am Fri Apr 17 19:37:09 2009
--- google-perftools-1.2/Makefile.am Fri Oct 2 04:45:18 2009
***************
*** 309,314 ****
--- 309,315 ----
src/page_heap_allocator.h \
src/span.h \
src/static_vars.h \
+ src/fork_handler.h \
src/thread_cache.h \
src/stack_trace_table.h \
src/base/thread_annotations.h \
***************
*** 338,343 ****
--- 339,345 ----
src/span.cc \
src/stack_trace_table.cc \
src/static_vars.cc \
+ src/fork_handler.cc \
src/thread_cache.cc \
src/malloc_hook.cc \
src/malloc_extension.cc \
diff -crBN ../google-perftools-1.2/src/fork_handler.cc
google-perftools-1.2/src/fork_handler.cc
*** ../google-perftools-1.2/src/fork_handler.cc Thu Jan 1 00:00:00 1970
--- google-perftools-1.2/src/fork_handler.cc Fri Oct 2 18:51:04 2009
***************
*** 0 ****
--- 1,24 ----
+ #include "fork_handler.h"
+ #include <pthread.h>
+
+ namespace tcmalloc {
+
+ void prepare()
+ {
+ Static::pageheap_lock()->Lock();
+ }
+ void fork_parent()
+ {
+ Static::pageheap_lock()->Unlock();
+ }
+ void fork_child()
+ {
+ Static::pageheap_lock()->Unlock();
+ }
+
+ void Init_forkHandler()
+ {
+ pthread_atfork(prepare, fork_parent, fork_child);
+ }
+
+ }
diff -crBN ../google-perftools-1.2/src/fork_handler.h
google-perftools-1.2/src/fork_handler.h
*** ../google-perftools-1.2/src/fork_handler.h Thu Jan 1 00:00:00 1970
--- google-perftools-1.2/src/fork_handler.h Fri Oct 2 04:45:12 2009
***************
*** 0 ****
--- 1,12 ----
+ #ifndef TCMALLOC_FORK_HANDLERS_H_
+ #define TCMALLOC_FORK_HANDLERS_H_
+
+ #include "static_vars.h"
+
+ namespace tcmalloc {
+
+ void Init_forkHandler();
+
+ }
+
+ #endif
diff -crBN ../google-perftools-1.2/src/static_vars.cc
google-perftools-1.2/src/static_vars.cc
*** ../google-perftools-1.2/src/static_vars.cc Thu Feb 26 19:16:18 2009
--- google-perftools-1.2/src/static_vars.cc Fri Oct 2 04:46:11 2009
***************
*** 32,37 ****
--- 32,38 ----
#include "static_vars.h"
#include "sampler.h" // for the init function
+ #include "fork_handler.h"
namespace tcmalloc {
***************
*** 60,65 ****
--- 61,67 ----
new ((void*)pageheap_memory_) PageHeap;
DLL_Init(&sampled_objects_);
Sampler::InitStatics();
+ Init_forkHandler();
}
} // namespace tcmalloc
Original comment by hankyup...@gmail.com
on 14 Oct 2009 at 7:29
Original issue reported on code.google.com by
hankyup...@gmail.com
on 5 Oct 2009 at 6:04