Closed battleblow closed 5 years ago
@battleblow Most likely the root cause can be traced back into the signal handling in the src/hotspot/os_cpu/bsd_<cpu>/os_bsd_<cpu>.cpp
file. Roughly speaking, the jdk puts two levels of guard pages the the end of the stack (yellow zone, red zone), when a java program hits the end of the stack, the signal handling in that file it unprotects the yellow zone converts it into java.lang.StackOverflowError. You may want to try reverting my recent changes to the file to rule that out. Also perhaps the large page support may be related so a test with that reverted too may help rule that out as well.
Agreed. I've been looking into the stack and signal handling code a little, but the confirmation is useful. Thank you.
The original report was actually for OpenJDK 8u144, I just wanted to verify it on 11 and found the same problem was still occurring. So at this point my thought is that it is something that has carried through for longer rather than a recent change. I will check out the changes you mention though.
Working through the code block starting at
The signal is caught, but fails the if (thread->in_stack_yellow_reserved_zone(addr)) {
check, meaning that the stack overflow does not get triggered.
Confirming the above. Adding || true (as a temporary test of the hypothesis) does result in a StackOverflowError
. So it looks like the detection of the address being within the yellow zone is what is broken on FreeBSD
Yellow zone check is a simple check that the address is less than the reserved base address but higher than the red zone base address. Kurt's comments above and the diagram in thread.hpp is useful in visualising this.
We're failing the first part of that check, i.e. the address is higher than the reserved base address. Basically, according to the JVM, we haven't yet dropped out of the standard thread stack and into the guard area. However, FreeBSD does believe we're accessing an address which is outside of the stack (hence the SEGV). Possible cause is a misalignment between the pthread idea of the stack and the JVM idea of it. That seems most likely. Less likely, we have the address wrong?
Note that the address is 4080 above the reserved base. This is suspiciously (to me) just less than 4K above it.
Suspicious because this machine has a page size of 4K. This is also the default size of the guard region in the FreeBSD pthread library. So my suspicion is that we're misaligned in that FreeBSD is using it's default guard region, which the JVM doesn't expect since it believes it is managing the guard region. So we grow into the FreeBSD guard region, a SIGSEGV is thrown and the JVM concludes that we're not in the guard area yet and doesn't convert it to a stack overflow.
Potentially relevant links in the Linux and AIX code:
https://github.com/battleblow/openjdk-jdk11u/blob/master/src/hotspot/os/linux/os_linux.cpp#L817 (read through to the call to pthread_attr_setguardsize())
https://github.com/battleblow/openjdk-jdk11u/blob/master/src/hotspot/os/aix/os_aix.cpp#L900
Trying to use the AIX code, i.e. setting the guard size to zero for java and compiler threads, doesn't appear to make any difference. Note that the stack overflow check does check that the thread is a Java thread, so it is true that the thread involved here is a Java thread. The hs_err file also confirms this.
Interestingly, after doing some debugging, the address of the thread being examined is not one being created in os_bsd.cpp. It is the main thread created separately in
However, this clearly has a guard size set to zero.
A couple of other notes:
Inspected the code for other OSes, and the stack overflow detection logic is the same for Linux, AIX, and Solaris. So far, only FreeBSD appears to suffer from this problem and I can't find anything about the FreeBSD logic for stack sizing and set up which is different though.
Consistently the address is within a page size of the guard boundary. Will try printing out similar information on OpenBSD and see what it looks like.
For the internal JDK test suite, this looks like it is the root cause of failures in the following tests:
This could be a bug with pthread_attr_get_np
. If that is not returning accurate info, the jvm will place the guard pages at the wrong locations and may result in the behavior you have described.
https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os/bsd/os_bsd.cpp#L4341
Agreed. Comparing the math between the JVM and the FreeBSD pthread implementation is proving to be a little difficult, at least for me. Thanks for pointing out that section though, I'll add some additional trace output in that location and see if that brings some clarity.
Here is an ascii picture of the stack of the crashing thread:
JVM thread stack
JVM Guard pages = JVM reserved zone + JVM yellow zone + JVM red zone (Total size: 16384)
-- <- Guard page bottom (libthr)
|
| Libthr guard pages (4096)
|
-- <- Stack bottom (libthr) (0x7fffdedee000|140736932536320) (Total size: 1048576)
|
| JVM Red pages (Size: 4096)
|
-- <- Stack red zone base (0x7fffdedef000|140736932540416)
|
| JVM Yellow pages (Size: 8192)
|
-- <- Stack yellow zone base (0x7fffdedf0000|140736932544512)
|
| JVM Reserved pages (Size: 4096)
|
-- <- Stack reserved zone base (0x7fffdedf2000|140736932552704)
|
| Fault address (0x7fffdedf2ff8|140736932556792) (4088 outside the reserved zone)
|
| Untouched memory + shadow zone + stack frames
|
-- <- Stack base (0x7fffdeeee000|140736933584896)
This matches predominantly the diagram in thread.hpp, which means it ends up being numerically reversed.
The key observation is that it all matches up. The stack size from libthr matches the difference in addresses between the stack address (bottom) from libthr and the stack base (top) from the JVM. Although this shouldn't be surprising since that is how the JVM calculates the stack base, IIRC. The JVM guard sizes and addresses also all match up (although the base address accessor methods in thread.hpp
just do match based on the size of the relevant zones, so this doesn't say a lot).
So, despite not being able to find an error in the math, and the numbers seeming reasonable (stack size is 1M, addresses are page aligned, sizes are page multiples), the fault address is clearly outside the JVM guard zone (by less than a page).
I'm unsure how to handle this. I could, on FreeBSD, round the address up to the nearest page boundary (there are already utility methods to do this). This might be a hack though, since I don't have a deep enough understanding of how this works to be confident that it is a warranted step. Even so, it might be worthwhile for now while I dig into it deeper. I'd like to see the address on OpenBSD and see whether it is page aligned, since that would give me more confidence in such a change if it is, but my OpenBSD VM is causing some issues right now. Maybe I'll give that another try tomorrow.
Unsurprisingly this works. I.e. modifying the address as follows:
#ifdef FreeBSD
addr = align_down(addr, os::vm_page_size());
#endif
Results in stack overflow being thrown correctly on FreeBSD. The question is whether this is a reasonable thing to do.
Nice analysis. I'm still on vacation so I can only make a guess right now. From your description I'm assuming that the address for Stack reserved zone base would be the last address that would not fault and Stack reserved zone base - 1 would. Based on what you have described it appears that an extra page has been mprotected PROT_NONE; the page at address for Stack reserved zone base. Can you do a syscall trace on the the test program and review the mprotect(2) calls on the stack?
That's a good hypothesis. I really should have included mprotect(2) calls in what I was looking at.
I need to look at this again in the morning, but it seems ok at the moment.
The overflowing stack has a bottom address of 0x7fffdedf7000. The following look to be the relevant mprotect calls:
2325 100356: 1.521600108 mprotect(0x7fffdeef7000,4096,PROT_NONE) = 0 (0x0)
2325 101377: 1.597886795 mprotect(0x7fffdedf7000,16384,PROT_NONE) = 0 (0x0)
2325 101377: 1.605237197 mprotect(0x7fffdedf8000,12288,PROT_READ|PROT_WRITE) = 0 (0x0)
2325 101377: 1.605704001 mprotect(0x7fffdedf8000,12288,PROT_NONE) = 0 (0x0)
which looks to be:
At this point I don't see any other mprotect calls for this stack. The fault address in this case is 0x7fffdedfbfd8 and the stack base (top) is 0x7fffdeef7000.
I looked this through again today, and these appear to be the only relevant mprotect calls. They all appear to only cover the JVM guard pages, which makes sense. I don't know why there are two initial calls (one for the red zone and then one for the whole guard zone). That seems like the first call is a waste, but it shouldn't cause issues nevertheless.
I'm going to post my findings to one of the FreeBSD mailing lists and see if anyone has any insight. I'll likely patch the ports tree with the hack above during that though.
I wrote a standalone test program and believe I identified the root cause. Manually setting the guard pages on FreeBSD now results in what appears to be an automatic additional guard page being inserted by mmap, or perhaps it doesn't allow a MAP_STACK region to be adjacent to a !MAP_STACK region. I haven't dug into the FreeBSD mmap code to figure that part out.
Here is the standalone test program:
build with clang -g -O0 bsd_stack_test.c -pthread
and debug as follows:
(gdb) handle SIGSEGV nostop noprint
Signal Stop Print Pass to program Description
SIGSEGV No No Yes Segmentation fault
(gdb) b 91
Breakpoint 1 at 0x21058c: file bsd_stack_test.c, line 92.
(gdb) r
Starting program: /usr/home/ec2-user/stack/a.out
[New LWP 100487 of process 52800]
stack bottom: 0xffffbfdfe000
stack guard: 0xffffbfe02000
stack top: 0xffffbfffe000
SIGSEGV addr: 0xffffbfe02fff
expecting no SIGSEGV. stack guard failed
[Switching to LWP 100487 of process 52800]
Thread 2 hit Breakpoint 1, thread_start (arg=0x0) at bsd_stack_test.c:92
92 if (check_bounds(bottom, top)) {
(gdb) shell
[ec2-user@freebsd ~/stack]$ procstat -v 52800
PID START END PRT RES PRES REF SHD FLAG TP PATH
<snip non-stack regions>
52800 0xffffbfdfe000 0xffffbfe02000 --- 0 0 0 0 ----- --
52800 0xffffbfe02000 0xffffbfe03000 --- 0 0 0 0 ----- --
52800 0xffffbfe03000 0xffffbfe1e000 rw- 27 27 1 0 ---D- df
52800 0xffffbfe1e000 0xffffbfe3e000 rw- 32 32 1 0 ---D- df
52800 0xffffbfe3e000 0xffffbfe5e000 rw- 32 32 1 0 ---D- df
52800 0xffffbfe5e000 0xffffbfe7e000 rw- 32 32 1 0 ---D- df
52800 0xffffbfe7e000 0xffffbfe9e000 rw- 32 32 1 0 ---D- df
52800 0xffffbfe9e000 0xffffbfebe000 rw- 32 32 1 0 ---D- df
52800 0xffffbfebe000 0xffffbfede000 rw- 32 32 1 0 ---D- df
52800 0xffffbfede000 0xffffbfefe000 rw- 32 32 1 0 ---D- df
52800 0xffffbfefe000 0xffffbff1e000 rw- 32 32 1 0 ---D- df
52800 0xffffbff1e000 0xffffbff3e000 rw- 32 32 1 0 ---D- df
52800 0xffffbff3e000 0xffffbff5e000 rw- 32 32 1 0 ---D- df
52800 0xffffbff5e000 0xffffbff7e000 rw- 32 32 1 0 ---D- df
52800 0xffffbff7e000 0xffffbff9e000 rw- 32 32 1 0 ---D- df
52800 0xffffbff9e000 0xffffbffbe000 rw- 32 32 1 0 ---D- df
52800 0xffffbffbe000 0xffffbffde000 rw- 32 32 1 0 ---D- df
52800 0xffffbffde000 0xffffbfffe000 rw- 32 32 1 0 ---D- df
52800 0xffffbfffe000 0xffffbffff000 --- 0 0 0 0 ----- --
52800 0xffffbffff000 0xfffffffdf000 --- 0 0 0 0 ----- --
52800 0xfffffffdf000 0xfffffffff000 rw- 3 3 1 0 ---D- df
52800 0xfffffffff000 0x1000000000000 r-x 1 1 20 0 ----- ph
The above shows that there is a page at 0xffffbfe02000 that is mapped PROT_NONE, but the program didn't modify that stack page. The jvm is expecting the stack to be able to grow into that page and hit our manual stack guard region. This manual guarding technique used to work on FreeBSD, but now does not.
Perhaps you could raise this issue with FreeBSD kernel devs. I have not found a method to get around this issue.
Here's a possible work-around that fixes the InfinteRecursion test case but needs more testing:
diff --git a/src/hotspot/os/bsd/os_bsd.cpp b/src/hotspot/os/bsd/os_bsd.cpp
index 87e8b7a20d..9eea54524f 100644
--- a/src/hotspot/os/bsd/os_bsd.cpp
+++ b/src/hotspot/os/bsd/os_bsd.cpp
@@ -2239,6 +2239,11 @@ bool os::pd_uncommit_memory(char* addr, size_t size) {
}
bool os::pd_create_stack_guard_pages(char* addr, size_t size) {
+#ifdef __FreeBSD__
+ // XXX hack to account for automatic PROT_NONE page at end of stack
+ if (size > Bsd::page_size())
+ size -= Bsd::page_size();
+#endif
return os::commit_memory(addr, size, !ExecMem);
}
I noticed this code:
https://github.com/freebsd/freebsd/blob/master/lib/libthr/thread/thr_stack.c#L276
which just does a mprotect()
over the bottom of MAP_STACK memory which I found didn't work in my testing. I adjusted the test program to demonstrate that pthread_attr_setguardsize()
is also not working correctly on FreeBSD/aarch64 13-current at least.
FWIW, here is what that test program reports on my FreeBSD 11.3/amd64 box:
> ./a.out
stack bottom: 0x7fffdfdfe000
stack top: 0x7fffdfffe000
SIGSEGV addr: 0x7fffdfdfefff
expecting SIGSEGV between 0x7fffdfdfd000 and 0x7fffdfdfe000. pthread_attr_setguardsize failed
pthread_attr_setguardsize failed
When I was testing this earlier I tried testing with a change taken from the AIX code to set the pthread guard size to zero (since the guard is unused by the JVM) and it made no difference. I also tried different stack sizes via -Xss (which only affects Java threads) and that also made no difference.
Here is a test Java program that I've also been using to actually test this in a spawned thread rather than the "main" thread:
public
class ThreadInfiniteRecursion {
public static
void main(String[] args) {
Thread t = new Thread(new RecursiveThread());
t.start();
try {
t.join();
}
catch (Exception e) {
e.printStackTrace();
}
}
private static
class RecursiveThread
implements Runnable {
public
void run() {
run();
}
}
}
I created the pthread_attr_setguardsize_test.c program to demonstrate that mprotect()
on top of MAP_STACK pages doesn't work as expected. On aarch64 it segfaulted below the bottom, but not directly below the bottom of the stack - just on the page at bottom - 16k which I assumed is the kernel placed guard for the MAP_STACK region:
[ec2-user@freebsd ~/stack]$ ./a.out
stack bottom: 0xffffbfdfe000
stack top: 0xffffbfffe000
SIGSEGV addr: 0xffffbfdfafff
expecting SIGSEGV between 0xffffbfdfd000 and 0xffffbfdfe000. pthread_attr_setguardsize failed
pthread_attr_setguardsize failed
I'm surprised by your amd64 results which showed it segfaulted on the page at the bottom of the stack. I expected the pthread guard area to be 16K below bottom, how it segfaulted at the bottom page is strange. Could you change just the size to 0 in the pthread_attr_setguardsize()
call and show the results on amd64?
Reading the kernel code I see that it does indeed set a separate guard page for MAP_STACK regions which is controlled by the sysctl security.bsd.stack_guard_page
:
https://github.com/freebsd/freebsd/blob/master/sys/vm/vm_map.c#L4102
There seems to be a series of bugs related to that functionality:
1) mprotect(PROT_NONE)
on top of MAP_STACK pages not yet touched by the kernel fails to set the pages as PROT_NONE.
2) setting pthread_attr_setguardsize()
size to < security.bsd.stack_guard_page
number of pages results in pthread_attr_getstack()
returning a stack bottom that is PROT_NONE due to kernel setting security.bsd.stack_guard_page
number of pages as guard pages.
3) setting pthread_attr_setguardsize()
size > security.bsd.stack_guard_page
number of pages doesn't work correctly as it relies on 1 which doesn't work. The result is a stack that can access the guard pages (except the bottom security.bsd.stack_guard_page
number of pages).
I haven't verified all of these yet but from my current understanding of the code I think these issues exist. I'll do some additional testing to verify they are indeed problems.
edit: fix code ticks
Confirmed: Using the test pthread_attr_setguardsize_test program from the previous comment:
pthread_attr_setguardsize_test.c
Results when setting security.bsd.stack_guard_page
before running the test:
sudo sysctl -w security.bsd.stack_guard_page=4
test passes
sudo sysctl -w security.bsd.stack_guard_page=1
test fails with stack < bottom accessible
sudo sysctl -w security.bsd.stack_guard_page=5
test fails with page at bottom not accessible
FWIW, here is the output of the test program with setguardsize called with 0. It sounds like you've already got the root cause though :)
> ./a.out
stack bottom: 0x7fffdfdfe000
stack top: 0x7fffdfffe000
SIGSEGV addr: 0x7fffdfdfefff
expecting SIGSEGV between 0x7fffdfdfd000 and 0x7fffdfdfe000. pthread_attr_setguardsize failed
pthread_attr_setguardsize failed
I guess one key of this is that both of the hacks only work with the default setting of the sysctl. So we'll definitely need a more substantive change to take account of it.
I think this is the reference for the sysctl addition:
https://svnweb.freebsd.org/base?view=revision&revision=215307
Thanks. That matches what I expected.
I think the issues I outlined above need to addressed by FreeBSD devs before a proper solution can be implemented in the jvm. At a minimum some work is needed to make pthreads return proper results from pthread_attr_getstack()
and pthread_attr_setguardsize()
fixed so that it can set the number of guard pages correctly after the stack has been allocated using MAP_STACK. Its pretty important for userland to be able to place its own guard pages (in addition to the any kernel/pthreads ones) and remove them so that techniques like the jvm uses work.
Here's an ASCII picture of what I think might work well:
==============
-- <- stack top (high address)
|
| MAP_STACK pages
|
-- <- stack bottom (low address)
|
| MAP_STACK pages set PROT_NONE by pthreads based on `pthread_attr_setguardsize()`
|
-- <- stack bottom - guard size
|
| kernel placed guard pages based on `sysctl security.bsd.stack_guard_page`
|
-- <- end of stack region
This differs from what is currently implemented in two ways:
1) Currently if the kernel receives an mmap MAP_STACK for size X
, it returns back to userland a usable stack of size X - sysctl security.bsd.stack_guard_page
. This is problematic as I don't believe it possible for pthreads to account for the kernel placed stack guard pages without race conditions. I also believe it breaks the expectation from user of mmap that the kernel returns a stack of size X that is usable. By ensuring that kernel placed guard pages are in addition to the requested size it allows pthreads to ignore them and always return proper values for pthread_attr_getstack()
2) Currently the kernel doesn't allow userland to place its own guard pages (unless the page has already been grown into by the application). This is problematic for both pthreads and userland programs that place their own guard pages like the jvm. Ideally the kernel would respect an mprotect(PROT_NONE)
call on pages of the stack that have have not yet been grown into. To be clear the jvm needs to be able to place its own PROT_NONE pages on the stack and change some them back to PROT_READ|PROT_WRITE to allow further stack growth into that space (to allow exceptions to unwind), and then set them back to PROT_NONE again later.
With those two changes both pthreads and the jvm could ignore any kernel placed guard pages and place their own guard pages using mprotect(PROT_NONE)
. pthreads would allocate the stack using mmap with MAP_STACK flag and size of pthread_attr_setstacksize + pthread_attr_setguardsize
, then place its own guard pages with mprotect(PROT_NONE)
. The values returned from pthread_attr_getstack()
would be straightforward to calculate. Actually I think there would be no change to pthreads code as this is what it currently doing. Both of the above changes are kernel changes.
Hopefully this bug report is helpful to FreeBSD kernel developers on what changes would make the pthreads stack guard pages, pthread_attr_getstack()
results and the jvm stack exceeded exception handling working well.
maybe of interest, no idea: Infinityisfinite.txt
@battleblow The two workarounds we have don't allow the page that faulted to be made read/write because it is a kernel guard page. This prevents the jvm from using the the reserved area of the stack after the initial segfault. The relevant test for this is:
test/hotspot/jtreg/runtime/ReservedStack
More can be read about the reasoning for this approach here:
https://openjdk.java.net/jeps/270
I tried writing a third approach that consisted of two parts that tried to workaround the kernel's MAP_STACK behavior:
pthread_attr_setguardsize
size to match the security.bsd.stack_guard_page
size so that the kernel set guard pages are not in the usable stack that pthread_attr_getstack()
returns.mprotect()
work.This works but, there's a massive TOCTOU issue that I realized as I was writing detailed comments for the approach. libthr caches thread stacks, so part 1 where checking the security.bsd.stack_guard_page
size is used to avoid the kernel's guard pages is not feasible. Take for example, the jvm starts and security.bsd.stack_guard_page
is set to 16, some threads are allocated and cached. Then later security.bsd.stack_guard_page
is reduced. If those cached stacks are returned back to the jvm, the current value of security.bsd.stack_guard_page
will be less then when it was allocated which would make the solution break (can't mprotect the kernel guard pages, cant read from them either).
I think we really need FreeBSD kernel devs to look into a solution for the issues created by the automatic placement of stack guard pages. The test program:
pthread_attr_setguardsize_test.c
should provide context into how the current approach is problematic for both libthr guard pages and the jvm.
For reference here's the third workaround (missing the sysctl querying and fixed at 4096):
diff --git a/src/hotspot/os/bsd/os_bsd.cpp b/src/hotspot/os/bsd/os_bsd.cpp
index 87e8b7a20d..4eda3e83c8 100644
--- a/src/hotspot/os/bsd/os_bsd.cpp
+++ b/src/hotspot/os/bsd/os_bsd.cpp
@@ -796,6 +796,23 @@ bool os::create_thread(Thread* thread, ThreadType thr_type,
int status = pthread_attr_setstacksize(&attr, stack_size);
assert_status(status == 0, status, "pthread_attr_setstacksize");
+#ifdef __FreeBSD__
+ // XXX Part 1 of work-around to get os::guard_memory() call to succeed.
+ // Set guard size to match sysctl value so that libthr places its guard
+ // pages on top of kernel placed guard pages. The result is that pthreads
+ // will return back to us a stack region that does not include the kernel
+ // guard pages. However, this is racy (TOCTOU). A well timed change to the
+ // sysctl after we check it but before libthr allocate the stack will result
+ // in guard pages in the usable stack range and ... oh no this wont work.
+ // libthr caches the thread stacks, so we don't have a way to know how many
+ // pages to skip. e.g. TOCTOU window could be days for a long running
+ // java app like tomcat.
+ status = pthread_attr_setguardsize(&attr, 4096);
+#else
+ status = pthread_attr_setguardsize(&attr, 0);
+#endif
+ assert_status(status == 0, status, "pthread_attr_setguardsize");
+
ThreadState state;
{
@@ -2238,10 +2255,6 @@ bool os::pd_uncommit_memory(char* addr, size_t size) {
return res != (uintptr_t) MAP_FAILED;
}
-bool os::pd_create_stack_guard_pages(char* addr, size_t size) {
- return os::commit_memory(addr, size, !ExecMem);
-}
-
// If this is a growable mapping, remove the guard pages entirely by
// munmap()ping them. If not, just call uncommit_memory().
bool os::remove_stack_guard_pages(char* addr, size_t size) {
@@ -2317,6 +2330,27 @@ bool os::protect_memory(char* addr, size_t bytes, ProtType prot,
return bsd_mprotect(addr, bytes, p);
}
+bool os::pd_create_stack_guard_pages(char* addr, size_t size) __attribute__ ((optnone)) {
+#ifdef __FreeBSD__
+ // XXX Part 2 of work-around to get os::guard_memory() call to succeed.
+ // This first call to bsd_mprotect is not effective until the pages are
+ // touched. It is needed because the ContinueInNewThread0 thread ends
+ // up placing the guard pages twice and the second time it is called the pages
+ // are PROT_NONE.
+ bsd_mprotect(addr, size, PROT_READ|PROT_WRITE);
+
+ // XXX Danger, Will Robinson! Read from the pages so that they get
+ // paged in and the later call to os::guard_memory() will succeed.
+ // If we hit a kernel placed guard page this will crash the jvm.
+ for (char *pos = addr; pos < addr + size; pos += Bsd::page_size()) {
+ char c = *pos;
+ }
+ return true;
+#else
+ return os::commit_memory(addr, size, !ExecMem);
+#endif
+}
+
bool os::guard_memory(char* addr, size_t size) {
return bsd_mprotect(addr, size, PROT_NONE);
}
diff --git a/src/java.base/unix/native/libjli/java_md_solinux.c b/src/java.base/unix/native/libjli/java_md_solinux.c
index 2d675062a4..c350a50b1b 100644
--- a/src/java.base/unix/native/libjli/java_md_solinux.c
+++ b/src/java.base/unix/native/libjli/java_md_solinux.c
@@ -766,7 +766,11 @@ ContinueInNewThread0(int (JNICALL *continuation)(void *), jlong stack_size, void
if (stack_size > 0) {
pthread_attr_setstacksize(&attr, stack_size);
}
+#ifdef __FreeBSD__
+ pthread_attr_setguardsize(&attr, 4096);
+#else
pthread_attr_setguardsize(&attr, 0); // no pthread guard page on java threads
+#endif
if (pthread_create(&tid, &attr, (void *(*)(void*))continuation, (void*)args) == 0) {
void * tmp;
edit: add missing portion of third workaround diff.
Thanks Kurt. I really appreciate the in depth analysis here. The JEP reference was also very helpful in explaining the significance of the reserved area in the JVM stack and what it is used for.
I have been a bit bogged down with merges today, but tomorrow I'll create a PR and reference this thread. I'll also try and ping some kernel developers directly. I just need to sit down and read this through again to make sure I'm clear. The analysis and test program is a big help though on that front.
Your welcome. Sounds good.
Another example of how security.bsd.stack_guard_page
causes problems for pthreads would be creating a thread with pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
and setting security.bsd.stack_guard_page
to 2 or higher. In this case the pthread_create() call fails.
I believe that setting security.bsd.stack_guard_page to 1 by default was done in response to
https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash
Here is the commit
https://svnweb.freebsd.org/base?view=revision&revision=320317
Created https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239894 to start a discussion about this. I may also ping emaste and/or kib directly
Thank you Greg,
Updated test programs:
1) Shows pthread_attr_setguardsize(3)
is not currently working unless the size matches the security.bsd.stack_guard_page
size. Try setting security.bsd.stack_guard_page
to 1, 4, 5 and run program.
pthread_attr_setguardsize_test.c
Note: This is an updated version of the test previously posted here that removes an extraneous comment and adds __attribute__ ((optnone))
to check_stack function so the program works as intended regardless of opt level it was compiled at.
2) Shows pthread_create(3)
fails if security.bsd.stack_guard_page
size is greater then pthread_attr_setstacksize(3)
. Try setting Try setting security.bsd.stack_guard_page
to 3 or higher and run program.
I'm going to resolve this issue. I think we've done the best we can from a Java perspective on handling this issue. We have changes in upstream FreeBSD and corresponding changes in the Java code that will prevent the problem in future releases, we have a known sysctl workaround for existing releases that allow them to behave correctly, and a workaround in the Java code that is a best effort for those who choose not to use the sysctl workaround.
This program:
causes a VM crash on FreeBSD. On Linux, this results in a thrown StackOverflowError instead and a graceful exit. The same is true on OpenBSD, so this seems FreeBSD specific.
The crash on FreeBSD is due to a segfault:
I expect this to cause compliance test failures, although I haven't confirmed this yet. I've experimented with altering the stack size using -Xss, due to some earlier bug reports, but that doesn't change the behaviour.
My test environment is FreeBSD 11.3/amd64. I haven't tried other versions or architectures yet.