Open yuval-k opened 8 years ago
thanks for this bug report - I assume this is for the 1.5 version?
it might take me a bit to look-at/integrate the fix -- I'll try to block some time but if you or anyone else watching the thread wants to get it in sooner we'd want the following:
@eyberg I don't think you should merge my commit. The real fix, I believe, is to use the rump system stack for rump syscalls. Unfortunately, I don't have enough Go/rump knowledge to do that.
I am experiencing a stack overflow with gorump. I believe the root cause is that the stack is not switched when performing a system call, but I am by no means a Go expert.
To describe: when running our unik example_go_static_fileserver on AWS (or Xen, for that matter), we get the following error:
I'll save you the long days of single-stepping caused by the lack of watchpoint support in the Xen gdb stub and get to the root cause.
We have two goroutines that do read/write. They are created one right after the other, and their 2 KB stack allocations are adjacent to each other.
While the first goroutine is waiting for IO, the second one runs. It calls the write syscall. [At this point, from my understanding, the stack should change to the system stack, as syscalls are not aware of any Go stack business. This does not happen.]
The functions related to the syscall (specifically, the write to the Xen console) run on the same 2 KB stack. Unaware of Go's stack structure, the second goroutine, which is now running C code, runs out of stack space and overwrites the first goroutine's stack. The bug is only detected when the first goroutine resumes and crashes.
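To illustrate the failure mode, here is a toy model in Go, not the actual runtime code. It assumes the two 2 KB stacks sit back to back with the first goroutine's stack at the lower addresses, and that the stack grows downward as it does on x86-64. The `arena` type and `push` method are made up for this sketch.

```go
package main

import "fmt"

const stackSize = 2048 // 2 KB per goroutine stack, as described above

// arena is a hypothetical model of two adjacent stacks:
// first goroutine at [0, 2048), second goroutine at [2048, 4096).
type arena struct {
	mem [2 * stackSize]byte
	sp  int // stack pointer of the second goroutine; grows downward
}

func newArena() *arena {
	return &arena{sp: 2 * stackSize}
}

// push models C code consuming n bytes of stack with no bounds check,
// which is what happens when the syscall runs on the goroutine stack.
// It returns how many of those bytes landed in the first goroutine's stack.
func (a *arena) push(n int) int {
	spilled := 0
	for i := 0; i < n && a.sp > 0; i++ {
		a.sp--
		a.mem[a.sp] = 0xCC // marker byte for "written by the second goroutine"
		if a.sp < stackSize {
			spilled++
		}
	}
	return spilled
}

func main() {
	a := newArena()
	fmt.Println(a.push(1024)) // still inside the second stack
	fmt.Println(a.push(1536)) // 2560 bytes used in total, so the excess spills over
}
```

Nothing fails at the moment of the spill. The damage only shows up later, when the owner of the first stack reads its data back, which matches the delayed crash described above.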
Note the RAX in the dump: 0000000404216120. 0x4216120 is the original value (the value is a pointer, and the variable is stored on the stack); 0x00000004 was written by the second goroutine during the stack overflow.
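The arithmetic behind that RAX value can be checked with a short Go sketch. It assumes the pointer sits in an 8-byte little-endian stack slot and that the stray write was a 4-byte store landing on the upper half of that slot. The function name `overwriteHigh32` is made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// overwriteHigh32 models an 8-byte little-endian stack slot holding ptr,
// followed by a stray 4-byte store of val onto the slot's upper half.
func overwriteHigh32(ptr uint64, val uint32) uint64 {
	var slot [8]byte
	binary.LittleEndian.PutUint64(slot[:], ptr)  // original pointer value
	binary.LittleEndian.PutUint32(slot[4:], val) // overflow write hits bytes 4..7
	return binary.LittleEndian.Uint64(slot[:])
}

func main() {
	corrupted := overwriteHigh32(0x4216120, 0x00000004)
	fmt.Printf("0x%016x\n", corrupted) // prints 0x0000000404216120
}
```

The low 32 bits still hold the original pointer, which is why the corrupted value is recognizable in the dump.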
To test, I doubled Go's stack size, and everything seems to work. This is the fastest solution I can think of, but it is also the hackiest.