Closed wodadehencou closed 1 year ago
Thanks for reporting. 👍
After applying the patch the behaviour seems identical. A bunch of timeouts after a period:
...
1587048652522845416, 36642766
1587048652555571711, 3972773
1587048652544642401, 14974528
1587048652555173078, 4526586
^C### timeout
1587048652559703689, 27796593682
### timeout
1587048652559624289, 27796714889
### timeout
1587048652559552655, 27796801999
...
I cannot reproduce the problem on my platform (go1.14, OSX). @awnumar, could you please add the following lines at the end of the function OpenEnclave to print out the stack information?
// ...
<-ctx.Done()
time.Sleep(time.Second)
buf := make([]byte, 1<<20)
fmt.Println(string(buf[:runtime.Stack(buf, true)]))
}
I read through all of the memguard code, and some other places have the same bug. I will open an MR as soon as possible.
This is caused by the Purge() function. The operation flow is:
enclave.Open -> Key.View --(get RLock)--> NewBuffer -> memcall.Alloc[Fail] -> Panic -> Purge -> key.Lock (request write lock)
This is another deadlock case. But @awnumar, I don't know why memcall.Alloc failed on your system.
A simple solution for all of these deadlock cases is to arrange every Lock() and Unlock() call carefully so that no lock is requested while another is still held. It will be less efficient, though: a short run of consecutive operations will then acquire and release the lock more than once.
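The end of that chain (a goroutine that still holds the read lock asking for the write lock on the same mutex) can be reproduced in isolation. The sketch below is not memguard code: `guarded` and `writeWhileReading` are hypothetical names, and it uses `TryLock` (available since Go 1.18) so that the conflict is reported instead of hanging the program.

```go
package main

import (
	"fmt"
	"sync"
)

// guarded is a hypothetical stand-in for a container protected by an RWMutex.
type guarded struct {
	mu sync.RWMutex
}

// writeWhileReading takes the read lock and then probes for the write lock
// from the same goroutine. TryLock is used instead of Lock so that the
// function returns a result rather than blocking forever.
func (g *guarded) writeWhileReading() bool {
	g.mu.RLock()
	defer g.mu.RUnlock()

	if g.mu.TryLock() {
		g.mu.Unlock()
		return true
	}
	return false
}

func main() {
	var g guarded
	fmt.Println("write lock acquired while reading:", g.writeWhileReading())
}
```

With a plain `Lock()` in place of `TryLock()`, the goroutine would wait for its own read lock to be released, which never happens.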
Thanks for looking into this and providing a patch. I have merged your PR and added you to the authors file.
I can no longer reproduce the deadlock on my system. Your PoC does however produce some panics that are very interesting. I have added it to the examples submodule to investigate when I have more time.
Thanks again, and keep safe :)
Describe the bug Opening an enclave may deadlock on an RWMutex. The cause is a recursive RLock on the RWMutex, which Go does not permit.
in file: src/sync/rwmutex.go
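The sync package documentation prohibits recursive read locking because a waiting writer blocks new readers. The sketch below is a standalone illustration of that rule, not memguard code: `recursiveReadBlocked` is a hypothetical helper, and it uses `TryRLock` (Go 1.18+) so that it can observe the refusal without actually deadlocking.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// recursiveReadBlocked reports whether a second read lock is refused
// once a writer is waiting on the same RWMutex.
func recursiveReadBlocked() bool {
	var mu sync.RWMutex
	mu.RLock() // outer read lock, held for the whole probe

	done := make(chan struct{})
	go func() {
		mu.Lock() // blocks until the outer read lock is released
		mu.Unlock()
		close(done)
	}()

	// Poll until the writer is queued. From that point on a nested
	// read lock is refused; a plain RLock here would block forever.
	blocked := false
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if !mu.TryRLock() {
			blocked = true
			break
		}
		mu.RUnlock()
		time.Sleep(time.Millisecond)
	}

	mu.RUnlock() // release the outer read lock so the writer can finish
	<-done
	return blocked
}

func main() {
	fmt.Println("second RLock refused while a writer waits:", recursiveReadBlocked())
}
```

On a normally scheduled system this should report `true`. Replacing `TryRLock` with `RLock` turns the probe into the hang described in this report.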
To Reproduce Steps to reproduce the behaviour:
func (s *Coffer) Destroyed() bool {
Expected behaviour All processes run normally.
Additional context The test case below reproduces the problem. CPU usage starts at about 100%, then drops to 0%. At that point the process is deadlocked.