awnumar / memguard

Secure software enclave for storage of sensitive information in memory.
Apache License 2.0

What's a possible reason .Bytes() returns an empty slice on a LockedBuffer from enclave.Open()? #138

Closed: ns-gzhang closed this issue 1 year ago

ns-gzhang commented 3 years ago

We encountered an infrequent problem at runtime in one of our Ubuntu Docker containers:

panic: runtime error: slice bounds out of range [:32] with capacity 0
goroutine 147009 [running]:
github.com/xxx/secretsclient.EncryptMapData(0xc03ca81c80, 0x4, 0xc0209ba1b0, 0xc0004d5200, 0x45, 0x5e, 0x0, 0x0)
    /go/pkg/mod/github.com/xxx/secretsclient@v0.1.3/encrypt.go:125 +0x1fae
...

The code segment around encrypt.go:125 is (with added line numbers and comments):

(119)   keyIVBuf, err := encKeyIV.Open()  // encKeyIV was from keyiv.Seal() with keyiv.Size() == KeyIVSIZE(48)
    if err != nil {
        return err
    }
    defer keyIVBuf.Destroy()

(125)   block, err := aes.NewCipher(keyIVBuf.Bytes()[:cKeySize])  // cKeySize = 32
    if err != nil {
        return err
    }
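
A cheap guard before the slice would turn this panic into a recoverable error while the root cause is tracked down. A minimal sketch of that idea (the wrapper name and error message are illustrative, not part of the original code):

package secretsclient

import (
    "crypto/aes"
    "crypto/cipher"
    "fmt"

    "github.com/awnumar/memguard"
)

// newCipherFromBuffer wraps the pattern at encrypt.go:125: it returns
// an error instead of panicking when the opened buffer is unexpectedly
// short, e.g. because the enclave key was wiped.
func newCipherFromBuffer(buf *memguard.LockedBuffer, keySize int) (cipher.Block, error) {
    if buf.Size() < keySize {
        return nil, fmt.Errorf("opened buffer holds %d bytes, want at least %d", buf.Size(), keySize)
    }
    return aes.NewCipher(buf.Bytes()[:keySize])
}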

Couldn't figure out what went wrong... would a low memlock ulimit possibly cause this? Thanks. (We do have a ulimit setting:

/ # cat /etc/security/limits.conf
* soft memlock 512000
* hard memlock 1024000

Somehow the ulimit command still shows the default (which would cause problems for us quickly):

/ # ulimit -l
64

so we're a little puzzled here too.) We'd appreciate your help!
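
For what it's worth, /etc/security/limits.conf is applied by PAM at login, and PAM never runs inside a container, so those settings have no effect there; a container inherits its limits from the runtime (for Docker, e.g. docker run --ulimit memlock=<soft>:<hard>, with values in bytes). A quick way to see the limit the process itself is subject to is a sketch like this, using golang.org/x/sys/unix:

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // Print the memlock limit this process is actually subject to;
    // memguard's mlock(2) calls are constrained by this value.
    var rl unix.Rlimit
    if err := unix.Getrlimit(unix.RLIMIT_MEMLOCK, &rl); err != nil {
        panic(err)
    }
    fmt.Printf("RLIMIT_MEMLOCK soft=%d hard=%d (bytes)\n", rl.Cur, rl.Max)
}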

Additional info: version: github.com/awnumar/memguard v0.22.2; OS: Ubuntu 16.04 Docker image; Go: 1.13

We are also now seeing the following error in one of the Docker instances (the same instance as above):

panic: runtime error: index out of range [0] with length 0
goroutine 6911 [running]:
github.com/awnumar/memguard/core.(*Coffer).View(0xc00001ec90, 0x0, 0x0, 0x0)
    /go/pkg/mod/github.com/awnumar/memguard@v0.22.2/core/coffer.go:112 +0x207
github.com/awnumar/memguard/core.Open(0xc00979aa20, 0x661f81, 0xc00000ab40, 0x0)
    /go/pkg/mod/github.com/awnumar/memguard@v0.22.2/core/enclave.go:101 +0x60
github.com/awnumar/memguard.(*Enclave).Open(0xc009482150, 0xc034b6d278, 0x4, 0xc009482150)
    /go/pkg/mod/github.com/awnumar/memguard@v0.22.2/enclave.go:43 +0x32
github.com/xxx/secretsclient.EncryptMapData(0xc034b6d278, 0x4, 0xc02fd7cea0, 0xc00016d6c0, 0x16, 0x1c, 0x0, 0x0)
    /go/pkg/mod/github.com/xxx/secretsclient@v0.1.3/encrypt.go:119 +0x9d
...
awnumar commented 3 years ago

This is related to #136

I have a fix in mind but it requires a breaking change and I haven't had enough time to complete it.

I don't know the exact cause, but I think it's to do with having a global key value, because initialising it is difficult: we need to recover from panics etc. that have triggered memory wipes.

I'm planning on creating a new Bubble type that has its own enclave key, and having callers create one and use methods on it.

ns-gzhang commented 3 years ago

Thanks Awn. From a user's point of view, if calling initialization of the global key from the app would help, we could easily do that. Do you mean subdividing the set of enclaves into multiple Bubbles, so that each Bubble covers a set of enclave instances with its own key? That would reduce the concurrent operations on any one key? Thanks.

awnumar commented 3 years ago

Yeah, that's exactly what I mean. A bubble has its own set of containers protected by its key, and destroying it would destroy its contained information.
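
To make that concrete, here's a minimal, hypothetical illustration in plain Go (not the memguard API; every name here is made up): each bubble owns a random key and seals data under it, so wiping the key on destroy makes everything sealed inside that bubble unrecoverable.

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
)

// bubble is an illustrative stand-in: memguard would keep the key in a
// guarded Coffer rather than a plain slice.
type bubble struct {
    key []byte
}

func newBubble() (*bubble, error) {
    k := make([]byte, 32)
    if _, err := rand.Read(k); err != nil {
        return nil, err
    }
    return &bubble{key: k}, nil
}

// seal encrypts plaintext under this bubble's key (nonce prepended).
func (b *bubble) seal(plaintext []byte) ([]byte, error) {
    block, err := aes.NewCipher(b.key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// destroy wipes the key: every ciphertext sealed in this bubble is now
// permanently unreadable, without touching other bubbles.
func (b *bubble) destroy() {
    for i := range b.key {
        b.key[i] = 0
    }
}

func main() {
    b, _ := newBubble()
    ct, _ := b.seal([]byte("secret"))
    fmt.Printf("sealed %d bytes\n", len(ct))
    b.destroy()
}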

I'm going to try a quick fix later today, but the newer API will follow at some point. Hopefully this fixes the issue.

awnumar commented 3 years ago

Of course there would still have to be a global list of bubbles, but this is fine -- we already keep a global list of buffers. This can be moved into the bubbles.

ns-gzhang commented 3 years ago

Hi Awn, just wondering if you would make Bubbles internal so we don't have to change code. Thinking of Bubbles just like buckets in a hash table...

awnumar commented 3 years ago

I'm not sure if that would be a good solution, or fix the problem.

I made some changes that resulted in a deadlock in the deadlock example instead of a crash. This is a problem that happened previously in #132, and the fix (#133, #135) for that issue caused the bug described by this issue and #136.

I'll push my branch later and maybe you can take a look at it.

ns-gzhang commented 3 years ago

Hi Awn, additional info: after we successfully increased the memlock limit in the Docker containers, the problem seemed to go away. There seems to be some association between a low memlock limit and this problem. Thanks.

awnumar commented 3 years ago

Hey, that's very interesting. Thanks for telling me.

I've been implementing an alternate container for the Coffer based on channels, but I don't have much time at the moment so it's taking a while. Will post updates to the patch branch.
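
For anyone curious what a channel-based container can look like, here is a hypothetical sketch (not the actual patch branch): ownership of the secret passes through a one-slot channel, so concurrent views are serialised without explicit locks.

package main

import "fmt"

// coffer holds the secret in a one-slot channel; whoever receives it
// has exclusive access until it is sent back.
type coffer struct {
    slot chan []byte
}

func newCoffer(secret []byte) *coffer {
    c := &coffer{slot: make(chan []byte, 1)}
    c.slot <- secret
    return c
}

// view borrows the secret, runs f on it, and returns it to the slot,
// blocking any other viewer in the meantime.
func (c *coffer) view(f func([]byte)) {
    s := <-c.slot
    defer func() { c.slot <- s }()
    f(s)
}

func main() {
    c := newCoffer([]byte("key material"))
    c.view(func(s []byte) { fmt.Printf("viewed %d bytes\n", len(s)) })
}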

hunjixin commented 2 years ago

Any update on this issue? I ran into the same problem.

ns-gzhang commented 2 years ago

Increasing the memlock ulimit has been a robust solution for us. When we deploy with pods on K8s, we need to increase the underlying server's ulimit so that the pods' ulimit is larger/unlimited.
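
A related trick, assuming the hard limit on the node is already high enough: an unprivileged process may raise its own soft memlock limit up to the hard limit at startup, before memguard locks any pages. A hypothetical helper using golang.org/x/sys/unix:

package main

import "golang.org/x/sys/unix"

// raiseMemlock lifts the soft RLIMIT_MEMLOCK to the hard limit, which
// any process is allowed to do; it only helps if the hard limit is
// already large enough.
func raiseMemlock() error {
    var rl unix.Rlimit
    if err := unix.Getrlimit(unix.RLIMIT_MEMLOCK, &rl); err != nil {
        return err
    }
    if rl.Cur < rl.Max {
        rl.Cur = rl.Max
        return unix.Setrlimit(unix.RLIMIT_MEMLOCK, &rl)
    }
    return nil
}

func main() {
    if err := raiseMemlock(); err != nil {
        panic(err)
    }
}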

hunjixin commented 2 years ago

Increasing the memlock ulimit has been a robust solution for us. When we deploy with pods on K8s, we need to increase the underlying server's ulimit so that the pods' ulimit is larger/unlimited.

Does this operation resolve the panic: runtime error: index out of range [0] with length 0 issue?

awnumar commented 1 year ago

Does this operation resolve the panic: runtime error: index out of range [0] with length 0 issue?

Testing locally, properly setting the memlock limit does fix the panics.