Closed — ccamel closed this issue 4 months ago
Below is my analysis and steps to reproduce the critical issue at hand. Let's open discussion about this issue and potential solutions to resolve it.
length/2 allocation issue

Here is the simplified Prolog query that causes high memory consumption:

```prolog
length(List, 10000000000), maplist(=(1), List).
```
Initially, I thought the issue was due to the recursive execution of all clauses through maplist/2. However, the recursion is not the problem: the logic module requests Prolog execution through a given context that handles the out-of-gas limit. The context is passed to the Promise that handles each predicate and stops immediately when the gas limit is reached.

When calling length(List, 4), Prolog allocates a new Variable four times and unifies them into the List:
```go
func lengthRundown(vm *VM, list Variable, n Integer, k Cont, env *Env) *Promise {
	elems, err := makeSlice(int(n))
	if err != nil {
		return Error(resourceError(resourceMemory, env))
	}
	for i := range elems {
		elems[i] = NewVariable()
	}
	return Unify(vm, list, List(elems...), k, env)
}
```
The only limit checked here is the memory available on the host node, enforced by makeSlice(). Otherwise, for very large numbers, allocating each NewVariable() through the loop takes a long time and is not covered by the context limitation. As a result, when we call this predicate through a context with a deadline:
```go
func TestInterpreter_performance(t *testing.T) {
	var out bytes.Buffer
	p := New(nil, &out)
	tests := []struct {
		name     string
		query    string
		output   string
		outputFn func(t *testing.T, output string)
		err      error
		waits    bool
	}{
		{name: "1", query: `length(List, 10000000), maplist(=(1), List).`, err: context.DeadlineExceeded, waits: true},
		{name: "2", query: `length(List, 10000000000), maplist(=(1), List).`, err: context.DeadlineExceeded, waits: true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			ctx, cancel := context.WithTimeout(context.Background(), 10000*time.Millisecond)
			defer cancel()
			assert.Equal(t, tt.err, p.QuerySolutionContext(ctx, tt.query).Err())
			assert.Equal(t, tt.waits, errors.Is(ctx.Err(), context.DeadlineExceeded))
		})
	}
}
```
The first test case passes because the deadline is handled by maplist, but the second one fails on the test timeout: the allocation loop is never interrupted by the context cancellation.
I made a small workaround for testing purposes by adding the context to the VM and checking that the context is not done before continuing:
```go
func lengthRundown(vm *VM, list Variable, n Integer, k Cont, env *Env) *Promise {
	elems, err := makeSlice(int(n))
	if err != nil {
		return Error(resourceError(resourceMemory, env))
	}
	for i := range elems {
		select {
		case <-vm.Ctx.Done():
			fmt.Println("TIME OUT")
			return Error(fmt.Errorf("time out"))
		default:
			elems[i] = NewVariable()
		}
	}
	return Unify(vm, list, List(elems...), k, env)
}
```
Here is the snippet of code used for making a new slice while controlling the available memory:
The calculation of the memory limit for making a new slice is not safe and depends essentially on the host where the node runs. It is unsafe because between the moment the available memory is checked (free := memFree()) and the moment the slice is allocated (make([]Term, n)), another part of the program may allocate memory, causing actual memory usage to exceed the limit. (Under blockchain node execution conditions, this happened multiple times during my tests.) Note also that we cannot catch memory overflows: https://github.com/ichiban/prolog/issues/226.
Let's discuss possible fixes we can implement. @ccamel @amimart
Thanks @bdeneux for your detailed analysis of the nature of the security problem revealed by the audit.
Regarding the length/2 allocation issue, we can't use the deadline pattern to solve it because it introduces unpredictability into the execution of requests. Depending on the complexity of the query and the current load on the system, the same query may sometimes succeed and sometimes fail, compromising the ability of the validation nodes to reach consensus.
As a preliminary consideration, I believe the best approach would be to implement controls on the allocators (memory, variables, etc.) that:
That said, it's not easy, quite tricky (as you mentioned, https://github.com/ichiban/prolog/issues/226 is a good read), and it has to be applied to all the code, mainly in the ichiban/prolog library, but also to our native predicates if necessary.
But I'd need a bit more time to get a better grasp of the issue and see whether other approaches might be more effective and safer.
@ccamel @bdeneux This is really tricky. I would propose a first, far-from-ideal fix: prohibit unifying the length with an uninitialized variable through the length/2 predicate; the same approach could be used for the functor/3 predicate. This way we can still get the length of a list, but not initialize one.
What do you think?
@amimart Yes, that seems a good temporary solution, but after working on https://github.com/axone-protocol/prolog/pull/3, I think it would be better (as a temporary solution too) to instead add a maximum number of NewVariable allocations. This would still allow unifying length with an uninitialized variable, but within a defined limit, as @ccamel suggested in his first comment about allocated resources. It is only a temporary fix since it acts on the variable resource alone and not on global memory, but it's a good starting point.
Totally agree with you, that's way better. Let's go this way.
Description
While approaching the Prolog code executed on the okp4 blockchain, we used a black-box approach in which we tried various Prolog predicates natively allowed on the blockchain. We found that if a large list is generated and then mapped, it causes memory usage high enough to terminate that specific blockchain instance. If a large list, i.e. N = 10000000000, was passed, the resulting memory consumption forced the node instance to terminate. It is interesting to note that for some values, e.g. N = 10000, the blockchain gracefully handles the issue and returns a gas error. Furthermore, testing showed that this issue applies to all predicates that may have high memory consumption.
This finding is not isolated to our custom testing environments but was also corroborated using the ichiban/prolog scripting engine, highlighting a fundamental issue within the underlying computational logic that poses a serious risk to the blockchain's stability and operational integrity.
Impact
The ramifications of this vulnerability are twofold:
Recommendation
TBC