crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.37k stars 1.62k forks source link

Fix using `System.retry_with_buffer` with stack buffer #14615

Closed straight-shoota closed 4 months ago

straight-shoota commented 4 months ago

retry_with_buffer uses a stack allocated buffer for the happy case of small allocations. But this means the buffer is only valid inside the block. Outside of it, the stack memory may be corrupted because it can get overwritten by other data.

All call sites of retry_with_buffer were not paying attention to this. In all cases the buffer is used as scratch space to put dynamically sized properties of a struct. This data is later used to construct a String instance. This needs to happen within the block of retry_with_buffer to ensure the buffer is valid.

This error hasn't been noticed until I tried a little refactor in #14614. I suppose the code usually still works fine because there are no further calls which could override the stack data. It only failed in the interpreter, which I guess does something differently. May this need further investigation?

BlobCodes commented 4 months ago

It only failed in the interpreter, which I guess does something differently. May this need further investigation?

Blocks are inlined in the normal backend, so the StaticArray is on the same stack frame and still valid.

In the interpreter, blocks are an own primitive which seem to behave like a call (variables are instantly discarded on return).


We can also see this with the following code:

def a(&)
  var = 0                      
  yield
  puts "a: #{pointerof(var)}"
end

def b
  var = 0
  puts "b: #{pointerof(var)}"
end

def c
  var = 0
  puts "c: #{pointerof(var)}"

  a {}
  a {}
  b
  b
end

c

When compiling this with LLVM, I get the following results (without optimization):

c: Pointer(Int32)@0x7ffe76f15b84
a: Pointer(Int32)@0x7ffe76f15b80
a: Pointer(Int32)@0x7ffe76f15b7c
b: Pointer(Int32)@0x7ffe76f15b64
b: Pointer(Int32)@0x7ffe76f15b64

But on the interpreter, I get the following:

c: Pointer(Int32)@0x7f68e1400000
a: Pointer(Int32)@0x7f68e1400008
a: Pointer(Int32)@0x7f68e1400008
b: Pointer(Int32)@0x7f68e1400008
b: Pointer(Int32)@0x7f68e1400008

Besides the fact that normal stacks grow down while the interpreter's stack grows up, we can see that the variables defined in all calls of c and b overlap in the interpreter, while in llvm only the normal calls overlap.

asterite commented 4 months ago

Right, blocks are implemented differently in the interpreter (they work more like how Ruby works)

beta-ziliani commented 4 months ago

So the real fix should be in the interpreter, right?

BlobCodes commented 4 months ago

So the real fix should be in the interpreter, right?

I think it is reasonable to assume that variables defined in a yielding method may not be available anymore once we're not inside that block anymore (undefined behaviour).

Otherwise, we'd also have to manually inline everything annotated with @[AlwaysInline] in the interpreter. I think the added conplexity just isn't worth it for this small corner case.

straight-shoota commented 4 months ago

@beta-ziliani This is the real fix. At least for this bug. buf must not be expected to be valid outside the block scope.

It's just incidental that the interpreter behaviour helped us notice this bug. It stays a bug even if the behaviour of the interpreter changed. Whether we should align block execution in the interpreter to a behaviour more similar to compiled code is a different discussion, though.

beta-ziliani commented 4 months ago

Yes, I see that now; I misunderstand the problem. I agree with @BlobCodes that this falls within the undefined behavior territory, so alignment between the compiler and interpreter is not a goal.