Memory limits via allocation sampling

SquidDev commented 1 year ago

One of Cobalt's weaker points is that it does not impose any limits on the amount of memory the VM can use. Ideally CC: Tweaked would switch over to a more native-style VM which does support this (see https://github.com/cc-tweaked/CC-Tweaked/issues/769), but I think that's a long way away.

Unfortunately, it is impractical to track every single allocation - this would make the implementation significantly more complex, and incur a massive overhead.

One alternative idea, inspired by this OCaml package (though perhaps obvious in retrospect), is to monitor a small sample of our allocations and estimate actual memory usage from those. To further simplify things, I propose we only track array allocations: actual memory usage will be higher than our estimate, but it should still be bounded by some constant factor (2-3x?).

Implementation

Add some Allocator class, which provides a newTypeArray(LuaState, int size) method for the various core types (byte, LuaValue, a generic T), as well as a corresponding resizeArray.
The LuaState is augmented with three fields:
- int allocationCounter: Tracks how many allocations are left before we take another sample.
- final long maxMemory: The maximum memory we can allocate.
- AtomicLong currentMemory: The current memory. Note this needs to be atomic as we'll decrement it from another thread.
When allocating an array, we compute the size of this array in bytes. If the size is larger than a constant (16KiB?), or if decrementing the allocation counter would take it below 0, then:
- If this allocation would take us above the maxMemory, then error.
- Otherwise, increment currentMemory and add this object to a queue of WeakReferences.
- Update allocationCounter to be a random number between 0 and 2 * our sampling rate (probably 1k). This provides a very basic form of abuse mitigation, by making which allocations are sampled non-deterministic.
This reference queue is polled on a separate thread (it can be shared across all Lua VMs). Each WeakReference stores its original size and a reference to the owner's currentMemory. When a weak reference is dequeued (i.e. its array has been garbage collected), we decrement its owner's currentMemory by that size.
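
A rough sketch of how the steps above might fit together, for byte arrays only. MemoryState is a hypothetical stand-in for the three new LuaState fields, and the 16-byte array header, the exception type, the LIVE set and the drain() helper are all assumptions made for illustration. The proposal also does not say whether the counter is re-randomised when the allocation errors; this sketch assumes it is.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the three fields added to LuaState. */
final class MemoryState {
    int allocationCounter;                             // allocations left before the next sample
    final long maxMemory;                              // limit in bytes
    final AtomicLong currentMemory = new AtomicLong(); // also decremented from the reaper thread

    MemoryState(long maxMemory) {
        this.maxMemory = maxMemory;
    }
}

final class Allocator {
    static final long ARRAY_HEADER = 16;            // assumed header size, JVM-dependent
    static final long LARGE_THRESHOLD = 16 * 1024;  // always track arrays above this size
    static final int SAMPLING_RATE = 1000;          // mean number of allocations between samples

    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keeps the TrackedRef objects themselves reachable; a weak reference that
    // is garbage collected before its referent is never enqueued.
    private static final Set<TrackedRef> LIVE = ConcurrentHashMap.newKeySet();

    private static final class TrackedRef extends WeakReference<Object> {
        final long size;
        final AtomicLong owner;

        TrackedRef(Object array, long size, AtomicLong owner) {
            super(array, QUEUE);
            this.size = size;
            this.owner = owner;
        }
    }

    static byte[] newByteArray(MemoryState state, int length) {
        long bytes = ARRAY_HEADER + length;
        // Fast path: small array and the counter has not run out, so no tracking.
        if (bytes <= LARGE_THRESHOLD && --state.allocationCounter >= 0) {
            return new byte[length];
        }
        // Pick the next sampling point at random in [0, 2 * SAMPLING_RATE).
        state.allocationCounter = ThreadLocalRandom.current().nextInt(2 * SAMPLING_RATE);
        if (state.currentMemory.get() + bytes > state.maxMemory) {
            throw new IllegalStateException("not enough memory");
        }
        byte[] array = new byte[length];
        state.currentMemory.addAndGet(bytes);
        LIVE.add(new TrackedRef(array, bytes, state.currentMemory));
        return array;
    }

    /** Releases the tracked size of every array the GC has already collected. */
    static void drain() {
        TrackedRef ref;
        while ((ref = (TrackedRef) QUEUE.poll()) != null) {
            ref.owner.addAndGet(-ref.size);
            LIVE.remove(ref);
        }
    }
}
```

In a real VM the body of drain() would presumably run on one shared daemon thread blocking on QUEUE.remove() rather than being called by hand. The check against maxMemory and the later addAndGet are not a single atomic step here, which should be fine as long as only the VM's own thread allocates.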

Concerns

The main concern here is that this is heavily tied to Java's GC. It's possible the Lua VM could no longer hold a reference to a large object, but the GC hasn't got to it yet, so the currentMemory is still large.

It might be safer to set the max memory to something arbitrarily high (1GiB?) and expose the memory usage via a metric. This way we can get a better idea of the current behaviour before doing anything more drastic.

SquidDev commented 1 year ago

An alternative approach would be to use JVMTI's allocation sampling to handle it for us. I think this is probably a bad idea (it requires native code, and even if we just enable it for CC's threads, I'm not sure what performance impact it has on the whole VM[^1]), but worth mentioning at least.

[^1]: The JEP mentions it's a 1% performance overhead with an empty handler, increasing to 3% for something which tracks each allocated object. Which isn't massive, but across the whole of Minecraft would be unacceptable!

SquidDev commented 1 year ago

One issue which has only just occurred to me is that while allocation sampling is fine for monitoring "normal" computers, it's not safe against adversarial attacks.

The main issue is that not all allocations are tracked (which is intentional, as otherwise this would come with massive overhead!). This means that if you have an oracle to detect whether an allocation was tracked or not, you can retain references to non-tracked objects and drop references to tracked ones. This means your tracked memory remains constant, but actual memory continues to grow!

There are some obvious oracles (collectgarbage("count")) which we could block, but more problematically, receiving an "out of memory" error is itself an indicator that the object was probably tracked. This means we could write a program as follows:

local tbl = {}

-- Fill tbl with 0s until we OOM
local i = 1
while pcall(function() tbl[i] = 0 end) do i = i + 1 end

-- Now fill tbl with untracked strings.
i = 1
while true do
  local ok, res = pcall(string.rep, " ", 1024)
  if ok then
    -- Allocation wasn't tracked, keep a reference to it
    tbl[i] = res
    i = i + 1
  else
    -- Out of memory, try again
  end
end
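
To put a rough number on this, here is a toy Java model of the second loop. It allocates nothing real and only moves counters around. The class and field names are made up for illustration, and it assumes the counter is re-randomised after every sampled allocation, including ones that error.

```java
import java.util.Random;

/** Toy model of the attack: no real memory is allocated, only counters change. */
final class OracleAttackSimulation {
    final long trackedBytes;   // what the sampler believes is in use
    final long retainedBytes;  // what the attacker actually holds on to
    final int failures;        // attempts that hit the "out of memory" error

    private OracleAttackSimulation(long trackedBytes, long retainedBytes, int failures) {
        this.trackedBytes = trackedBytes;
        this.retainedBytes = retainedBytes;
        this.failures = failures;
    }

    static OracleAttackSimulation run(int attempts, int allocSize, int samplingRate,
                                      long maxMemory, long seed) {
        Random random = new Random(seed);
        // The first loop of the attack has already pushed tracked memory up to
        // the limit, so any further tracked allocation fails.
        long tracked = maxMemory;
        long retained = 0;
        int failures = 0;
        int counter = random.nextInt(2 * samplingRate);

        for (int i = 0; i < attempts; i++) {
            if (--counter < 0) {
                // This allocation is sampled, so it is checked against the limit.
                counter = random.nextInt(2 * samplingRate);
                if (tracked + allocSize > maxMemory) {
                    // Error: the attacker learns it was tracked and retries.
                    failures++;
                    continue;
                }
                tracked += allocSize;
            }
            // The allocation succeeded, so the attacker keeps the reference.
            retained += allocSize;
        }
        return new OracleAttackSimulation(tracked, retained, failures);
    }
}
```

Calling OracleAttackSimulation.run(100_000, 1024, 1000, 1L << 20, 42L) models 100,000 attempts at 1KiB strings against a 1MiB limit. trackedBytes never moves off the limit, while retainedBytes grows by 1KiB for every attempt that did not error, which with a sampling rate of 1k is nearly all of them.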

I honestly don't know if there's a good solution to this :/.

SquidDev commented 1 year ago

I think the best solution for now is probably to not impose any memory limits at all (thus removing the oracle) and just provide monitoring tools on the CC:T side.

cc-tweaked / Cobalt

Memory limits via allocation sampling #66
