bheisler / RustaCUDA

Rusty wrapper for the CUDA Driver API
Apache License 2.0
758 stars 60 forks source link

Fix memory over-allocation bugs #38

Closed LutzCle closed 5 years ago

LutzCle commented 5 years ago

DeviceBuffer, LockedBuffer, and UnifiedBuffer allocate several times more memory than requested. For example:

DeviceBuffer::zeroed::<i64>(2_usize.pow(30))

allocates 8 GB instead of the requested 1 GB. This can result in an unexpected OutOfMemory error.

The bugs occur because the respective *_malloc() wrappers already account for the scaling from "count of T" to bytes. Thus, count is multiplied with size_of::<T>() twice.

This fix simply removes the superfluous multiplications.

LutzCle commented 5 years ago

How would you devise a test case to catch these types of bugs?

bheisler commented 5 years ago

Huh, you're right. Well, that's kind of embarrassing. Thanks for the pull request (and thanks for noticing this mistake).

The only way I can think of to check if we're allocating too much memory is to deliberately write to a memory location that should be out of bounds and make sure that that fails, but I'm not really sure that will work.

This change will introduce a slight change in behavior that will need to be fixed, however. In the current code, DeviceBuffer and related types will only allocate if their backing array is more than zero bytes. In your change, they'll allocate if the backing array is more than zero elements. It might not seem like there's a difference, but zero-sized types are a thing; you could (in principle) create a DeviceBuffer::zeroed::<()>(100), and because there are 100 elements this would incorrectly attempt to allocate a zero-length buffer.

saona-raimundo commented 5 years ago

Hi there!

I wanted to ask what would be needed to fix this issue.

Maybe, simply change the last bytes for size, say, in DeviceBuffer:

pub unsafe fn uninitialized(size: usize) -> CudaResult<Self> {

    let bytes = size
        .checked_mul(mem::size_of::<T>())
        .ok_or(CudaError::InvalidMemoryAllocation)?;

    let ptr = if bytes > 0 {
        cuda_malloc(size)?
}

... but then there would be two multiplications, right? 1) Defining bytes 2) Allocating in cuda_malloc

LutzCle commented 5 years ago

Ok, so zero-sized types are now handled correctly. There is no need to calculate bytes twice, it's enough to check if the type's size is greater than 0.

In addition, I've added tests to check for out-of-memory errors in DeviceBuffer and UnifiedBuffer.

However, LockedBuffer only has a placeholder test. My testing method would have to query the available memory in the system, but I'm not aware of a platform-independent way to do that. I'm open to suggestions here.

bheisler commented 5 years ago

Looks good, thanks for the pull request!