GPUPeople / Ouroboros

GPU MemoryManager based on virtualized queues
MIT License

Example #12

Open LAhmos opened 4 years ago

LAhmos commented 4 years ago

Do you have a simple example of how to use your allocator?

In your example you have the following line. What does this function do? allocation_size_byte = Ouro::alignment(allocation_size_byte, sizeof(int));

delanyinspiron6400 commented 4 years ago

This line just aligns the requested size to a multiple of sizeof(int). The testcase allocates memory, writes to it, and reads it back, so we want the memory to be able to hold whole integers. You don't have to do that in your own code; it is only needed for this testcase.
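To make the effect concrete, here is a minimal sketch of what aligning a size to a multiple of sizeof(int) could look like. `align_up` is a hypothetical helper written for illustration, not the actual Ouro::alignment implementation, and it assumes that sizes are rounded up and that the alignment is non-zero.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only; not the Ouro::alignment source.
// Rounds `size` up to the next multiple of `alignment` (assumes alignment > 0).
constexpr std::size_t align_up(std::size_t size, std::size_t alignment)
{
    return ((size + alignment - 1) / alignment) * alignment;
}

// With a 4-byte unit (a common sizeof(int)), 13 bytes becomes 16 bytes,
// while a size that is already a multiple of 4 is left unchanged.
static_assert(align_up(13, 4) == 16, "13 is rounded up to 16");
static_assert(align_up(16, 4) == 16, "aligned sizes stay the same");
```

Under these assumptions, the aligned size can always be treated as a whole number of integers, which is all the testcase needs.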

Here would be a simple example, very similar to what the testcase does, just with the testcase logic stripped out.

#include "device/Ouroboros_impl.cuh"
#include "device/MemoryInitialization.cuh"
#include "InstanceDefinitions.cuh"
#include "Utility.h"

// That is the fastest memory manager you can use, otherwise you can also choose OuroVAPQ
using MemoryManagerType = OuroPQ;

__global__ void d_testAllocation(MemoryManagerType* mm)
{
    int allocation_size = 16; // Allocation size is in Bytes
    void* memory = mm->malloc(allocation_size); // Allocate some Bytes from the memory manager
    mm->free(memory); // Return the memory back to the memory manager
}

int main()
{
    size_t instantiation_size = 8192ULL * 1024ULL * 1024ULL; // How much memory do you want to give the memory manager to manage? (here 8 GiB)
    MemoryManagerType memory_manager; // Instantiate the memory manager
    memory_manager.initialize(instantiation_size); // Initialize it with a certain size
    d_testAllocation<<<1,1>>>(memory_manager.getDeviceMemoryManager()); // Pass it to a kernel
    cudaDeviceSynchronize(); // Kernel launches are asynchronous, so wait for the kernel to finish
    return 0;
}

If you want to modify some parameters, have a look at include/Parameters.h, where you can change the number of queues. This changes which allocation sizes are handled via Ouroboros; the range is determined by the smallest page size and the number of queues. The default is 16 B and 10 queues, so allocations from 16 B to 8192 B are handled by Ouroboros and the rest is forwarded to the CUDA allocator. If you have further questions, just ask 👍 :)
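As a rough sketch of how the 16 B to 8192 B range follows from those defaults: assuming each queue serves a page size twice as large as the previous one (an assumption that matches the numbers above, not something taken from the source), the largest size is 16 B shifted left by 9. All names below are hypothetical and are not the identifiers used in include/Parameters.h.

```cpp
#include <cstddef>

// Assumed defaults from the comment above: smallest page 16 B, 10 queues.
// These names are illustrative, not the ones in include/Parameters.h.
constexpr std::size_t kSmallestPageSize = 16;
constexpr unsigned kNumQueues = 10;

// Page size served by queue `index`, assuming each queue doubles the previous one.
constexpr std::size_t queue_page_size(unsigned index)
{
    return kSmallestPageSize << index;
}

// Largest request that would still be served by one of the queues.
constexpr std::size_t largest_queue_size()
{
    return queue_page_size(kNumQueues - 1);
}

// True if a request of `size` bytes falls inside the queue range;
// anything larger would be forwarded to the CUDA allocator.
constexpr bool handled_by_queues(std::size_t size)
{
    return size <= largest_queue_size();
}

static_assert(largest_queue_size() == 8192, "16 B * 2^9 = 8192 B");
```

With this model, changing the number of queues by one doubles or halves the upper bound of the range.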

LAhmos commented 4 years ago

Thanks a lot :+1: