Do I need to create a new repository?
No, a new folder is enough.
Hi, I don't understand what this part does:
partial_update[4][4]
# distribute the data over 4 GPUs
for i in range(4):
    # size [4][10]
    gpu_image_memory[i] = alloc_gpu_memory(number_of_images/4)
    # size [4][4]
    gpu_partial_update_memory[i] = alloc_gpu_memory(4)
This part allocates GPU memory four times, once for each GPU. In the first iteration it allocates a buffer of size number_of_images/4 and a buffer of size 4 for the first GPU, in the second iteration buffers of the same sizes for the second GPU, and so on.
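For illustration, the same loop could look roughly like this in cupy (only a sketch, assuming 4 GPUs and float data; alloc_gpu_memory from the pseudo code has no fixed cupy equivalent, and the value of number_of_images is assumed here):
import cupy as cp

number_of_images = 40  # assumed value matching the [4][10] size comments
gpu_image_memory = []
gpu_partial_update_memory = []
for i in range(4):
    with cp.cuda.Device(i):  # allocations inside this block land on GPU i
        gpu_image_memory.append(cp.empty(number_of_images // 4, dtype=cp.float32))
        gpu_partial_update_memory.append(cp.empty(4, dtype=cp.float32))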
I still don't understand. What is the difference between these three parts?
# allocate memory for the GPU, size [4]
gpu_update_memory = alloc_gpu_memory(4)
# size [4][10]
gpu_image_memory[i] = alloc_gpu_memory(number_of_images/4)
# size [4][4]
gpu_partial_update_memory[i] = alloc_gpu_memory(4)
Those are three different memory variables. In CUDA C++, it would be something like:
float* gpu_update_memory;
float* gpu_image_memory[4];
float* gpu_partial_update_memory[4];
cudaMalloc(&gpu_update_memory, 4*sizeof(float));
for(int i = 0; i < 4; ++i){
    cudaMalloc(&gpu_image_memory[i], (number_of_images/4)*sizeof(float));
    cudaMalloc(&gpu_partial_update_memory[i], 4*sizeof(float));
}
and gpu_update_memory is a little bit special: if we were consistent, we would have to write gpu_update_memory[4][1]. It is simply one scalar value per GPU, so we can skip the extra dimension.
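In a cupy-style sketch this would just be one flat array with one float slot per GPU (again only an illustration, not a required API):
gpu_update_memory = cp.empty(4, dtype=cp.float32)  # one scalar slot per GPU instead of [4][1]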
I am not sure cupy is capable of that. I will look into this and other solutions.
That's possible. Please have a look and check whether we need to change the behavior. It is also possible to allocate a single chunk of memory and work with offsets:
float* gpu_image_memory;
cudaMalloc(&gpu_image_memory, 4*(number_of_images/4)*sizeof(float));
//...
for(int i = 0; i < 4; ++i){
    // arguments: start address, end address
    do_stuff(gpu_image_memory + (i * (number_of_images/4)),
             gpu_image_memory + ((i+1) * (number_of_images/4)));
}
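In cupy, such an offset scheme could be expressed with plain slicing, because a basic slice of a cupy array is a view into the same device memory, not a copy (a sketch; do_stuff stands for the processing function from the example above, and number_of_images is assumed as before):
import cupy as cp

number_of_images = 40  # assumed value, as above
chunk = number_of_images // 4
# one big allocation instead of four separate ones
gpu_image_memory = cp.empty(4 * chunk, dtype=cp.float32)
for i in range(4):
    # the slice is a view on the same GPU memory, so no data is copied
    do_stuff(gpu_image_memory[i * chunk:(i + 1) * chunk])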
Done in #22
Short description
Implement a GPU memory abstraction that allows allocating memory and copying it to and from the GPU via a Python API, and that allows passing the GPU memory as an argument to C++ CUDA code via a pybind11 Python binding.
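For example, with cupy the Python side could allocate the memory and hand the raw device pointer to a pybind11-bound C++ function (a minimal sketch; gpu_kernels and process_images are hypothetical names, not part of this task):
import cupy as cp
import gpu_kernels  # hypothetical pybind11 module wrapping the C++ CUDA code

images = cp.zeros(40, dtype=cp.float32)  # allocate and zero GPU memory from Python
# images.data.ptr is the raw device address as a Python integer; the C++ side
# can cast it back to a float* and launch CUDA kernels on that memory
gpu_kernels.process_images(images.data.ptr, images.size)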
Task
You should implement the marked functions in the following code skeleton. The example is a mixture of real code and pseudo code and is not totally correct. The application starts with main.py; I omit the pybind11 boilerplate code.
I suggest trying cupy first. However, feel free to use a different library, implement your own Python bindings, or change the function APIs. Only the following properties must be met: