KomputeProject / kompute

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.
http://kompute.cc/
Apache License 2.0
1.94k stars 146 forks source link

eStorage tensor fix, eval clears operations in sequence, added one test and modified tests that were broken #304

Closed MiroPalmu closed 1 year ago

MiroPalmu commented 1 year ago

This might might be too large PR. Sorry about that. Because of my mistake PR #302 is also included in this.

Two main things are that we can now create eStorage tensors with nullptr as data without undefined behaviour and evalAwait (and functions calling it) now clear operations stored in sequence.

  1. eStorage changes

Added a concept "device only" tensors that can be checked with isDeviceOnlyTensor(). At the moment eStorage is only device only tensor.

This is used to determine if we need to do raw data mappings and unmappings. Also in postEval of OpTensorCopy made it such that it does not copy raw data if the tensor that we copy from is device only nor it does not copy raw data to a tensor that is device only.

This fixes undefined behaviour that we had with eStorage tensor. Now we can create device only tensors with manager by setting data to nullptr.


// Create device only tensor with 10 floats
auto tensor = mgr.tensor(nullptr, 10, sizeof(float), kp::Tensor::TensorDataTypes::eFloat, kp::Tensor::TensorTypes::eStorage);

I added one test about copying in and out from eStorage tensors.

  1. eval changes

Motivation behind changes: There was lot of tests that were chaining evals together. For example


    mgr.sequence()
      ->eval<kp::OpTensorSyncDevice>({ tensorA })
      ->eval<kp::OpTensorCopy>({ tensorA, tensorB });

This is straight up broken because we never clear vk::CommandBuffer nor operations stored in the sequence. When second eval is called it runs OpTensorSyncDevice again because it never got cleared form the sequence. This can be checked from the logs.

Because I think chaining evals is quite intuitive api and I like it I wanted to fix this so what I propose is that evalAwait would always clear the operations stored in the sequence. So now you have to manually record operations again but I don't think this is a problem.

I added vk::CommandBufferUsageFlagBits::eOneTimeSubmit to command buffer create info such that we are sure vk::CommandBuffer gets cleared every time we start record to it.

With this change some tests got broken or they had bugs in them. I also fixed them.

MiroPalmu commented 1 year ago

BTW I did not figure out hot to generate doctrings.hpp automatically so I modified it manually.

axsaucedo commented 1 year ago

Closing as superceeded by https://github.com/KomputeProject/kompute/pull/316