KomputeProject / kompute

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.
http://kompute.cc/
Apache License 2.0
1.98k stars 153 forks

Cannot create tensor without initialization data #295

Open cracy3m opened 2 years ago

cracy3m commented 2 years ago

Is there a way to create a Tensor without initialization data? Setting the `data` argument of the `kp::Tensor()` constructor to nullptr causes an exception (the function always calls `memcpy(..., data, ...)` internally)!

kp::Tensor(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, void* data, uint32_t elementTotalCount, uint32_t elementMemorySize, const TensorDataTypes& dataType, const TensorTypes& tensorType = TensorTypes::eDevice)

axsaucedo commented 2 years ago

@cracy3m thank you for reporting. The current expected behaviour is actually to avoid tensors being created without data; you can see this in the test TestOpTensorCreate.ExceptionOnZeroSizeTensor. This is mainly because the underlying Vulkan resources are created when the tensors get initialised. Is there a reason why you want to create the tensor without any data?

cracy3m commented 2 years ago
  1. Thanks for the reply. So I have to create the Tensor object with data.
  2. I just want to wrap Kompute in a function called by others to do some calculations, so I don't have or know the input data in advance. I think it is troublesome to create zero-value data just to initialise each Tensor.

    In addition, why not make it configurable through variables or functions whether the validation layers of the Vulkan instance are enabled? I want to build a dynamic-link library, so that users can control whether validation is turned on even in a release build.

axsaucedo commented 2 years ago

> I just want to wrap Kompute in a function called by others to do some calculations, so I don't have or know the input data in advance. I think it is troublesome to create zero-value data just to initialise each Tensor.

I'm not sure I understand this use-case. Tensors are currently meant to be containers of Vulkan GPU-memory data, as opposed to containers of CPU-memory data, which means the library assumes you may want a separate way to handle your CPU-memory data. Can you provide more details on why this is not the case? It seems the library may just be expected to be used in a way that isn't intended (at least at this point).

> In addition, why not make it configurable through variables or functions whether the validation layers of the Vulkan instance are enabled? I want to build a dynamic-link library, so that users can control whether validation is turned on even in a release build.

I'm not sure what you mean by this, as it's currently possible to configure the debug variables with build parameters, irrespective of whether you build against a dynamic or static library. Can you clarify what you mean and how the build parameters don't fulfil this?

cracy3m commented 2 years ago

> I'm not sure I understand this use-case. Tensors are currently meant to be containers of Vulkan GPU-memory data, as opposed to containers of CPU-memory data, which means the library assumes you may want a separate way to handle your CPU-memory data. Can you provide more details on why this is not the case? It seems the library may just be expected to be used in a way that isn't intended (at least at this point).


Sorry, my English is not very good, so I used a translator. I just think it's a little troublesome to create a zero-value buffer every time I create a kp::Tensor object.


My use-case:

  1. I know the input data size (for example, a 1000x512 uint16_t buffer fed into a window function (e.g. Hanning) -> FFT -> ...).
  2. I do not know the input data values, because they are generated at runtime. In this case, I have to create the Tensors with a zero-value buffer, which is a little troublesome.

> I'm not sure what you mean by this, as it's currently possible to configure the debug variables with build parameters, irrespective of whether you build against a dynamic or static library. Can you clarify what you mean and how the build parameters don't fulfil this?

Perhaps this statement is clearer:

  1. Build my code into a DLL (lib) file (release version only, using the Kompute source for some fixed calculations).
  2. Send the DLL (lib) file to other people (or environments) for use.
  3. If something goes wrong, I think it would help to be able to enable the Vulkan validation layer through a config file (XML, JSON, ...).

It is also possible that my idea is incorrect: if there is no error from the Vulkan validation layer during development, there should be no error in other environments. Or I could provide a debug version of the DLL. These points are not very important; I can also create the Vulkan instance myself to implement such functionality. Thank you.

axsaucedo commented 2 years ago

> It is also possible that my idea is incorrect: if there is no error from the Vulkan validation layer during development, there should be no error in other environments. Or I could provide a debug version of the DLL. These points are not very important; I can also create the Vulkan instance myself to implement such functionality. Thank you.

Hmm, yeah, it does sound like it shouldn't really matter whether you are linking dynamically or statically. The standard best practice is to use validation layers only during development; once you build for production you would remove them, as otherwise you would be adding quite a lot of overhead.

> My use-case:

Ok I see. I do have to take a deeper look at this, as currently the idea is for Tensors to be containers for GPU data as opposed to CPU data. I will have to look deeper at whether we would indeed support this, as ultimately we are heading in the opposite direction, where no data at all is stored in RAM and all data is stored in GPU memory.

The challenge of allowing non-initialisation of Tensors is that it would permit deferring the initialisation of the Vulkan components to a later stage, which adds complexity and overhead, as opposed to the current design, which guarantees that a tensor always has initialised Vulkan components.

Given it would add quite a lot of complexity, I am inclined to explore more efficient ways this could be handled at the application level on your side, unless there is a simpler way of supporting it; let me have a look to see whether it is indeed a relatively simple and reasonable change.

axsaucedo commented 2 years ago

I just had a deeper look, and it does seem there is no trivial way to achieve this, as there is no concept of "uninitialised components": that was explored initially, but the design decision was that Kompute resources are created together with their GPU resources and destroyed once. Based on this we'll have to pass on this feature for the time being, but I will leave this issue open, as we are planning a large refactor improvement in the near term and can consider then whether this is a feature we would be looking at adding.

cracy3m commented 2 years ago

> I just had a deeper look, and it does seem there is no trivial way to achieve this, as there is no concept of "uninitialised components": that was explored initially, but the design decision was that Kompute resources are created together with their GPU resources and destroyed once. Based on this we'll have to pass on this feature for the time being, but I will leave this issue open, as we are planning a large refactor improvement in the near term and can consider then whether this is a feature we would be looking at adding.

Hmm, maybe what I want is simple:

```cpp
// v0.8.1
void Tensor::rebuild(void* data,
                     uint32_t elementTotalCount,
                     uint32_t elementMemorySize)
{
    KP_LOG_DEBUG("Kompute Tensor rebuilding with size {}", elementTotalCount);

    this->mSize = elementTotalCount;
    this->mDataTypeMemorySize = elementMemorySize;

    if (this->mPrimaryBuffer || this->mPrimaryMemory) {
        KP_LOG_DEBUG(
          "Kompute Tensor destroying existing resources before rebuild");
        this->destroy();
    }

    this->allocateMemoryCreateGPUResources();
    this->mapRawData();

    memcpy(this->mRawData, data, this->memorySize()); // <-- this line
}
```

The marked line would change to:

```cpp
if (data) memcpy(this->mRawData, data, this->memorySize());
```

I also have a doubt about this: is it appropriate to keep the staging buffer mapped from Tensor::rebuild() until ~Tensor()? I think it may occupy CPU address space, although my app will not create many Tensor objects.

axsaucedo commented 2 years ago

@cracy3m I did some initial tests to validate whether the change would be as straightforward as this; however, quite a lot of things would be required, mainly because the codebase currently has checks such as throwing exceptions if the provided data is larger than the initially configured size. The rebuild would also call destroy, and hence couldn't be used outside of the manager. Overall this is why I mentioned above that it wouldn't be a trivial change, and it may be considered as part of a larger refactor down the line.

cracy3m commented 2 years ago

> @cracy3m I did some initial tests to validate whether the change would be as straightforward as this; however, quite a lot of things would be required, mainly because the codebase currently has checks such as throwing exceptions if the provided data is larger than the initially configured size. The rebuild would also call destroy, and hence couldn't be used outside of the manager. Overall this is why I mentioned above that it wouldn't be a trivial change, and it may be considered as part of a larger refactor down the line.

Thank you for replying again.

I get it: handling only the TensorTypes::eDevice type without initialising the staging buffer could cause other problems elsewhere in the Tensor class.

mirsadm commented 1 year ago

Adding to the discussion: it would be helpful to be able to pass a nullptr when setting up a tensor that is used only for output. One use case is real-time applications where CPU/GPU memory is shared; there it is not necessary to copy between CPU and GPU memory when the buffer is initialised. This is useful on Android in particular, where the memory copy is quite often slower than the compute itself.

wdx04 commented 8 months ago

It's very common for an algorithm to use output-only buffers/tensors. In SYCL we can declare a buffer accessor as write_only, eliminating unnecessary data movement from host to device. Also, in SYCL the host memory used to construct a buffer can be automatically synchronized from the device after computation is done. It's a surprise to me that Kompute can't do either of these things.