Closed Mr-Thack closed 3 hours ago
├── CMakeLists.txt
├── shader
│   └── adding.comp
└── src
    └── main.cpp
cmake_minimum_required(VERSION 3.20)
project(issue)
set(CMAKE_CXX_STANDARD 14)
# Options
option(KOMPUTE_OPT_GIT_TAG "The tag of the repo to use for the example" master)
option(KOMPUTE_OPT_FROM_SOURCE "Whether to build example from source or from git fetch repo" OFF)
if(KOMPUTE_OPT_FROM_SOURCE)
    add_subdirectory(../../ ${CMAKE_CURRENT_BINARY_DIR}/kompute_build)
else()
    include(FetchContent)
    FetchContent_Declare(kompute GIT_REPOSITORY https://github.com/KomputeProject/kompute.git
        GIT_TAG ${KOMPUTE_OPT_GIT_TAG})
    FetchContent_MakeAvailable(kompute)
    include_directories(${kompute_SOURCE_DIR}/src/include)
endif()
# Compiling shader
# To add more shaders simply copy the vulkan_compile_shader command and replace it with your new shader
vulkan_compile_shader(
    INFILE shader/adding.comp
    OUTFILE shader/adding.hpp
    NAMESPACE "shader")
# Then add it to the library, so you can access it later in your code
add_library(shader INTERFACE "shader/adding.hpp")
target_include_directories(shader INTERFACE $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>)
# Setting up main example code
add_executable(issue src/main.cpp)
target_link_libraries(issue PRIVATE shader kompute::kompute)
#version 450
layout(std430, binding = 0) buffer Input {
    vec3 in_data[];
};
layout(std430, binding = 1) buffer Output {
    vec3 out_data[];
};
void main() {
    uint index = gl_GlobalInvocationID.x;
    // Just copy stuff from the input buffer to the output buffer.
    // `input` and `output` are reserved words in GLSL, so the arrays use other names.
    out_data[index] = in_data[index];
}
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include <kompute/Kompute.hpp>
#include <kompute/Tensor.hpp>
#include <shader/adding.hpp>
typedef std::shared_ptr<kp::TensorT<uint32_t>> UTens;
typedef std::vector<uint32_t> uvec;
// Print every element of a tensor on one line.
void printTensor(std::string name, UTens tensor) {
    std::cout << name;
    std::cout << ": { ";
    // The tensor holds uint32_t values, so iterate with the matching type.
    for (const uint32_t &elem : tensor->vector()) {
        std::cout << elem << " ";
    }
    std::cout << "}" << std::endl;
}
int main() {
    kp::Manager mgr;
    UTens input = mgr.tensorT(uvec{2, 4, 6});
    UTens output = mgr.tensorT(uvec{0, 0, 0});
    const std::vector<std::shared_ptr<kp::Memory>> params = {input, output};
    const std::vector<uint32_t> shader = std::vector<uint32_t>(
        shader::ADDING_COMP_SPV.begin(), shader::ADDING_COMP_SPV.end());
    std::shared_ptr<kp::Algorithm> algo = mgr.algorithm(params, shader);
    mgr.sequence()
        ->record<kp::OpSyncDevice>(params)
        ->record<kp::OpAlgoDispatch>(algo)
        ->record<kp::OpSyncLocal>(params)
        ->eval();
    printTensor("Input", input);
    printTensor("Output", output);
}
And then run with:
mkdir build && cd build && cmake .. && cmake --build . && ./issue
If there is any typo, please forgive me. Thank you!
Almost forgot to mention:
For some reason, it would output {2, 0, 0} instead of {2, 4, 6} when using std140.
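One way to see how {2, 0, 0} can come out is to model the copy on the CPU. This is only a sketch under assumptions that are not confirmed by the thread: it assumes the shader array holds scalar elements, that std140 gives each element a 16-byte slot, and that any access past the 12 bytes uploaded by the host is simply dropped (on a real GPU an out-of-bounds access has no guaranteed result). The function name `simulateCopy` is made up for illustration and is not part of Kompute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-side model of a copy shader that walks two buffers with a fixed array
// stride. Pass strideBytes = 4 for a tightly packed array and 16 for an array
// whose elements are padded to 16-byte slots.
// ASSUMPTION: accesses outside the buffer are skipped; a real GPU gives no
// such guarantee.
std::vector<uint32_t> simulateCopy(const std::vector<uint32_t>& input,
                                   std::size_t strideBytes) {
    assert(strideBytes >= sizeof(uint32_t));
    const std::size_t bufferBytes = input.size() * sizeof(uint32_t);
    std::vector<uint32_t> output(input.size(), 0);
    const unsigned char* src =
        reinterpret_cast<const unsigned char*>(input.data());
    unsigned char* dst = reinterpret_cast<unsigned char*>(output.data());
    for (std::size_t index = 0; index < input.size(); ++index) {
        // Byte offset at which the shader believes element `index` lives.
        const std::size_t offset = index * strideBytes;
        if (offset + sizeof(uint32_t) > bufferBytes) {
            continue; // beyond the bytes the host actually uploaded
        }
        std::memcpy(dst + offset, src + offset, sizeof(uint32_t));
    }
    return output;
}
```

Under these assumptions, `simulateCopy({2, 4, 6}, 16)` only ever touches the first element, while `simulateCopy({2, 4, 6}, 4)` copies all three.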
Looking at the standards description:
std140: This layout alleviates the need to query the offsets for definitions. The rules of std140 layout explicitly state the layout arrangement of any interface block declared with this layout. This also means that such an interface block can be shared across programs, much like shared. The only downside to this layout type is that the rules for packing elements into arrays/structs can introduce a lot of unnecessary padding.
The rules for std140 layout are covered quite well in the OpenGL specification (OpenGL 4.5, Section 7.6.2.2, page 137). Among the most important is the fact that arrays of types are not necessarily tightly packed. An array of floats in such a block will not be equivalent to an array of floats in C/C++. The array stride (the bytes between array elements) is always rounded up to the size of a vec4 (i.e. 16 bytes). So arrays will only match their C/C++ definitions if the size of the type is a multiple of 16 bytes.
Warning: Implementations sometimes get the std140 layout wrong for vec3 components. You are advised to manually pad your structures/arrays out and avoid using vec3 at all.
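To make the stride rule concrete, here is a small sketch that computes the array stride for 32-bit float scalars and vectors under both layouts. It only covers arrays of float, vec2, vec3 and vec4 (no structs or matrices), and the names `Layout`, `baseAlignment` and `arrayStride` are illustrative, not taken from any library.

```cpp
#include <cstddef>

enum class Layout { Std140, Std430 };

// Base alignment in bytes of a 32-bit float scalar or vector with
// `components` elements (1 to 4). A vec3 aligns like a vec4.
constexpr std::size_t baseAlignment(std::size_t components) {
    return components == 1 ? 4 : components == 2 ? 8 : 16;
}

// Bytes between consecutive array elements.
// std140 rounds the element alignment up to a multiple of 16 (one vec4);
// std430 uses the element's base alignment as is.
constexpr std::size_t arrayStride(Layout layout, std::size_t components) {
    return layout == Layout::Std140
               ? (baseAlignment(components) + 15) / 16 * 16
               : baseAlignment(components);
}

// A float array is tightly packed only under std430.
static_assert(arrayStride(Layout::Std140, 1) == 16, "std140 float[] stride");
static_assert(arrayStride(Layout::Std430, 1) == 4, "std430 float[] stride");
```

A C++ `std::vector<float>` has a 4-byte stride, so it matches the std430 result for `float[]` but not the std140 one.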
std430: This layout works like std140, except with a few optimizations in the alignment and strides for arrays and structs of scalars and vector elements (except for vec3 elements, which remain unchanged from std140). Specifically, they are no longer rounded up to a multiple of 16 bytes. So an array of floats will match with a C++ array of floats.
Note that this layout can only be used with shader storage blocks, not uniform blocks.
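Since vec3 elements keep a 16-byte stride even under std430, one option is to pad the data on the host before uploading it, as the warning above advises. The helper below is a minimal sketch of that idea; `padVec3ToVec4` is a hypothetical name, not a Kompute function, and it assumes the shader side declares the array as vec4 (or as a vec3 array with a 16-byte stride).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands tightly packed xyz triples into xyzw groups so that every vector
// occupies one 16-byte slot. The fourth component is set to `fill`.
std::vector<float> padVec3ToVec4(const std::vector<float>& packed,
                                 float fill = 0.0f) {
    assert(packed.size() % 3 == 0); // input must be whole xyz triples
    std::vector<float> padded;
    padded.reserve(packed.size() / 3 * 4);
    for (std::size_t i = 0; i < packed.size(); i += 3) {
        padded.push_back(packed[i]);
        padded.push_back(packed[i + 1]);
        padded.push_back(packed[i + 2]);
        padded.push_back(fill); // padding to reach the 16-byte stride
    }
    return padded;
}
```

The padded vector can then be handed to a float tensor in place of the packed one; the extra component has to be skipped again when reading results back.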
The warning in particular seems to suggest that the memory layout may be different, so I am assuming that the input values may be processed differently. It is certainly an important catch: one of Kompute's principles is explicit instead of implicit, so making this clear would be best.
This isn't really an issue, but a warning to anyone who is new to Vulkan and is attempting to use this.
If it seems as if your GPU is not reading the data you sent to it from the CPU, it might be because std430 seems to be required.
Here is an example of what I mean:
When you type this:
The GLSL Compiler assumes you meant this:
But for some applications, what you actually need might be:
The stdXXX signifies the standard it'll be using to transmit the data to the GPU. I don't really know why or how we tell Kompute to use a different standard, but std430 is more efficient, so you might as well use that anyway.