NVIDIA / gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs

Does GVDB need more customization? #20

Open passos-vladimir opened 6 years ago

passos-vladimir commented 6 years ago

Hi,

I am working on a simulation and we started using GVDB to handle self-collisions. So far we have it working nicely, but I needed to make a number of modifications to the source and would like to know whether they were really necessary and/or whether similar functionality is planned for future releases.

As far as I am aware, in InsertPointsSubcell, launching cuFunc[FUNC_INSERT_SUBCELL] copies the point positions into the slot of each sub-cell that overlaps the AABB of the points. Optionally, it also copies velocities and colors.

For our application, we needed different information. We could have reused the color attribute, since we are not using it anyway, but the algorithm is not finished yet and we will probably need to add more attributes. Since I couldn't find any way to launch a custom "myInsertPointsSubcellKernel", we recompiled the SDK with an InsertPointsSubcell that takes an extra argument of type CUfunction.

In "myInsertPointsSubcellKernel", instead of positions we store just the point ID. We also use a custom "gather" function that reads that ID and computes the needed information into custom channels.

Since we will be using GVDB with very large point clouds, copying only the ID consumes less memory and gives us more freedom to compute custom channels.

So it was not a big change in the core, just extra methods: an "InsertPointsSubcell" and a "gather" that accept a user CUfunction.

Thank you.
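To illustrate the idea described above, here is a minimal sketch of an insert kernel that writes only the point ID into its subcell slot. This is not the actual GVDB kernel: the kernel name, parameter names, and the assumption that an output slot per point is already known are all hypothetical; the real subcell binning logic is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: store only the point ID in the subcell slot, instead
// of copying position/velocity/color as the stock kernel does. The slot
// computation (which subcell each point lands in) is application-specific
// and assumed precomputed here.
extern "C" __global__ void myInsertPointsSubcellKernel (
    int        num_pnts,   // number of input points
    const int* slot_id,    // precomputed output slot per point (assumption)
    int*       out_ids )   // subcell storage: one point ID per slot
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i >= num_pnts ) return;
    out_ids[ slot_id[i] ] = i;   // store only the ID; a gather pass reads it back
}
```

A later gather kernel can then read these IDs and fetch whatever per-point attributes it needs from user-managed memory, which is what keeps the subcell storage small.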

alvingitlord commented 6 years ago

Hi @passos-vladimir, I found a way to create a custom kernel within the current code structure.

The current .cuh files in the source are really messy and not very friendly to extension. Right now the .cuh files do not have include lists of their own but contain the definition of each kernel function. When you create a new .cu file, say "myKernel.cu", for your own purposes, you need to include those .cuh files:

#include <cuda.h>
#include <host_defines.h>
#include <cuda_math.cuh>
//----------------------------------------- GVDB Data Structure 
#define CUDA_PATHWAY

#include "cuda_gvdb_scene.cuh"      // GVDB Scene  
#include "cuda_gvdb_nodes.cuh"      // GVDB Node structure 
#include "cuda_gvdb_geom.cuh"       // GVDB Geom helpers
#include "cuda_gvdb_dda.cuh"        // GVDB DDA 
#include "cuda_gvdb_raycast.cuh"    // GVDB Raycasting   
#include "radixsort_kernel.cuh"
//----------------------------------------- 

// Operator functions
#include "cuda_gvdb_operators.cuh"

// Particle functions 
#include "cuda_gvdb_particles.cuh"

// Your own kernel
extern "C" __global__ void myKernel(float* foo)
{}

This creates several problems. First, in the new myKernel.ptx output, all functions in those library *.cuh files are recompiled; now both cuda_gvdb_module.ptx and myKernel.ptx contain the definition of e.g. gvdbReduction. Second, all cuda_math.cuh-related functions, e.g. make_int3, and even plain float3, will show syntax errors in Visual Studio, because float3 is only defined if you include cuda_runtime.h instead of cuda.h.

My solution for now is to define a myKernel.cuh file and add it to the include list of cuda_gvdb_module.cu. In myKernel.cuh, don't include anything. You will still see all kinds of syntax-error highlights, but the new cuda_gvdb_module.ptx will compile just fine. Then, on the host side, you can use the SDK function VolumeGVDB::LoadFunction to add your kernel to the primary function list. You can add a macro such as FUNC_MY after line 233 of gvdb_volume_gvdb.h (#define FUNC_EXPANDC 117) without recompiling the host-side library. Then call your kernel the way the library calls its own functions, as in gvdb_volume_gvdb.cpp lines 4998-5020:

void VolumeGVDB::Resample ( uchar chan, Matrix4F xform, Vector3DI in_res, char in_aux, Vector3DF inr, Vector3DF outr )
{
    PrepareVDB ();

    PUSH_CTX

    // Determine grid and block dims (must match atlas bricks)  
    Vector3DI block ( 8, 8, 8 );
    Vector3DI res = mPool->getAtlasRes( chan );
    Vector3DI grid ( int(res.x/block.x)+1, int(res.y/block.y)+1, int(res.z/block.z)+1 );    

    // Send transform matrix to cuda
    PrepareAux ( AUX_MATRIX4F, 16, sizeof(float), true, true );
    memcpy ( mAux[AUX_MATRIX4F].cpu, xform.GetDataF(), 16*sizeof(float) );
    CommitData ( mAux[AUX_MATRIX4F] );

    void* args[8] = { &cuVDBInfo, &res, &chan, &in_res, &mAux[in_aux].gpu, &mAux[AUX_MATRIX4F].gpu, &inr, &outr };

    cudaCheck ( cuLaunchKernel ( cuFunc[FUNC_RESAMPLE], grid.x, grid.y, grid.z, block.x, block.y, block.z, 0, NULL, args, NULL ), 
                    "VolumeGVDB", "Resample", "cuLaunch", "FUNC_RESAMPLE", mbDebug);

    POP_CTX
}

But you can only register custom functions up to 256 total, since MAX_FUNC = 255. Alternatively, you can use the driver API approach suggested by the CUDA programming guide, calling cuModuleLoad, cuModuleGetFunction and cuModuleGetGlobal yourself.
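The driver-API route mentioned above might look like the following sketch. The module filename "myKernel.ptx" and the kernel name "myKernel" are placeholders matching the earlier example; this loads the kernel independently of GVDB's cuFunc table and the MAX_FUNC limit.

```cuda
#include <cuda.h>

// Sketch: load your own PTX module and fetch the kernel via the driver API,
// bypassing GVDB's primary function list entirely.
CUmodule   myModule;
CUfunction myKernel;

void loadMyKernel ()
{
    // "myKernel.ptx" is the PTX produced from your myKernel.cu
    cuModuleLoad ( &myModule, "myKernel.ptx" );
    cuModuleGetFunction ( &myKernel, myModule, "myKernel" );
    // If the kernel references device globals (e.g. a VDBInfo struct),
    // cuModuleGetGlobal can fetch their addresses from this module.
}
```

Once fetched, the CUfunction can be launched with cuLaunchKernel exactly as in the Resample example above, with your own grid/block dimensions and argument list.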

I personally find the design of VolumeGVDB::ComputeKernel unnecessary, because it blocks the interaction of GVDB data with other GPU resources. But you can always copy the contents of VolumeGVDB::ComputeKernel into your own code.

I am really struggling with the design of GVDB. The API feels to me like it has too many unwritten rules. Maybe it is similar to OpenVDB, but I have never used the latter. The redefinition of basic types such as float3 and int3 as Vector3DF and Vector3DI seems redundant and causes implicit type-cast errors. And the custom-kernel creation process I described above is very unfriendly. I hope the core kernel functions can be compiled into a *.ptx file so that a custom kernel .cu file only needs to include their function declarations. Also, it would be nice if the host-side GVDB did not need to be built at all, but were a lightweight "download then include" library.

Hope this helps.

passos-vladimir commented 6 years ago

Hi @Alvininorge, thanks for the reply!

We are mainly using the API approach, running with ComputeKernel, but for our specific needs there is no API support (that I am aware of).

For example, our points have variable radii. InsertPointsSubcell by default supports only a constant radius, so we need a custom InsertPointsSubcell kernel that accounts for this small modification. The host method that calls it does not accept a CUfunction as an argument, so we need a custom host method for that too. The host method itself is very long, but only a small part of it needs modifying.

Essentially, InsertPointsSubcell traverses all the user objects and puts some info into the subcells that overlap each object's AABB. The current API specializes this behavior: it traverses all the points in a point cloud, each point being the center of a sphere with a global radius, and stores positions, velocities and colors. Instead, we think it would be better to store only the point ID.

In our application, we need a gather function that computes different info, e.g. the mass density of each voxel, weighted velocities and so on. Following the API standard, we would need an AUX memory for each piece of info to be stored in the subcells, and would access them in the gather kernel. Each AUX definition is added to the header, and the device pointers are given to the kernels by the InsertPointsSubcell and gather methods implemented in the library. Because we are still experimenting with different ideas, the list of necessary info changes frequently, and we would need to recompile the library every time.

I think the most generic way would be to store just each object's ID and let the user decide what info they need in the gather. That would allow for a general insertObjectSubcell (apart from the AABB computation, which is application-specific).

Because of a few dependencies on the subcell logic, the API still does not support custom gather functions. Using the generic approach of storing IDs, the API could have a ComputeGather that accepts a CUfunction, with the logic of the kernel decided by the user.

I imagine that the subcell logic may undergo further modification and the API might change entirely in future releases, but for now the generic approach is working nicely for us.
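As a rough sketch of the generic gather idea described in this thread: each voxel reads the point IDs stored for its subcell and accumulates whatever quantities the application needs, e.g. a mass density. All names and the per-voxel ID-list layout below are hypothetical, not GVDB API.

```cuda
#include <cuda_runtime.h>

// Hypothetical gather kernel: reads the point IDs written by the insert step
// and accumulates a per-voxel mass from user-managed per-point attributes.
// cell_start/cell_count describe a compacted ID list (illustrative layout).
extern "C" __global__ void myGatherKernel (
    int          num_voxels,
    const int*   cell_start,   // per-voxel offset into the ID list
    const int*   cell_count,   // number of IDs stored for this voxel
    const int*   point_ids,    // IDs written by the insert kernel
    const float* point_mass,   // per-point attribute, kept in user memory
    float*       density )     // output channel: accumulated mass per voxel
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if ( v >= num_voxels ) return;
    float m = 0.0f;
    for ( int k = 0; k < cell_count[v]; k++ )
        m += point_mass[ point_ids[ cell_start[v] + k ] ];
    density[v] = m;   // divide by voxel volume to get an actual density
}
```

Because only IDs are stored in the subcells, adding a new accumulated quantity (weighted velocity, etc.) only means passing another per-point array to a kernel like this, with no change to the insert step or the library headers.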