NVIDIA / gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs
Other
679 stars 145 forks source link

GVDB_VOX macro and aprons #35

Closed alvingitlord closed 5 years ago

alvingitlord commented 6 years ago

https://github.com/NVIDIA/gvdb-voxels/blob/e9a2350ab07895bd11dff633b52ce52d0190e07b/source/gvdb_library/kernels/cuda_gvdb_operators.cuh#L298-L300 Here GVDB_VOXseems to cover the apron voxel in atlas as well. Is it by design? Firstly the + make_uint3(1,1,1); part seems to indicate that GVDB_VOXonly supports one layer apron setup. Secondly vox is a continuous uint3which covers apron voxels in atlas as well. Despite slight increase of computation amount, this will typically cause no problem because even the user directly changes value of apron voxel using vox as index entry in a kernel function, UpdateApron() will still overwrite the aprons to the correct value. But what if two successive operations e.g. adding noise then smoothing are manually coded in a kernel without calling UpdateApron() in between, there could be a discrepancy of apron values.

My suggestion is to modify uint3 vox as

uint3 vox= blockIdx * blockDim + threadIdx;
uint3 brick_idx = vox/ (brick_res - apron - apron);
vox += brick_idx * apron * 2 + apron;

so this can skip the apron voxels when going through atlas.

ramakarl commented 5 years ago

Hi, This is an interesting suggestion, but changing GVDB_VOX directly would be a fairly big change since this is used by many different kernels. In order to accept such a big change, you would need to create a pull request that also included testing of all the existing samples.

Instead, I would suggest that you consider a pull request where you simply provide a new macro, say GVDB_VOX_ONLY, or GVDB_VOX_COVER, that only touches the actual voxels. Keep the original GVDB_VOX as is. Note that your solution is not yet complete, because while you have resolved the indexing to just touch actual voxels, you would also need to modify the host code to reduce the number of threads being launch. Otherwise you still have the same GPU grid size, and will not get much benefit, just cause some threads to be idle. The grid launch itself would need to only launch 1 thread per real voxel (/w apron), and then the indexing maps the logical thread to a real voxel, as you have done.

alvingitlord commented 5 years ago

Thanks for the comment, I will close this issue and may be create a pull request in the future