Hi @tyrafluid, I've had the same issue testing the Quadro NVS 290. local memory is not yet supported in OpenCL 1.0. Luckily, I use local memory only in the voxelization, which is not essential. The solution is simple: in src/kernel.cpp, starting at line 1867, replace the entire voxelize_mesh kernel function with this:
)+R(kernel void voxelize_mesh(global uchar* flags, const uchar flag, const global float* p0, const global float* p1, const global float* p2, const uint triangle_number, float x0, float y0, float z0, float x1, float y1, float z1) { // voxelize triangle mesh
	const uint n = get_global_id(0); // n = x+(y+z*Ny)*Nx
	const float3 p = position(coordinates(n))+(float3)(0.5f*(float)def_Nx-0.5f, 0.5f*(float)def_Ny-0.5f, 0.5f*(float)def_Nz-0.5f);
	if(p.x<x0||p.y<y0||p.z<z0||p.x>x1||p.y>y1||p.z>z1) return; // return straight away if grid point is outside the bounds of the mesh (~4x faster)
	const float3 r0_origin = p;
	const float3 r1_origin = p;
	const float3 r0_direction = (float3)(+0.01f, +0.04f, +1.03f); // from each grid point, shoot an outward ray and count how often it intersects the mesh, odd number -> grid point is inside mesh
	const float3 r1_direction = (float3)(-0.05f, -0.06f, -1.07f); // to eliminate errors, repeat with a second ray in a different random direction
	uint intersections_0=0u, intersections_1=0u;
	for(uint i=0u; i<triangle_number; i++) {
		const float3 p0i = (float3)(p0[3u*i], p0[3u*i+1u], p0[3u*i+2u]);
		const float3 p1i = (float3)(p1[3u*i], p1[3u*i+1u], p1[3u*i+2u]);
		const float3 p2i = (float3)(p2[3u*i], p2[3u*i+1u], p2[3u*i+2u]);
		const float3 u=p1i-p0i, v=p2i-p0i;
		{
			const float3 w=r0_origin-p0i, h=cross(r0_direction, v), q=cross(w, u);
			const float f=1.0f/dot(u, h), s=f*dot(w, h), t=f*dot(r0_direction, q);
			intersections_0 += (uint)(s>=0.0f&&s<=1.0f&&t>=0.0f&&s+t<=1.0f&&f*dot(v, q)>0.0f);
		} {
			const float3 w=r1_origin-p0i, h=cross(r1_direction, v), q=cross(w, u);
			const float f=1.0f/dot(u, h), s=f*dot(w, h), t=f*dot(r1_direction, q);
			intersections_1 += (uint)(s>=0.0f&&s<=1.0f&&t>=0.0f&&s+t<=1.0f&&f*dot(v, q)>0.0f);
		}
	}
	if(intersections_0%2u&&intersections_1%2u) flags[n] = flag;
} // voxelize_mesh()
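For anyone who wants to check the inside/outside logic on the CPU before touching the kernel, here is a minimal C++ sketch of the same idea: shoot two rays from a point, count triangle hits, and treat the point as inside only if both counts are odd. The names Vec3, Triangle, ray_hits_triangle, is_inside_mesh and unit_tetrahedron are made up for this illustration and are not part of the FluidX3D source. The zero-determinant guard is an addition of this sketch and is not in the kernel.

```cpp
#include <vector>

// Hypothetical helper types for this sketch (not from the FluidX3D source).
struct Vec3 { float x, y, z; };
struct Triangle { Vec3 p0, p1, p2; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Ray-triangle test using the same variable names as the kernel above:
// s and t are the barycentric coordinates of the hit, f*dot(v, q) is the distance along the ray.
bool ray_hits_triangle(const Vec3& origin, const Vec3& direction, const Triangle& tri) {
    const Vec3 u = sub(tri.p1, tri.p0), v = sub(tri.p2, tri.p0);
    const Vec3 w = sub(origin, tri.p0), h = cross(direction, v), q = cross(w, u);
    const float det = dot(u, h);
    if (det == 0.0f) return false; // ray parallel to the triangle plane (guard added in this sketch only)
    const float f = 1.0f / det, s = f * dot(w, h), t = f * dot(direction, q);
    return s >= 0.0f && s <= 1.0f && t >= 0.0f && s + t <= 1.0f && f * dot(v, q) > 0.0f;
}

// A point counts as inside only if both rays cross the mesh an odd number of times.
bool is_inside_mesh(const Vec3& p, const std::vector<Triangle>& mesh) {
    const Vec3 r0{+0.01f, +0.04f, +1.03f}; // same two ray directions as in the kernel
    const Vec3 r1{-0.05f, -0.06f, -1.07f};
    unsigned int intersections_0 = 0u, intersections_1 = 0u;
    for (const Triangle& tri : mesh) {
        intersections_0 += ray_hits_triangle(p, r0, tri) ? 1u : 0u;
        intersections_1 += ray_hits_triangle(p, r1, tri) ? 1u : 0u;
    }
    return (intersections_0 % 2u) && (intersections_1 % 2u);
}

// Small closed test mesh: tetrahedron with corners at the origin and the three unit axes.
std::vector<Triangle> unit_tetrahedron() {
    const Vec3 o{0.0f, 0.0f, 0.0f}, ex{1.0f, 0.0f, 0.0f}, ey{0.0f, 1.0f, 0.0f}, ez{0.0f, 0.0f, 1.0f};
    return {{o, ex, ey}, {o, ex, ez}, {o, ey, ez}, {ex, ey, ez}};
}
```

With this test mesh, is_inside_mesh(Vec3{0.1f, 0.1f, 0.1f}, unit_tetrahedron()) should report inside and a point such as (2, 2, 2) should report outside. The test does not depend on triangle winding order, which is why the tetrahedron faces can be listed in any vertex order.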
Thank you for the quick response, @ProjectPhysX. I got more errors. I added "const float3 p0i = (float3)(p0[3u*i], p0[3u*i+1u], p0[3u*i+2u]); const float3 p1i = (float3)(p1[3u*i], p1[3u*i+1u], p1[3u*i+2u]); const float3 p2i = (float3)(p2[3u*i], p2[3u*i+1u], p2[3u*i+2u]); const float3 u = p1i - p0i, v = p2i - p0i;".
However, my GPU does not have enough memory.
@tyrafluid oh, I missed a few lines. See the updated voxelize_mesh kernel again and copy/paste it into your code.
As for the memory: the default benchmark uses a 256³ box, which is too large for the 9600 GT. In src/setup.cpp, in line 946, change the resolution down to 176³; then it should work.
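To see why the smaller box helps, a rough estimate is grid cells times bytes per cell. The C++ sketch below does that arithmetic. Both function names are hypothetical, and the numbers in the usage note (93 bytes per cell, 512 MiB of video memory) are assumptions for illustration only; the real per-cell cost depends on the build settings, and 9600 GT cards shipped with different memory sizes.

```cpp
#include <cstdint>

// Bytes needed for an n x n x n grid at a given per-cell cost.
// bytes_per_cell is an input here because it depends on the configuration.
std::uint64_t grid_bytes(std::uint64_t n, std::uint64_t bytes_per_cell) {
    return n * n * n * bytes_per_cell;
}

// Largest cubic edge length whose grid still fits into budget_bytes.
std::uint64_t largest_cube(std::uint64_t budget_bytes, std::uint64_t bytes_per_cell) {
    if (bytes_per_cell == 0u) return 0u; // avoid an endless loop on a meaningless input
    std::uint64_t n = 0u;
    while (grid_bytes(n + 1u, bytes_per_cell) <= budget_bytes) n++;
    return n;
}
```

Under the assumed 93 bytes per cell, grid_bytes(256, 93) is 1,560,281,088 bytes (about 1.56 GB), while grid_bytes(176, 93) is 507,015,168 bytes, which is just below 512 MiB (536,870,912 bytes). The estimate ignores any other buffers and memory the driver reserves, so treat it as an upper bound on what fits.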
Hi @ProjectPhysX, first of all, thank you for open-sourcing your fantastic code. I got an error that I could not deal with. I use a GeForce 9600 GT and I tried "#define USE_OPENCL_1_1". In spite of that, it does not calculate anything. What is the problem? Could you tell me the reason? My GeForce driver version is 342.01 (the newest for my GPU).
Thank you.