ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

GeForce 9600 GT does not work #34

Closed: tyrafluid closed this issue 1 year ago

tyrafluid commented 1 year ago

Hi @ProjectPhysX, first of all, thank you for publishing your fantastic code. I got an error that I could not resolve. I use a GeForce 9600 GT and I tried "#define USE_OPENCL_1_1". In spite of that, it does not calculate anything. What is the problem? Could you tell me the reason? My GeForce driver version is 342.01 (the newest for my GPU). [screenshot]

Thank you.

ProjectPhysX commented 1 year ago

Hi @tyrafluid, I've had the same issue when testing the Quadro NVS 290. Local memory is not yet supported in OpenCL 1.0. Luckily, I use local memory only in the voxelization, which is not essential. The solution is simple: in src/kernel.cpp, starting at line 1867, replace the entire voxelize_mesh kernel function with this:

```
)+R(kernel void voxelize_mesh(global uchar* flags, const uchar flag, const global float* p0, const global float* p1, const global float* p2, const uint triangle_number, float x0, float y0, float z0, float x1, float y1, float z1) { // voxelize triangle mesh
    const uint n = get_global_id(0); // n = x+(y+z*Ny)*Nx
    const float3 p = position(coordinates(n))+(float3)(0.5f*(float)def_Nx-0.5f, 0.5f*(float)def_Ny-0.5f, 0.5f*(float)def_Nz-0.5f);
    if(p.x<x0||p.y<y0||p.z<z0||p.x>x1||p.y>y1||p.z>z1) return; // return straight away if grid point is outside the bounds of the mesh (~4x faster)
    const float3 r0_origin = p;
    const float3 r1_origin = p;
    const float3 r0_direction = (float3)(+0.01f, +0.04f, +1.03f); // from each grid point, shoot an outward ray and count how often it intersects the mesh, odd number -> grid point is inside mesh
    const float3 r1_direction = (float3)(-0.05f, -0.06f, -1.07f); // to eliminate errors, repeat with a second ray in a different random direction
    uint intersections_0=0u, intersections_1=0u;
    for(uint i=0u; i<triangle_number; i++) {
        const float3 p0i = (float3)(p0[3u*i], p0[3u*i+1u], p0[3u*i+2u]);
        const float3 p1i = (float3)(p1[3u*i], p1[3u*i+1u], p1[3u*i+2u]);
        const float3 p2i = (float3)(p2[3u*i], p2[3u*i+1u], p2[3u*i+2u]);
        const float3 u=p1i-p0i, v=p2i-p0i;
        {
            const float3 w=r0_origin-p0i, h=cross(r0_direction, v), q=cross(w, u);
            const float f=1.0f/dot(u, h), s=f*dot(w, h), t=f*dot(r0_direction, q);
            intersections_0 += (uint)(s>=0.0f&&s<=1.0f&&t>=0.0f&&s+t<=1.0f&&f*dot(v, q)>0.0f);
        } {
            const float3 w=r1_origin-p0i, h=cross(r1_direction, v), q=cross(w, u);
            const float f=1.0f/dot(u, h), s=f*dot(w, h), t=f*dot(r1_direction, q);
            intersections_1 += (uint)(s>=0.0f&&s<=1.0f&&t>=0.0f&&s+t<=1.0f&&f*dot(v, q)>0.0f);
        }
    }
    if(intersections_0%2u&&intersections_1%2u) flags[n] = flag;
} // voxelize_mesh()
```

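
The replacement kernel decides whether a grid point lies inside the mesh by ray parity: it casts two rays in fixed, slightly skewed directions, counts their intersections with every triangle using the Moller-Trumbore ray/triangle test, and marks the cell only if both counts are odd. The C++ sketch below reproduces that logic on the CPU so it can be checked without a GPU. It is an illustration, not FluidX3D code: `Vec3`, `ray_hits` and `inside_mesh` are hypothetical helpers, and the triangle vertices are assumed to be stored as flat xyz arrays indexed with `3*i`, like `p0`, `p1`, `p2` in the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal 3-component vector standing in for OpenCL's float3 (hypothetical helper).
struct Vec3 { float x, y, z; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x-b.x, a.y-b.y, a.z-b.z}; }
static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x+a.y*b.y+a.z*b.z; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x};
}

// Read vertex i from a flat xyz array (same 3*i layout as p0/p1/p2 in the kernel).
static Vec3 vertex(const std::vector<float>& p, std::size_t i) {
    return {p[3*i], p[3*i+1], p[3*i+2]};
}

// Moller-Trumbore ray/triangle test with the same conditions as the kernel:
// s and t are barycentric coordinates, f*dot(v, q) is the distance along the ray (must be > 0).
static bool ray_hits(const Vec3& origin, const Vec3& dir, const Vec3& a, const Vec3& b, const Vec3& c) {
    const Vec3 u = sub(b, a), v = sub(c, a);
    const Vec3 w = sub(origin, a), h = cross(dir, v), q = cross(w, u);
    const float f = 1.0f/dot(u, h), s = f*dot(w, h), t = f*dot(dir, q);
    return s>=0.0f && s<=1.0f && t>=0.0f && s+t<=1.0f && f*dot(v, q)>0.0f;
}

// Ray-parity point-in-mesh test: inside only if both rays cross the surface an odd number of times.
static bool inside_mesh(const Vec3& p, const std::vector<float>& p0,
                        const std::vector<float>& p1, const std::vector<float>& p2) {
    const Vec3 d0 = {+0.01f, +0.04f, +1.03f}; // ray directions copied from the kernel
    const Vec3 d1 = {-0.05f, -0.06f, -1.07f};
    unsigned hits0 = 0u, hits1 = 0u;
    for (std::size_t i = 0; i < p0.size()/3; i++) {
        const Vec3 a = vertex(p0, i), b = vertex(p1, i), c = vertex(p2, i);
        hits0 += ray_hits(p, d0, a, b, c) ? 1u : 0u;
        hits1 += ray_hits(p, d1, a, b, c) ? 1u : 0u;
    }
    return (hits0%2u == 1u) && (hits1%2u == 1u);
}
```

For example, with a tetrahedron spanning (0,0,0), (1,0,0), (0,1,0) and (0,0,1), `inside_mesh` should return true for (0.1, 0.1, 0.1) and false for (0.8, 0.8, 0.8). Requiring both rays to report an odd count is presumably what the kernel comment means by "to eliminate errors": a single ray that grazes an edge or vertex can miscount, while two independent rays are unlikely to fail in the same way.
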
tyrafluid commented 1 year ago

Thank you for the quick response, @ProjectPhysX. I got other errors: [screenshot]. I added these lines:

    const float3 p0i = (float3)(p0[3u*i], p0[3u*i + 1u], p0[3u*i + 2u]);
    const float3 p1i = (float3)(p1[3u*i], p1[3u*i + 1u], p1[3u*i + 2u]);
    const float3 p2i = (float3)(p2[3u*i], p2[3u*i + 1u], p2[3u*i + 2u]);
    const float3 u = p1i - p0i, v = p2i - p0i;

However, my GPU doesn't have enough memory. [screenshot]

ProjectPhysX commented 1 year ago

@tyrafluid Oh, I missed a few lines. See the updated voxelize_mesh kernel above and copy/paste it into your code. As for the memory: the default benchmark uses a 256³ box, which is too large for the 9600 GT. In src/setup.cpp, line 946, reduce the resolution to 176³; then it should work.
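
A rough estimate shows why 256³ fails and 176³ fits. The sketch below assumes D3Q19 with FP32 storage of the density distribution functions, i.e. about 93 bytes per cell (19×4 B DDFs, 4 B density, 12 B velocity, 1 B flags), and a 512 MB card. Both numbers are assumptions, and the estimate ignores driver overhead and any additional buffers. `lbm_bytes` and `max_cubic_resolution` are hypothetical helpers, not part of FluidX3D.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: D3Q19 with FP32 DDF storage, one copy of the DDFs per cell:
// 19*4 B DDFs + 4 B density + 3*4 B velocity + 1 B flags = 93 B/cell.
constexpr std::uint64_t bytes_per_cell_d3q19_fp32 = 19u*4u + 4u + 3u*4u + 1u;

// Estimated VRAM (bytes) for an N x N x N box at the given bytes per cell.
constexpr std::uint64_t lbm_bytes(std::uint64_t n, std::uint64_t bytes_per_cell) {
    return n*n*n*bytes_per_cell;
}

// Largest cubic resolution N whose grid still fits into vram_bytes.
std::uint64_t max_cubic_resolution(std::uint64_t vram_bytes, std::uint64_t bytes_per_cell) {
    std::uint64_t n = 0u;
    while (lbm_bytes(n+1u, bytes_per_cell) <= vram_bytes) n++;
    return n;
}
```

Under these assumptions, 256³ needs about 1.56 GB, while 176³ needs about 507 MB, just under 512 MiB (536,870,912 bytes). The largest cube that fits would be 179³, so 176³ leaves a little headroom.
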