ProjectPhysX / FluidX3D

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.
https://youtube.com/@ProjectPhysX

On exporting data, read_from_device_3d and VTK #143

Closed remi-rc closed 7 months ago

remi-rc commented 8 months ago

Dear Dr Lehmann and other users of FluidX3D,

I figured it would be better for traceability to post here rather than under the YT videos (which are great btw) !

VTK export and units

Currently I am wondering whether lbm.u.write_device_to_vtk(); exports in LBM or SI units. I think the former, given my analysis of the resulting vtk files. Is there a way to export directly in SI units instead, or should that already be the case and I'm doing something wrong?

Sparse export

I know that cuboid export of VTK has not been implemented yet (it would be a great feature I think, but I might be biased of course :) ). As a follow-up to your answer to my question on YT, I managed to export data at a given point as a function of time. This is similar to what you showed in the Ahmed body example, where the Cd coefficient is exported. It looks like this:

const string path = get_exe_path() + "TIME/";
write_file(path + "Uy.dat", "# t\tUy\n");
lbm.run(0u); // initialize simulation
float u_y_test = 1;
while (lbm.get_t() < 1000) { // main simulation loop
  lbm.u.read_from_device();
  u_y_test = lbm.u.y[lbm.index(Nx / 2, Ny - 5, Nz / 2)];
  write_line(path + "Uy.dat", to_string(lbm.get_t()) + "\t" + to_string(u_y_test, 3u) + "\n");
  lbm.run(100u);
}

I am almost satisfied with the method, but as you noted previously, using lbm.u.read_from_device(); is a bit wasteful since everything is copied to the CPU. Since I run on a single GPU, I tried instead to use

lbm.lbm_domain[0]->u.read_from_device_3d(0, Nx, 0, Ny, 0, Nz, Nx, Ny, Nz);  // this should still copy all the data ? just for tests

However, if I replace my lbm.u.read_from_device(); line with the one above, the export gives me the same initial value over and over.

Could you please help/advise on this ?

Best

ProjectPhysX commented 8 months ago

Hi @remi-rc,

write_device_to_vtk() always exports data in LBM units. Only the axis scaling will be in SI units if units.set_m_kg_s(...) was specified; then the conversion factors for the base units m, kg, s are also printed in the console.

VTK export in SI units would indeed be better. Thank you for this suggestion! I'll put it on my list and leave the issue open as feature request.

I've just tested lbm.lbm_domain[0]->u.read_from_device_3d(0, Nx, 0, Ny, 0, Nz, Nx, Ny, Nz); vs. lbm.u.read_from_device(); again, and for me it works. I need additional information to debug this: what is your hardware, and can you please post the entire main_setup() function?

Kind regards, Moritz
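
To make the LBM-versus-SI distinction above concrete, here is a minimal sketch of how a velocity read from an LBM-unit export could be rescaled by hand. It is not FluidX3D code: the UnitScale struct and the functions make_scale, velocity_to_si and time_to_si are hypothetical names, and the sketch assumes that one reference length and one reference velocity are known in both unit systems.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of FluidX3D: holds the two base conversion
// factors needed to rescale lengths, times and velocities.
struct UnitScale {
    float meters_per_cell;   // SI length of one lattice cell
    float seconds_per_step;  // SI duration of one LBM time step
};

// Derive the scale from a reference length and a reference velocity that are
// known in both LBM units and SI units (assumption of this sketch).
inline UnitScale make_scale(float length_lbm, float velocity_lbm,
                            float length_si, float velocity_si) {
    assert(length_lbm > 0.0f && velocity_lbm > 0.0f);
    assert(length_si > 0.0f && velocity_si > 0.0f);
    UnitScale s;
    s.meters_per_cell = length_si / length_lbm;
    // From velocity_si = velocity_lbm * meters_per_cell / seconds_per_step:
    s.seconds_per_step = velocity_lbm / velocity_si * s.meters_per_cell;
    return s;
}

// Convert a velocity from lattice units (cells per step) to m/s.
inline float velocity_to_si(const UnitScale& s, float u_lbm) {
    return u_lbm * s.meters_per_cell / s.seconds_per_step;
}

// Convert a number of LBM time steps to seconds.
inline float time_to_si(const UnitScale& s, unsigned long long steps) {
    return static_cast<float>(steps) * s.seconds_per_step;
}
```

With such a scale, each value in a column like Uy in Uy.dat would be passed through velocity_to_si and each time step through time_to_si before plotting. The numbers used to build the scale are whatever reference values the setup defines; none are prescribed here.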

remi-rc commented 8 months ago

Dear Dr Lehmann,

thank you for your fast answer and for considering the issue as a feature request :) In the meantime I will export these conversion factors so that I don't forget them!

The hardware is an NVIDIA GeForce GTX 1660 Super.

As for the test case in question, I took the cylinder in rectangular duct :

void main_setup() { // cylinder in rectangular duct; required extensions in defines.hpp: VOLUME_FORCE, INTERACTIVE_GRAPHICS
    // ################# Define simulation box size, viscosity and volume force #################
    const float Re = 25000.0f;
    const float D = 32.0f;
    const float u = rsqrt(3.0f);
    const float w=D, l=12.0f*D, h=3.0f*D;
    const float nu = units.nu_from_Re(Re, D, u);
    const float f = units.f_from_u_rectangular_duct(w, D, 1.0f, nu, u);
    LBM lbm(to_uint(w), to_uint(l), to_uint(h), nu, 0.0f, f, 0.0f);

    // ################# Define geometry #################
    const uint Nx=lbm.get_Nx(), Ny=lbm.get_Ny(), Nz=lbm.get_Nz(); parallel_for(lbm.get_N(), [&](ulong n) { uint x=0u, y=0u, z=0u; lbm.coordinates(n, x, y, z);
        lbm.u.y[n] = 0.1f*u;
        if(cylinder(x, y, z, float3(lbm.center().x, 2.0f*D, lbm.center().z), float3(Nx, 0u, 0u), 0.5f*D)) lbm.flags[n] = TYPE_S;
        if(x==0u||x==Nx-1u||z==0u||z==Nz-1u) lbm.flags[n] = TYPE_S; // x and z non periodic
    });

    // ################# Run simulation, export images and data #################

    const string path = get_exe_path() + "TIME/";
    write_file(path + "Uy.dat", "# t\tUy\n");

    lbm.run(0u); // initialize simulation
    float u_y_test = 1;

    while (lbm.get_t() < 2000) { // main simulation loop

        lbm.lbm_domain[0]->u.read_from_device_3d(0, Nx, 0, Ny, 0, Nz, Nx, Ny, Nz);  // very slow for some reason compared with the line below
        //lbm.u.read_from_device();  // works again if I uncomment 
        u_y_test = lbm.u.y[lbm.index(Nx / 2, Ny - 5, Nz / 2)];
        write_line(path + "Uy.dat", to_string(lbm.get_t()) + "\t" + to_string(u_y_test, 3u) + "\n");

        lbm.run(100u);
    }
} 

I will get access to other GPUs today so maybe the problem will go away. Also I'm currently on Windows 10 with my GTX 1660, running the code via Microsoft Visual Studio Community 17.8.5.

Kind regards,

Rémi

ProjectPhysX commented 7 months ago

Hi @remi-rc,

the function call lbm.lbm_domain[0]->u.read_from_device_3d(0, Nx, 0, Ny, 0, Nz, Nx, Ny, Nz); is slow here because, unlike lbm.u.read_from_device(), which enqueues a single PCIe transfer for the entire domain, it enqueues many small row-wise PCIe transfers in nested loops. Each enqueueReadBuffer call comes with additional latency/overhead, so using many small enqueueReadBuffer calls instead of one large one is much slower.

The workaround with lbm.lbm_domain[0]->u.read_from_device_3d(...); is really intended for sparse data copy, like a single value, one row, or a slice. Only then is it much faster. In your setup, use

lbm.lbm_domain[0]->u.read_from_device_3d(Nx/2, Nx/2+1, Ny-5, Ny-5+1, Nz/2, Nz/2+1, Nx, Ny, Nz);
u_y_test = lbm.u.y[lbm.index(Nx / 2, Ny - 5, Nz / 2)];

to PCIe-copy only the one value you later export.


I have now also updated the .vtk export functions such that they automatically convert the data to SI units when units.set_m_kg_s(...) was specified in main_setup(). Thanks again for this great feature suggestion!

Kind regards, Moritz
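
The row-wise behaviour described above can be illustrated with a small host-only model. This is a sketch under stated assumptions, not the FluidX3D implementation: two std::vector buffers stand in for device and host memory, the helper names linear_index and copy_subbox_rowwise are hypothetical, the memory layout is assumed to be x-fastest, and the sub-box bounds are treated as half-open, which matches the Nx/2, Nx/2+1 call shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear index of cell (x, y, z) in an Nx*Ny*Nz box, with x varying fastest
// (assumed layout for this sketch).
inline std::size_t linear_index(unsigned x, unsigned y, unsigned z,
                                unsigned Nx, unsigned Ny) {
    return static_cast<std::size_t>(x)
         + (static_cast<std::size_t>(y) + static_cast<std::size_t>(z) * Ny) * Nx;
}

// Copy the half-open sub-box [x0,x1) x [y0,y1) x [z0,z1) from the "device"
// buffer to the "host" buffer, one contiguous x-row per transfer.
// Returns the number of separate transfers that were needed.
inline std::size_t copy_subbox_rowwise(const std::vector<float>& device,
                                       std::vector<float>& host,
                                       unsigned x0, unsigned x1,
                                       unsigned y0, unsigned y1,
                                       unsigned z0, unsigned z1,
                                       unsigned Nx, unsigned Ny, unsigned Nz) {
    assert(x0 < x1 && x1 <= Nx && y0 < y1 && y1 <= Ny && z0 < z1 && z1 <= Nz);
    assert(device.size() == host.size());
    assert(device.size() == static_cast<std::size_t>(Nx) * Ny * Nz);
    std::size_t transfers = 0;
    for (unsigned z = z0; z < z1; z++) {
        for (unsigned y = y0; y < y1; y++) {
            const std::ptrdiff_t begin =
                static_cast<std::ptrdiff_t>(linear_index(x0, y, z, Nx, Ny));
            const std::ptrdiff_t length = static_cast<std::ptrdiff_t>(x1 - x0);
            // Each row stands in for one small device-to-host read.
            std::copy(device.begin() + begin, device.begin() + begin + length,
                      host.begin() + begin);
            transfers++;
        }
    }
    return transfers;
}
```

Under this model, copying the full 32 x 384 x 96 box from this thread takes 384 * 96 = 36864 row transfers, while copying only the probe cell takes 1. It also shows that after a single-cell copy only that one host entry is up to date; every other host value is whatever was there before.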

remi-rc commented 7 months ago

Dear Dr Lehmann,

thanks a lot for this new feature, can't wait to try it out :)

Regards, Rémi

ProjectPhysX commented 7 months ago

Hi @remi-rc,

note that in the latest update I've done some refactoring, and the call was renamed from lbm.lbm[0]->u.read_from_device_3d(...); to lbm.lbm_domain[0]->u.read_from_device_3d(...);. This is because it was previously confusing in the code which objects belong to the LBM class and which to the LBM_Domain class. You might have to modify this in your setup scripts.

Kind regards, Moritz