GPUSPH / gpusph

The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics
157 stars 67 forks source link

Messy whole program structure #36

Open sanguinariojoe opened 5 years ago

sanguinariojoe commented 5 years ago

Your are clearly not the first CFD libraries which is asking the user to code his own solver, e.g. Palabos, ASL, and so far OpenFOAM. However, the way GPUSPH is doing that is a mess...

In my opinion, GPUSPH shall be refactored in such a way that everything outside src/problems can be compiled as one or more libraries, and subsequently everything inside src/problems can be linked linked against those libs.

Right now, there is some sort of "main", which is basically managing everything to rely the control to the selected problem... As a consequence, each compiled problem is a whole new GPUSPH instance... From all the issues potentially associated to this behaviour, my favourite one is the error hiding.

Narcolessico commented 5 years ago

Hi Jose,

the main structure of GPUSPH comes from a small simulator that was not even written in C; then ported to exploit the GPU with CUDA, then multi-GPU and multi-node. All of this with from a handful of main developers and a lot of small, precious contributions. Many contributions came by passionate researchers who do not have a background in professional software development and were often overwhelmed with tasks (you know how research works - one can't just work full-time "only" on maintenance for months).

You're not the first one realizing that the overall structure has large improvement margins :-) but it is less trivial to realize that there are a lot of ideas, list of improvements, discussions and tests in the TODO list, not to mention side activities like modeling and interpreting experiments. What is lacking is not awareness or good will, but rather what in the software industry is called the "capacity" (aka: there are very few contributors with very few time available), while deadlines end up increasing the "technical debt". The approach currently used is the same used in many in the industry: rather than an abrupt rewrite, a long, incremental improvement process. You will find a lot of traces of it in the commit history.

Every feedback is welcome but it is especially constructive when it ends with "I propose". Could you make a detailed proposal, possibly in form of code by forking the repository and implementing its main lines? I am sure the merge request will get all the attention it deserves.

Thank you for your feedback!

Oblomov commented 5 years ago

In addition to what @Narcolessico wrote, there are both historical and practical reason for the current design. And while we are steadily moving towards an improved internal architecture, this is still done within the constraints imposed by available time and workforce, which as mentioned isn't exactly abundant.

Concerning specifically the current integrated relationship between user problems and main application, this is due to a choice made in finding an optimal compromise between development time and code performance. More in detail, the device code massively relies on templates to fence optional components (both code and variables), and avoid the mere support for additional features from having any impact at all on performance. The downside of this approach is that to support runtime selection of options requires the compilation of all (compatible) option combinations, which results in extremely long compile times. In fact, the choice to make the framework options a compile-time decision in the user problem came when, even with considerably less variety than we have today, it took several hours to come out with the independent main executable, which completely destroyed the performance of the edit/compile/run cycle during development.

I can't say I'm happy with the decision, and as a matter of fact, we do have a solution planned, which consists in moving away from the current “single source” approach for the device code into a “separate source” approach, which is supported in recent versions of CUDA via NVRTC.

Following up to @Narcolessico proposal to “put your money where your mouth is”, you could for example try and implement dynamic loading of specific instances of the device code via NVRTC, as a first step; this would subsequently allow the CUDASimFramework to be instantiated from within the GPUWorker, rather than the user problem, which is a prerequisite to untangle the user problem from the rest of GPUSPH.

sanguinariojoe commented 5 years ago

Hey guys!

First of all, regarding netiquette... No offence, but impolite and not-diplomatic language is the generally accepted approach in free software development. It is, because polite discussions are not pragmatic, and don't let evaluate the level of concern. Now (since https://github.com/GPUSPH/gpusph/issues/31) you have a fully open project, so I would expect this short of comments quite often.

Moving back to the issue itself...

You're not the first one realizing that the overall structure has large improvement margins :-) but it is less trivial to realize that there are a lot of ideas, list of improvements, discussions and tests in the TODO list

Then publish the TODO list in the repo, and state in README how feature requests will be eventually addressed. Anyway, I would say that a bad structured code is a major issue... At least if you really want to get 3rd party developers contributing.

I can't say I'm happy with the decision, and as a matter of fact, we do have a solution planned, which consists in moving away from the current “single source” approach for the device code into a “separate source” approach, which is supported in recent versions of CUDA via NVRTC.

That's good! But AFAIK the required changes are not that great (for this specific problem). Of course, the "host side" can be easily adapted, as can be appreciated in main.cc:381 line (you just need to figure out how to reallocate the NetworkManager code). Regarding the "device side", using CMake (see https://github.com/GPUSPH/gpusph/pull/35) should not be hard to pack all the deps (revamping a bit the declarations) into a library. Further details here:

https://devblogs.nvidia.com/building-cuda-applications-cmake/

Let me know!

Oblomov commented 5 years ago

No offence, but impolite and not-diplomatic language is the generally accepted approach in free software development. It is, because polite discussions are not pragmatic, and don't let evaluate the level of concern.

That is not really the case. Most of the discussion in and around free software development, even when not-diplomatic and straight-to-the-point is not impolite. If you are unable to communicate without coming off as arrogant and sanctimonious, then we can do without the communication, especially when you fail at being both polite and pragmatic —the only result of which is the time is spent in this kind of useless discussions.

you have a fully open project, so I would expect this short of comments quite often.

So far you've been the only one, and you can expect anyone coming out as badly to be treated the same way.

Then publish the TODO list in the repo, and state in README how feature requests will be eventually addressed.

These are things that will happen in time, as time permits and things evolve.

Anyway, I would say that a bad structured code is a major issue... At least if you really want to get 3rd party developers contributing.

You'll find that there are several areas where contributions are possible despite the structural limitations of the current implementation.

But AFAIK the required changes are not that great (for this specific problem). Of course, the "host side" can be easily adapted, as can be appreciated in main.cc:381 line (you just need to figure out how to reallocate the NetworkManager code). Regarding the "device side", using CMake (see #35) should not be hard to pack all the deps (revamping a bit the declarations) into a library.

You are looking at the situation from the wrong side up. A better build system is completely irrelevant with the current GPUSPH structure. The current GPUSPH structure (as in: the relationship between the GPUSPH class itself, main and the user problems) cannot change until the problem is decoupled from the framework. The problem cannot be decoupled from the framework until dynamic device code loading via NVRTC is implemented.

Here's the pragmatic approach you can follow if you actually wish to contribute something useful instead of vague criticism from the peanut gallery:

  1. implement dynamic framework code loading via NVRTC as an alternative functional alongside the current compile-time “single-source” framework;
  2. add support for the NVRTC framework implementation to the Problem and XProblem class; among other things, this should provide an API to the user problems that need to load their own device code for the custom kernels needed by some options (e.g. open boundaries, CALC_PRIVATE postprocessing, etc);
  3. move the existing test problems to the NVRTC framework implementation;
  4. deprecate the existing single-source framework implementation.

If you manage to implement these preliminary steps fast enough, they could be merged for 5.0, which would allow us to obsolete the single-source framework for the next major release, allowing us to move forward in the refactoring for the libification of GPUSPH.

sanguinariojoe commented 5 years ago

So far you've been the only one, and you can expect anyone coming out as badly to be treated the same way.

Jajaja, that's great ;-) As I said, this is not a personal attack... Even though you decided to go that way with comments like "If you are unable to communicate without coming off as arrogant and sanctimonious", or "vague criticism from the peanut gallery". Indeed a great mix of Straw Man and Ad hominem fallacies.

Maybe we can stop the weapon climbing at this point.

These are things that will happen in time, as time permits and things evolve.

I consider that a major issue too... It is not I don't understand your point... I really do... But you cannot reply you have not time to work on something because a long todo list, without publishing such todo list... That would make people feel you just simply don't care... Recent similar situations here and here.

BTW, you can add labels/tags to the issues. You know, like "feature request", "bug", "enhancement" and so on... Honestly, I don't do that in my own repos, but apparently people like it

You are looking at the situation from the wrong side up. A better build system is completely irrelevant with the current GPUSPH structure. The current GPUSPH structure (as in: the relationship between the GPUSPH class itself, main and the user problems) cannot change until the problem is decoupled from the framework. The problem cannot be decoupled from the framework until dynamic device code loading via NVRTC is implemented.

I completely disagree! Wanna you get an extensible code by means of NVRTC? perfect! I like it! In fact AQUAgpusph works like that. But I really can't see what has it to do with getting a main + libs structure... I made AQUAgpusph use that paradigm much before going modular/extensible.

For instance I consider src/problems/DamBreak3D.cu (probably not the most complex example, I know):

    const bool WET = get_option("wet", false);
    const bool USE_PLANES = get_option("use_planes", false);
    const uint NUM_OBSTACLES = get_option("num_obstacles", 1);
    const bool ROTATE_OBSTACLE = get_option("rotate_obstacle", true);
    const uint NUM_TESTPOINTS = get_option("num_testpoints", 3);
    // density diffusion terms: 0 none, 1 Ferrari, 2 Molteni & Colagrossi, 3 Brezzi
    const int RHODIFF = get_option("density-diffusion", 2);
(...)
    // Allow user to set the MLS frequency at runtime. Default to 0 if density
    // diffusion is enabled, 10 otherwise
    const int mlsIters = get_option("mls",
        (simparams()->densitydiffusiontype != DENSITY_DIFFUSION_NONE) ? 0 : 10);

Directly came from main...

    // ** framework setup
    // viscosities: KINEMATICVISC*, DYNAMICVISC*
    // turbulence models: ARTVISC*, SPSVISC, KEPSVISC
    // boundary types: LJ_BOUNDARY*, MK_BOUNDARY, SA_BOUNDARY, DYN_BOUNDARY*
    // * = tested in this problem
    SETUP_FRAMEWORK(
        viscosity<ARTVISC>,
        boundary<DYN_BOUNDARY>
    ).select_options(
        RHODIFF == FERRARI, densitydiffusion<FERRARI>(),
        RHODIFF == BREZZI, densitydiffusion<BREZZI>(),
        RHODIFF == COLAGROSSI, densitydiffusion<COLAGROSSI>(),
        USE_PLANES, add_flags<ENABLE_PLANES>()
    );

    // will dump testpoints separately
    addPostProcess(TESTPOINTS);
(...)
    if (mlsIters > 0)
        addFilter(MLS_FILTER, mlsIters);

    // Explicitly set number of layers. Also, prevent having undefined number of layers before the constructor ends.
    setDynamicBoundariesLayers(3);

    resize_neiblist(128);

    // *** Initialization of minimal physical parameters
    set_deltap(0.015f);
    //simparams()->dt = 0.00005;
    physparams()->r0 = m_deltap;
    physparams()->gravity = make_float3(0.0, 0.0, -9.81);
    const float g = length(physparams()->gravity);
    const double H = 0.4;
    physparams()->dcoeff = 5.0f * g * H;
    add_fluid(1000.0);

    //add_fluid(2350.0);
    set_equation_of_state(0,  7.0f, 20.0f);
    set_kinematic_visc(0, 1.0e-2f);
    //set_dynamic_visc(0, 1.0e-4f);

    // default tend 1.5s
    simparams()->tend=1.5f;
    //simparams()->ferrariLengthScale = H;
    simparams()->densityDiffCoeff = 0.1f;
    /*physparams()->artvisccoeff =  0.2;
    physparams()->smagfactor = 0.12*0.12*m_deltap*m_deltap;
    physparams()->kspsfactor = (2.0/3.0)*0.0066*m_deltap*m_deltap;
    physparams()->epsartvisc = 0.01*simparams()->slength*simparams()->slength;*/

    // Drawing and saving times
    add_writer(VTKWRITER, 0.005f);

Can't this be librified?

    // *** Other parameters and settings
    m_name = "DamBreak3D";

    // *** Geometrical parameters, starting from the size of the domain
    const double dimX = 1.6;
    const double dimY = 0.67;
    const double dimZ = 0.6;
    const double obstacle_side = 0.12;
    const double obstacle_xpos = 0.9;
    const double water_length = 0.4;
    const double water_height = H;
    const double water_bed_height = 0.1;

    // If we used only makeUniverseBox(), origin and size would be computed automatically
    m_origin = make_double3(0, 0, 0);
    m_size = make_double3(dimX, dimY, dimZ);

Just attributes

    // set positioning policy to PP_CORNER: given point will be the corner of the geometry
    setPositioning(PP_CORNER);

    // main container
    if (USE_PLANES) {
        // limit domain with 6 planes
        makeUniverseBox(m_origin, m_origin + m_size);
    } else {
        GeometryID box =
            addBox(GT_FIXED_BOUNDARY, FT_BORDER, m_origin, dimX, dimY, dimZ);
        // we simulate inside the box, so do not erase anything
        setEraseOperation(box, ET_ERASE_NOTHING);
    }

    // Planes unfill automatically but the box won't, to void deleting all the water. Thus,
    // we define the water at already the right distance from the walls.
    double BOUNDARY_DISTANCE = m_deltap;
    if (simparams()->boundarytype == DYN_BOUNDARY && !USE_PLANES)
            BOUNDARY_DISTANCE *= getDynamicBoundariesLayers();

    // Add the main water part
    addBox(GT_FLUID, FT_SOLID, Point(BOUNDARY_DISTANCE, BOUNDARY_DISTANCE, BOUNDARY_DISTANCE),
        water_length - BOUNDARY_DISTANCE, dimY - 2 * BOUNDARY_DISTANCE, water_height - BOUNDARY_DISTANCE);
    // Add the water bed if wet. After we'll implement the unfill with custom dx, it will be possible to declare
    // the water bed overlapping with the main part.
    if (WET) {
        addBox(GT_FLUID, FT_SOLID,
            Point(water_length + m_deltap, BOUNDARY_DISTANCE, BOUNDARY_DISTANCE),
            dimX - water_length - BOUNDARY_DISTANCE - m_deltap,
            dimY - 2 * BOUNDARY_DISTANCE,
            water_bed_height - BOUNDARY_DISTANCE);
    }

    // set positioning policy to PP_BOTTOM_CENTER: given point will be the center of the base
    setPositioning(PP_BOTTOM_CENTER);

    // add one or more obstacles
    const double Y_DISTANCE = dimY / (NUM_OBSTACLES + 1);
    // rotation angle
    const double Z_ANGLE = M_PI / 4;

        // activate the solid obstacle
    for (uint i = 0; i < NUM_OBSTACLES; i++) {
        // Obstacle is of type GT_MOVING_BODY, although the callback is not even implemented, to
        // make the forces feedback available
        GeometryID obstacle = addBox(GT_MOVING_BODY, FT_BORDER,
            Point(obstacle_xpos, Y_DISTANCE * (i+1) + (ROTATE_OBSTACLE ? obstacle_side/2 : 0), 0),
                obstacle_side, obstacle_side, dimZ );
        if (ROTATE_OBSTACLE) {
            rotate(obstacle, 0, 0, Z_ANGLE);
            // until we'll fix it, the rotation centers are always the corners
            // shift(obstacle, 0, obstacle_side/2, 0);
        }
        // enable force feedback to measure forces
        enableFeedback(obstacle);
    }

    // Optionally, add a floating objects
    /*
    // set positioning policy to PP_CENTER: given point will be the geometrical center of the object
    setPositioning(PP_CENTER);
    GeometryID floating_obj =
        addSphere(GT_FLOATING_BODY, FT_BORDER, Point(water_length, dimY/2, water_height), obstacle_side);
    // half water density to make it float
    setMassByDensity(floating_obj, physparams()->rho0[0] / 2);
    setParticleMassByDensity(floating_obj, physparams()->rho0[0] / 2);
    // disable collisions: will only interact with fluid
    // disableCollisions(floating_obj);
    */

    // add testpoints
    const float TESTPOINT_DISTANCE = dimZ / (NUM_TESTPOINTS + 1);
    for (uint t = 0; t < NUM_TESTPOINTS; t++)
        addTestPoint(Point(0.25*dimX, dimY/2.0, (t+1) * TESTPOINT_DISTANCE/2.0));

    for (uint t = 0; t < NUM_TESTPOINTS; t++)
        addTestPoint(Point(0.4*dimX, dimY/2.0, (t+1) * TESTPOINT_DISTANCE/2.0));

    for (uint t = 0; t < NUM_TESTPOINTS; t++)
        addTestPoint(Point(0.75*dimX, dimY/2.0, (t+1) * TESTPOINT_DISTANCE/2.0));

    for (uint t = 0; t < NUM_TESTPOINTS; t++)
        addTestPoint(Point(0.9*dimX, dimY/2.0, (t+1) * TESTPOINT_DISTANCE/2.0));

Is this not a geometry library?

I'm not as used as you to CUDA way of life, but are you sure you are not trying to make several changes in the same shot?

Narcolessico commented 5 years ago

Then publish the TODO list in the repo

Just a small remark on this point: there is no centralized TODO list AFAIK. There are TODOs in the code, some just orally discussed, some in shared documents. I would be useful to have them all together and prioritized, but at the time being there no single file to just push to the repo.