KratosMultiphysics / Kratos

Kratos Multiphysics (a.k.a. Kratos) is a framework for building parallel, multi-disciplinary simulation software. Modularity, extensibility and HPC are its main objectives. Kratos is released under the BSD license and is written in C++ with an extensive Python interface.
https://kratosmultiphysics.github.io/Kratos/

[Core][MPI] `ParallelFillCommunicator` gets stuck when processing submodelparts if these submodelparts do not hold the full information #12613

Closed: loumalouomega closed this issue 2 months ago

loumalouomega commented 2 months ago

Description

The ParallelFillCommunicator gets stuck when processing submodelparts if these submodelparts do not hold the full information.

@ddiezrod realised that the simulation was getting stuck after calling the SkinDetectionProcess and then the ParallelFillCommunicator.

After investigating, I concluded that the problem is that some information may be present in the main model part but not in the submodelpart. This is especially true when detecting the skin of a mesh: an element may exist in a certain partition while the only node of that element in that partition is not part of the skin. The ParallelFillCommunicator then gets stuck, because it tries to communicate between partitions that do not exist for that submodelpart.
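The failure mode described above can be illustrated with a toy, pure-Python model. This is not the actual ParallelFillCommunicator algorithm: the node ownership, ghost sets and submodelpart views below are hypothetical, and the model assumes each rank decides whom to receive from and whom to send to using only its own local view of the submodelpart. When two ranks disagree about whether a shared node belongs to the submodelpart, one rank waits for a message the other never sends, which with blocking MPI receives would hang.

```python
"""Toy model (not Kratos code) of why building the communication pattern
separately for a submodel part can leave a receive without a matching send."""

# Hypothetical data: owning rank of every node in the main model part.
OWNER = {1: 0, 2: 0, 3: 1, 4: 1}

# Hypothetical ghost nodes each rank holds in the main model part.
MAIN_GHOSTS = {0: {3}, 1: {2}}

# Hypothetical local view of the skin submodel part on each rank.
# Rank 1 holds node 2 (owned by rank 0) in its submodel part, but rank 0's
# submodel part does not contain node 2, so the two ranks disagree.
SUBMODEL_VIEW = {0: {1, 3}, 1: {2, 3, 4}}


def expected_receives(rank):
    """Owner ranks that `rank` waits on for its submodel part ghost nodes."""
    return {OWNER[n] for n in SUBMODEL_VIEW[rank] if OWNER[n] != rank}


def planned_sends(rank):
    """Ranks that `rank` sends to, decided only from its own submodel view."""
    targets = set()
    for other, ghosts in MAIN_GHOSTS.items():
        if other != rank and ghosts & SUBMODEL_VIEW[rank]:
            targets.add(other)
    return targets


def unmatched_receives():
    """(receiver, sender) pairs where a blocking receive would never complete."""
    pairs = []
    for receiver in SUBMODEL_VIEW:
        for sender in expected_receives(receiver):
            if receiver not in planned_sends(sender):
                pairs.append((receiver, sender))
    return sorted(pairs)


print(unmatched_receives())  # [(1, 0)]
```

Rank 1 expects the value of node 2 from rank 0, but rank 0 never plans that send because node 2 is missing from its own submodelpart view.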

This happens because the ParallelFillCommunicator applies the same algorithm to the submodelparts as to the main model part: https://github.com/KratosMultiphysics/Kratos/blob/c29d1dcca94da8622bbab69d8bb8a4da5c5ea9a6/kratos/mpi/utilities/parallel_fill_communicator.cpp#L399. This could be avoided by computing the communication information only for the main model part and then retrieving the submodelpart information from it. A priori, this would also be faster, because the communication information would be computed just once and the submodelparts would then be processed locally with the existing information.
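A minimal sketch of the proposed design, again as a toy pure-Python model rather than the actual Kratos implementation: the per-neighbour send/receive lists (`MAIN_PATTERN`) and the submodelpart membership (`SUBMODEL_NODES`) are hypothetical data. The neighbour ranks are taken once from the main model part, and each submodelpart's lists are obtained by locally filtering the main lists with the submodelpart's nodes, so no new communication is needed and every rank keeps the same set of exchanges, even when a filtered list is empty.

```python
"""Toy sketch (not Kratos code) of the proposed design: take the neighbour
ranks from the main model part once, then derive each submodel part's
send/receive lists locally by filtering, without new communication."""

# Hypothetical main model part pattern per rank, as it could look after it
# has been computed once: neighbour rank -> (local ids sent, ghost ids received).
MAIN_PATTERN = {
    0: {1: ({2}, {3})},
    1: {0: ({3}, {2})},
}

# Hypothetical local view of the skin submodel part on each rank.
SUBMODEL_NODES = {0: {1, 3}, 1: {2, 3, 4}}


def submodel_pattern(rank):
    """Filter the main model part lists with the local submodel membership.

    Every main model part neighbour is kept, even when both filtered lists
    are empty, so the set of exchanges is the same for every submodel part.
    """
    nodes = SUBMODEL_NODES[rank]
    return {
        neighbour: (send & nodes, recv & nodes)
        for neighbour, (send, recv) in MAIN_PATTERN[rank].items()
    }


def exchanges_are_paired():
    """Check that whenever rank a exchanges with b, b also exchanges with a."""
    for rank in MAIN_PATTERN:
        for neighbour in submodel_pattern(rank):
            if rank not in submodel_pattern(neighbour):
                return False
    return True


print(submodel_pattern(0))  # {1: (set(), {3})}
print(exchanges_are_paired())  # True
```

This sketch only guarantees that every exchange has a partner, which is what prevents the hang; it deliberately does not reconcile cases where ranks disagree on submodelpart membership (here rank 1 still lists node 2 as a ghost while rank 0 sends nothing for it).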


To Reproduce

Run the following test with 7 or more MPI processes (in this case it crashes instead of getting stuck):

```python
import os

import KratosMultiphysics
import KratosMultiphysics.KratosUnittest as KratosUnittest
import KratosMultiphysics.kratos_utilities as kratos_utils
from KratosMultiphysics.testing.utilities import ReadModelPart

def GetFilePath(fileName):
    return os.path.join(os.path.dirname(os.path.realpath(__file__)), fileName)

def RemoveFiles(mdpa_name):
    kratos_utils.DeleteFileIfExisting(mdpa_name + ".time")

class TestSkinDetectionProcess(KratosUnittest.TestCase):
    @classmethod
    def setUpClass(cls):
        KratosMultiphysics.Logger.GetDefaultOutput().SetSeverity(KratosMultiphysics.Logger.Severity.WARNING)
        cls.current_model = KratosMultiphysics.Model()
        cls.model_part = cls.current_model.CreateModelPart("Main")
        cls.model_part.ProcessInfo[KratosMultiphysics.DOMAIN_SIZE] = 3
        cls.mdpa_name = GetFilePath("auxiliar_files_for_python_unittest/mdpa_files/coarse_sphere")
        ReadModelPart(cls.mdpa_name, cls.model_part)

    @classmethod
    def tearDownClass(cls):
        RemoveFiles(cls.mdpa_name)

    def test_SkinDetectionProcess(self):
        # Flag the nodes already known to belong to the skin
        for node in self.model_part.GetSubModelPart("Skin_Part").Nodes:
            node.Set(KratosMultiphysics.ACTIVE, True)

        detect_skin = KratosMultiphysics.SkinDetectionProcess3D(self.model_part)
        detect_skin.Execute()

        # The detected skin (INTERFACE) must coincide with the flagged skin nodes (ACTIVE)
        for node in self.model_part.Nodes:
            self.assertEqual(node.Is(KratosMultiphysics.INTERFACE), node.Is(KratosMultiphysics.ACTIVE))

        # The data communicator
        data_comm = self.model_part.GetCommunicator().GetDataCommunicator()

        # Construct and execute the ParallelFillCommunicator (MPI runs only)
        if data_comm.IsDistributed():
            import KratosMultiphysics.mpi as KratosMPI
            parallel_fill_communicator = KratosMPI.ParallelFillCommunicator(self.model_part, data_comm)
            parallel_fill_communicator.Execute()

        # Check the number of skin conditions created (global count in MPI)
        if data_comm.IsDistributed():
            self.assertEqual(self.model_part.GetCommunicator().GlobalNumberOfConditions(), 128)
        else:
            self.assertEqual(self.model_part.NumberOfConditions(), 128)

if __name__ == '__main__':
    KratosMultiphysics.Logger.GetDefaultOutput().SetSeverity(KratosMultiphysics.Logger.Severity.WARNING)
    KratosUnittest.main()
```

Expected behavior

The ParallelFillCommunicator should complete without getting stuck (or crashing).


loumalouomega commented 2 months ago

There is a case still failing that is fixed by the proposed design.

loumalouomega commented 2 months ago

> There is a case still failing that is fixed by the proposed design.

False alarm, my PC is doing weird stuff.