ParallelFillCommunicator gets stuck when processing submodelparts if these submodelparts do not hold full information.
@ddiezrod realised that the simulation was getting stuck after calling the SkinDetectionProcess and then the ParallelFillCommunicator.
After investigating, I concluded that the problem is that some information may be in the main model part but not in the submodelpart. This is especially true when detecting the skin of a mesh. In that case an element may exist in a certain partition while the only existing node in that partition is not in the skin, so the ParallelFillCommunicator gets stuck as it tries to communicate between partitions that do not exist for that submodelpart.
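To make the failure mode concrete, here is a toy illustration in plain Python. It is not Kratos code and does not reproduce the actual ParallelFillCommunicator algorithm: the rank numbers and neighbour lists are invented, and the only assumption is that a blocking exchange completes when both ranks expect each other. A one-sided expectation then corresponds to a rank waiting forever.

```python
def one_sided_exchanges(neighbours_by_rank):
    """Return (a, b) pairs where rank a waits on rank b, but b does not list a."""
    return sorted(
        (a, b)
        for a, neighbours in neighbours_by_rank.items()
        for b in neighbours
        if a not in neighbours_by_rank.get(b, set())
    )


# Hypothetical neighbour lists, as each rank would see them locally.
main_model_part = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# Assumed scenario: rank 2 holds nothing of the skin submodelpart,
# so it expects no exchange, while rank 0 still expects one with rank 2.
skin_submodelpart = {0: {1, 2}, 1: {0}, 2: set()}

print(one_sided_exchanges(main_model_part))
print(one_sided_exchanges(skin_submodelpart))
```

In this sketch the main model part has no one-sided pairs, while the submodelpart has one (rank 0 waiting on rank 2), which is the kind of mismatch that would show up as a hang.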
Run the following with 7 cores or more (in this case it crashes instead of getting stuck):
```python
import os

import KratosMultiphysics
import KratosMultiphysics.KratosUnittest as KratosUnittest
import KratosMultiphysics.kratos_utilities as kratos_utils
from KratosMultiphysics.testing.utilities import ReadModelPart


def GetFilePath(fileName):
    return os.path.join(os.path.dirname(os.path.realpath(__file__)), fileName)


def RemoveFiles(mdpa_name):
    kratos_utils.DeleteFileIfExisting(mdpa_name + ".time")


class TestSkinDetectionProcess(KratosUnittest.TestCase):
    @classmethod
    def setUpClass(cls):
        KratosMultiphysics.Logger.GetDefaultOutput().SetSeverity(KratosMultiphysics.Logger.Severity.WARNING)
        cls.current_model = KratosMultiphysics.Model()
        cls.model_part = cls.current_model.CreateModelPart("Main")
        cls.model_part.ProcessInfo[KratosMultiphysics.DOMAIN_SIZE] = 3
        cls.mdpa_name = GetFilePath("auxiliar_files_for_python_unittest/mdpa_files/coarse_sphere")
        ReadModelPart(cls.mdpa_name, cls.model_part)

    @classmethod
    def tearDownClass(cls):
        RemoveFiles(cls.mdpa_name)

    def test_SkinDetectionProcess(self):
        # We set a flag in the already known node in the skin
        for node in self.model_part.GetSubModelPart("Skin_Part").Nodes:
            node.Set(KratosMultiphysics.ACTIVE, True)

        detect_skin = KratosMultiphysics.SkinDetectionProcess3D(self.model_part)
        detect_skin.Execute()

        for node in self.model_part.Nodes:
            self.assertEqual(node.Is(KratosMultiphysics.INTERFACE), node.Is(KratosMultiphysics.ACTIVE))

        # The data communicator
        data_comm = self.model_part.GetCommunicator().GetDataCommunicator()

        # Construct and execute the Parallel fill communicator
        if data_comm.IsDistributed():
            import KratosMultiphysics.mpi as KratosMPI
            parallel_fill_communicator = KratosMPI.ParallelFillCommunicator(self.model_part, data_comm)
            parallel_fill_communicator.Execute()

        # Check the number of conditions created
        if data_comm.IsDistributed():
            self.assertEqual(self.model_part.GetCommunicator().GlobalNumberOfConditions(), 128)
        else:
            self.assertEqual(self.model_part.NumberOfConditions(), 128)


if __name__ == '__main__':
    KratosMultiphysics.Logger.GetDefaultOutput().SetSeverity(KratosMultiphysics.Logger.Severity.WARNING)
    KratosUnittest.main()
```
Description
The hang happens because the ParallelFillCommunicator applies the same algorithm to the submodelparts as to the main model part, see https://github.com/KratosMultiphysics/Kratos/blob/c29d1dcca94da8622bbab69d8bb8a4da5c5ea9a6/kratos/mpi/utilities/parallel_fill_communicator.cpp#L399. It could be avoided by computing the communication information only in the main model part and then retrieving it from there for the submodelparts. A priori this would also be faster, because the communication information would be computed just once and the submodelparts would then be processed locally with the existing information.
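The idea of reusing the main model part's communication information can be sketched in plain Python. This is only an illustration under assumptions, not the Kratos implementation: the function name, the dictionary layout (neighbour rank mapped to the set of shared node ids) and the sample ids are all hypothetical. The point it shows is that restricting existing lists to a submodelpart is a purely local operation, so this step involves no message exchange and cannot block. Whether the restricted lists end up consistent across ranks is a separate question that this sketch does not address.

```python
def derive_submodelpart_interface(main_interface, submodelpart_nodes):
    """Restrict the main model part's interface lists to a submodelpart, locally.

    main_interface: dict mapping neighbour rank -> set of shared node ids
                    (hypothetical layout, computed once for the main model part).
    submodelpart_nodes: set of node ids this rank holds in the submodelpart.
    """
    derived = {}
    for neighbour_rank, shared_nodes in main_interface.items():
        kept = shared_nodes & submodelpart_nodes
        if kept:  # drop neighbours that share nothing within this submodelpart
            derived[neighbour_rank] = kept
    return derived


# Hypothetical data for one rank.
main_interface = {1: {3, 4, 7}, 2: {7, 9}, 5: {20}}
skin_nodes = {4, 7, 12}

skin_interface = derive_submodelpart_interface(main_interface, skin_nodes)
print(skin_interface)
```

Here the neighbour with rank 5 disappears from the derived lists because none of the nodes shared with it belong to the submodelpart.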
To Reproduce
Run the test script given above with 7 cores or more (in this case it crashes instead of getting stuck).
Expected behavior
The ParallelFillCommunicator should finish without getting stuck.
Environment