KratosMultiphysics / Kratos

Kratos Multiphysics (A.K.A Kratos) is a framework for building parallel, multi-disciplinary simulation software. Modularity, extensibility and HPC are its main objectives. Kratos is BSD-licensed and written in C++ with an extensive Python interface.
https://kratosmultiphysics.github.io/Kratos/

preparing for multiple analyses to be run in parallel (UQ, optimization, etc.) #3216

Closed RiccardoRossi closed 5 years ago

RiccardoRossi commented 5 years ago

Within the ExaQUte project, we are developing a mechanism to launch (very) many simulations and later to gather the results.

To this end, we need a mechanism to read the input files ONCE and then pass them "in memory" to each of the subsequent analyses, which will be launched in a different memory space. The scenario we need to deal with is, in extreme simplification, that of a "Monte Carlo" approach, meaning that the flow of the program will be as follows:

1 - Read the mdpa and parameters (for example with a process that defines a relevant BC depending on a stochastic variable)
2 - launch many (many!) instances of the simulation, passing the data in memory (it is crucial to avoid the bottleneck of accessing the disk!!)
3 - gather the results and do some statistical treatment over them
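As an illustration of the intended flow (not Kratos-specific), here is a minimal sketch using Python's standard multiprocessing as a stand-in for the actual scheduler; `read_input_once`, `run_one_realization` and the quantity of interest are placeholders, not existing code:

    import multiprocessing as mp
    import statistics

    def read_input_once():
        # placeholder for reading the mdpa + parameters a single time
        return {"bc_amplitude": 1.0}

    def run_one_realization(args):
        # placeholder for one simulation instance; receives the data "in memory"
        data, sample = args
        return data["bc_amplitude"] * sample   # placeholder quantity of interest

    if __name__ == "__main__":
        data = read_input_once()                           # 1 - read once
        samples = [0.9, 1.0, 1.1, 1.2]                     # stochastic inputs
        with mp.Pool() as pool:                            # 2 - launch many instances
            results = pool.map(run_one_realization, [(data, s) for s in samples])
        print(statistics.mean(results), statistics.stdev(results))   # 3 - statistics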

This issue is about agreeing on a method for doing this effectively. My idea is quite simple: analyses depend on a Model and Parameters. Each of those objects can be serialized and sent over the network; it can then be deserialized in a different memory space and used there as needed.

My proposal is hence to proceed as follows:

1 - create a "OnlyImportAnalysisStage" which does no simulation but fills the Model and the ProjectParameters (that is, calls the solver.ImportModelPart) 2 - serialize Model and ProjectParameters 3 - Open a new thread (this is done automatically by the tools we are employing, called PyCOMPSs and HyperLoom) and pass in the construction the serialized Model and ProjectParameters 4 - launch a new analysis but assuming that Model and ProjectParameters are taken from the deserialization of the objects we are launching 5 - gather the results should not be an issue.

All of this can be implemented in user space, without anything new in Kratos, provided that in step 4 we can tell the analysis to use an existing Model (and the ModelParts therein) instead of importing them again (same for the ProjectParameters). This should be possible by simply setting

   "input_type": "use_input_model_part",

and using the ModelParts already present in the serialized Model.
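In a typical ProjectParameters file this setting would sit in the solver's model import settings, roughly like this (the surrounding keys follow the usual layout of the Python solvers' settings; only the value is the one proposed above):

    "solver_settings" : {
        "model_import_settings" : {
            "input_type" : "use_input_model_part"
        }
    }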

Well... the essence of this issue is to ask whether anyone sees problems with this approach.

philbucher commented 5 years ago

I think you should be able to do everything you want with what we already have in Kratos :+1:

loumalouomega commented 5 years ago

I agree with @philbucher that with the current implementation there is no need for changes. The analysis is already divided into many steps, and one of them is the import of the model. Maybe we can discretize it more or adapt it better to the MC needs.

adityaghantasala commented 5 years ago

A few points. It is mentioned that the Model and ProjectParameters will be serialized and sent to the process in memory; regarding this:

Let's say the transfer of the serialized Model and ProjectParameters is being done; then:

Gathering the results:

RiccardoRossi commented 5 years ago

Hi @adityaghantasala, there are several reasons for which it is good to have the model in memory (even if in serialized form). Let me list just a few:

1 - access to disk is inherently slow. It is getting better with SSDs, and better again with persistent storage, but it is still orders of magnitude slower than memory access. Also, from our experience within NUMEXAS, access to the cluster's storage scales up to a point and then falls off a cliff. It is particularly problematic to have thousands of open files.
2 - the disk can simply be inaccessible. Think of grid computing if you like... you have many computers but no access to centralized storage.
3 - imagine you have a multistage analysis and you want to change the number of processors from stage 1 to stage 2. The only option you have is to create new processes and transfer your data to them. This requires the mechanism I am describing.
4 - the cost of sending the model around is much like the cost of transferring the data from the node that has the disk to the node that does the computations, so there is no advantage in reading from a file here.

Regarding the mechanism of transfer by serialization, one can serialize the objects and send them around with MPI oneself, or rely on libraries (or even TCP/IP) to do that under the hood. The second is the use case of ExaQUte with PyCOMPSs and HyperLoom.
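For the "do it yourself with MPI" variant, a hedged sketch with mpi4py (PyCOMPSs and HyperLoom hide this transfer behind their task interfaces; the payload here is just a placeholder for the pickled serializer):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        payload = b"...pickled Model and Parameters..."   # placeholder bytes
    else:
        payload = None
    # the lower-case bcast pickles/unpickles arbitrary Python objects transparently
    payload = comm.bcast(payload, root=0)
    # every rank can now deserialize `payload` and run its own analysis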

Note also that a typical use case will be to have a ProjectParameters object, modify it on the master, package it and send it to launch new processes. So Parameters will definitely need to be serialized.
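For example, on the master one could tweak a Parameters object per realization and ship it as a plain JSON string (the specific keys are only illustrative; an actual stochastic BC would modify different entries):

    import KratosMultiphysics as KM

    with open("ProjectParameters.json", "r") as parameter_file:
        parameters = KM.Parameters(parameter_file.read())

    # illustrative modification; the real case would perturb e.g. a BC value
    parameters["problem_data"]["end_time"].SetDouble(2.0)

    json_string = parameters.WriteJsonString()   # plain string, trivially sendable/picklable
    # on the worker: parameters = KM.Parameters(json_string)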

Regarding performance, just today I pushed a first test implementation (see #3233). Here is a benchmark (only reading, not writing results). The case has around 1M elements, and my computer has no SSD (although the disk is buffered). Note also that I believe there is still some room for optimization.

    riccardo ~/.../examples/tube_3d_simple.gid $ python3 benchmark_serialization_and_pickling.py

    Multi-Physics 6.0.0-5246aab260-RelWithDebInfo
    Importing    KratosFluidDynamicsApplication
    Initializing KratosFluidDynamicsApplication...
    ModelPartIO: [Reading Nodes      : 178364 nodes read]
    ModelPartIO: [Reading Elements   : 1006972 elements read] [Type: Element3D4N]
    ModelPartIO: [Reading Conditions : 3198 conditions read] [Type: WallCondition3D3N]
    ModelPartIO: [Reading Conditions : 3198 conditions read] [Type: WallCondition3D3N]
    ModelPartIO: [Reading Conditions : 24440 conditions read] [Type: WallCondition3D3N]
    ModelPartIO: [Total Lines Read   : 2448313]
    reading time = 8.025572299957275
    Kratos saving to Serializer time = 2.7286128997802734
    pickling dumps time = 0.6254072189331055
    pickling loads time = 0.5501902103424072
    Kratos loading from Serializer time = 3.269657611846924
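For reference, such a benchmark is roughly of this shape (a hedged reconstruction using the current Python API, not the actual script from #3233; the variables added before reading depend on the case):

    import time
    import pickle
    import KratosMultiphysics as KM
    import KratosMultiphysics.FluidDynamicsApplication   # registers WallCondition3D3N etc.

    model = KM.Model()
    model_part = model.CreateModelPart("main_model_part")
    model_part.AddNodalSolutionStepVariable(KM.VELOCITY)   # variables required by the mdpa
    model_part.AddNodalSolutionStepVariable(KM.PRESSURE)

    t0 = time.time()
    KM.ModelPartIO("tube_3d_simple").ReadModelPart(model_part)
    print("reading time =", time.time() - t0)

    t0 = time.time()
    serializer = KM.StreamSerializer()
    serializer.Save("ModelSerialization", model)
    print("Kratos saving to Serializer time =", time.time() - t0)

    t0 = time.time()
    data = pickle.dumps(serializer, 2)
    print("pickling dumps time =", time.time() - t0)

    t0 = time.time()
    restored_serializer = pickle.loads(data)
    print("pickling loads time =", time.time() - t0)

    t0 = time.time()
    restored_model = KM.Model()
    restored_serializer.Load("ModelSerialization", restored_model)
    print("Kratos loading from Serializer time =", time.time() - t0)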

Finally, I believe that HDF5 output is a good solution... so I intend to eventually adopt it!

pooyan-dadvand commented 5 years ago

I completely agree on the in-memory use case, and for huge models it is even more useful.

Two comments:

RiccardoRossi commented 5 years ago

@pooyan-dadvand my proposal is just an initial one... we'll polish it with experience (for example with the MPIRemoteStage you define).

The fact is that schedulers (PyCOMPSs and HyperLoom, but also others) express dependencies by a mechanism similar to that of futures.

In terms of futures, this would be equivalent to

    serialized_data = launch( ModelImportStage )
    res_1 = launch( MySpecialAnalysis(serialized_data.get()) )
    res_2 = launch( MySpecialAnalysis(serialized_data.get()) )
    ....
    res_n = launch( MySpecialAnalysis(serialized_data.get()) )

    final_result = postprocess( [ res_1.get(), ..., res_n.get() ] )

Here res_1 ... res_n can all be launched in parallel ONCE serialized_data is ready.

final_result will be computed once all of res_i are finished
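Expressed with Python's standard concurrent.futures as a stand-in for PyCOMPSs/HyperLoom, the same dependency pattern could look like this; all task bodies and names are placeholders:

    from concurrent.futures import ProcessPoolExecutor

    def model_import_stage():
        return b"...serialized Model and Parameters..."        # placeholder payload

    def my_special_analysis(serialized_data, sample):
        return len(serialized_data) * sample                   # placeholder quantity of interest

    if __name__ == "__main__":
        with ProcessPoolExecutor() as executor:
            serialized_data = executor.submit(model_import_stage)
            payload = serialized_data.result()                  # available once the import finishes
            futures = [executor.submit(my_special_analysis, payload, s)
                       for s in (0.9, 1.0, 1.1)]
            final_result = sum(f.result() for f in futures) / len(futures)   # simple post-processing
        print(final_result)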

Regarding the co-simulation question, I am not completely clear on what the use case would be.

pooyan-dadvand commented 5 years ago

> Regarding the co-simulation question, I am not completely clear on what the use case would be.

Conceptually, co-simulation does the same thing between different solvers that you are doing between different machines. Ensuring compatibility would ease remote co-simulations (eventually as a cloud service).

Also consider that the post-processing part you are mentioning can be very complicated, involving mapping the results onto some domains and changing the inputs of other solvers (which a co-simulator also does). Here again I would go for a solution compatible with our co-simulation and eventually rely on it. In this respect, co-simulation would provide the abilities a standard scheduler lacks for our cases, which are not just about getting a number from the remote end and being happy with that...

RiccardoRossi commented 5 years ago

I expressed myself badly. I know what co-simulation is.

I just don't know what the role of serialization would be in this specific context.

pooyan-dadvand commented 5 years ago

me too...

I will comment on it to you in person...