echse / echse_generic

Eco-Hydrological Simulation Environment: Generic components

Too many I/O operations slowing down model execution #7

Open tpilz opened 6 years ago

tpilz commented 6 years ago

Hi,

I created a complex model engine with several spatial discretisation levels, resulting in different object groups. When applying this engine to a catchment, I got an accordingly complex model setup with more than 9,000 objects and 110,000 object links. Reading the model input therefore involves many reading operations during model initialisation. In my case this is especially true for the step of "Initializing individual parameter functions", as many different objects need to access the same data (e.g. because they are objects of the same soil type requiring the same soil parameters). On my desktop PC this is not a big deal, requiring only about 1-2 seconds, but when running the same configuration on a high performance cluster this step may take up to several minutes. This considerably slows down my application, as I am doing a Monte Carlo simulation plus several additional model re-initialisations at each MC run for warm-up of model states.

I had a look into the ECHSE core and figured out that the reading of parameter files is done in a loop over all the objects (see core/echse_coreFunct_main.cpp lines 302 ff.). Would it be possible, for instance, to alter the function init_paramsFun() in a way that it checks whether a certain file has already been read and, if so, assigns the value from internal memory instead of reading the file again? I had a quick look, but I don't think I can do it myself because there are too many other dependencies where I get lost. I think my C++ skills are too limited for it.
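To illustrate what I have in mind, here is a minimal sketch of such a file-content cache. All names are made up for illustration; this is not the actual core code or its parser:

```cpp
// Sketch only: cache each parameter file's parsed content on first read,
// so later objects referencing the same file hit memory instead of disk.
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ParamRecord {   // hypothetical (argument, value) pair of a function
  double arg;
  double value;
};

// Hypothetical stand-in for the core's actual file parser.
std::vector<ParamRecord> parseParamFile(const std::string& path) {
  std::vector<ParamRecord> records;
  std::ifstream in(path);
  ParamRecord r;
  while (in >> r.arg >> r.value) records.push_back(r);
  return records;
}

// file path -> parsed content, filled lazily on first access
std::map<std::string, std::vector<ParamRecord>>& paramFileCache() {
  static std::map<std::string, std::vector<ParamRecord>> cache;
  return cache;
}

// Objects of the same soil type pass the same path; only the first
// call touches the disk, all later calls are served from memory.
const std::vector<ParamRecord>& readParamFileCached(const std::string& path) {
  auto& cache = paramFileCache();
  auto it = cache.find(path);
  if (it == cache.end())
    it = cache.emplace(path, parseParamFile(path)).first;
  return it->second;
}
```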

Or is there even some other solution?

Cheers, Tobias

echse commented 6 years ago

Hi Tobias, your suggested solution should be feasible in general, but I suggest trying simpler solutions first.

(1) You might introduce additional classes. All objects of a class can share a parameter function (see "group-specific parameter functions" in the documentation).

(2) You might simplify your functions (fewer records in the respective files) to speed up reading.

(3) Maybe you could try different hardware configurations. Is the reading process faster on an SSD?

If none of these alternatives is suitable, I'll have a look at the code. But my time is very limited.

Cheers, David

tpilz commented 6 years ago

Hi David,

thanks for your reply and suggestions.

(1) I think using a shared parameter function won't work in my case. Let me explain: my problem is the number of soil-vegetation components (SVCs), which represent an object group in my engine. I have many objects in this group within my setup (~45,000), resulting not from a high number of soil-vegetation type combinations but from the combination with higher-order units in the spatial hierarchy (subbasin -> so-called landscape unit -> so-called terrain component -> SVC). That means many SVC objects share the same parameters but have different boundary conditions imposed by the objects of higher order in the spatial hierarchy. The parameter functions contain soil parameters for a representative soil profile, i.e. the argument is the soil horizon position in the profile and the value is the respective parameters of that horizon. Many SVC objects need to access the same parameter function because they belong to the same soil type. To substitute the object-specific parameter functions by shared parameter functions, one would need not only the soil horizon but also the soil type, i.e. 2 arguments for the parameter function (see the sketch below). At the moment I don't see how to alter the engine implementation to be more efficient in terms of initialisation without messing up the whole configuration. Maybe I could introduce a soil horizon object class, but that would (in the current setup) blow up the number of objects to about 45,000 times the number of soil horizons. Furthermore, it would cause a lot of other problems, as my current approach of testing different ODE solvers to integrate the water balance equation for each soil profile, taking into account interactions between soil horizons, would not work anymore.
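Conceptually, a table shared by all SVC objects would have to be keyed by soil type and horizon position together, something like the following (a sketch with hypothetical names, not ECHSE code):

```cpp
// Conceptual sketch only: one table shared by all SVC objects, keyed by
// (soil type, horizon position) instead of by object, so the parameter
// data exists once regardless of the number of SVCs.
#include <map>
#include <utility>

struct SoilParams {          // hypothetical per-horizon soil parameters
  double porosity;
  double ksat;               // saturated hydraulic conductivity
};

using SoilKey = std::pair<int, int>;  // (soil type id, horizon position)

std::map<SoilKey, SoilParams> soilTable;

// Every SVC object resolves its parameters from the shared table
// using its own soil type and the horizon it asks about.
SoilParams lookupSoil(int soilType, int horizon) {
  return soilTable.at({soilType, horizon});
}
```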

(2) My actual look-up tables are already quite simple and contain only a few lines each. The problem is that I have too many objects requiring access to these (often the same) files.

(3) My desktop PC in the office still has an old HDD, but there it works fine. The problems occur on the clusters I am working on. I tried two different ones, and both use fast SSDs to access the data nodes. The access times differ considerably between the two clusters and vary over time and with the compute node used; I guess it depends on the current load of the cluster. To test this, I created a little test program that simply reads lots of data in a loop (sketched below), also compiled with different GCC versions. On both clusters, the reading operations generally take much more time than on my desktop PC with its HDD.
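The test program is essentially nothing more than the following (a simplified sketch; the file name and repeat count are placeholders):

```cpp
// Simplified sketch of the I/O test: repeatedly read a file and measure
// the wall-clock time spent (file name and repeat count are placeholders).
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>

int main() {
  const std::string path = "testdata.txt";  // placeholder input file
  const int repeats = 1000;                 // placeholder repeat count

  auto t0 = std::chrono::steady_clock::now();
  double sum = 0.0;
  for (int i = 0; i < repeats; ++i) {
    std::ifstream in(path);
    double x;
    while (in >> x) sum += x;  // keep the reads from being optimized away
  }
  auto t1 = std::chrono::steady_clock::now();

  std::chrono::duration<double> dt = t1 - t0;
  std::cout << "checksum " << sum << ", "
            << dt.count() << " s for " << repeats << " reads\n";
  return 0;
}
```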

I understand that you don't have much time for the ECHSE anymore. Do you think it would be very complicated? Maybe there is still a simpler solution I haven't recognised so far, but it took me a lot of time to build the engine and I have already tried different approaches; considering my overall goal, I think this is the only feasible one.

echse commented 6 years ago

Hi again, I'm not sure how you actually use the cluster.

Do you run a single instance of the program in multi-thread mode (case 1)? Or do you run a separate instance of the program on each node (case 2)?

Generally, it seems that all (parallel) instances read from the same set of files. It might be that this results in sequential execution, i.e. the different instances read the files one after another (not at the same time). If you have case 2 (see above), this would mean that you should also distribute the data over the nodes so that each instance of the program is accompanied by its own set of files. Is that an option?

If you are confronted with case 1: Could you turn it into a case 2 problem?

David

tpilz commented 6 years ago

Within the MC simulation framework, I run multiple instances of the model, each in single-thread mode (I guess this is what you meant by case 2). Indeed, the different instances use the same data sources to save memory. But I don't think this is the actual cause of the problem: even when I run just a single instance of the model, or when I use my simple test program (i.e. reading dummy data from a file in a sequential loop), the processing time on the cluster remains significantly longer than on my desktop PC.

echse commented 6 years ago

Based on your info, the issue can be summarized as follows (no need to consider a multi-CPU case):

"A program that reads a large amount of data from input files runs significantly faster on your local machine than on (a single node of) a cluster."

IMO, this means that the primary problem is not the design of the echse software. I rather suppose that the difference in computation times reflects (a) differences in the efficiency of data transfer (disk - RAM - CPU) on the two systems and/or (b) differences in other hardware-related properties, e.g. CPU, storage systems, cache.

Sure, an adaptation of the echse software that results in less (or faster) reading of files is likely to reduce execution times on both your local machine and the cluster. But I don't expect much benefit from that, since the actual bottleneck seems to be somewhere else. I'd rather invest the time in identifying the true reason for the slow execution of a single program instance on the cluster. Maybe the cluster's maintainers or an IT expert can help.

David