geotopmodel / geotop

GEOtop is a distributed model of the mass and energy balance of the hydrological cycle, which is applicable to simulations in continuum in small catchments. GEOtop deals with the effects of topography on the interaction between energy balance and hydrological cycle with peculiar solutions.
GNU General Public License v3.0
38 stars 29 forks source link

A bit of optimization for Richards3D #96

Closed asartori86 closed 5 years ago

asartori86 commented 5 years ago

I started to work on Richards3D and I added several scoped timers to the functions called inside Richards3D. The current v3.0 (configured with -DMATH_OPTIM=true on my laptop produce this output when the thest Muntatschini_ref_005 is run

+-------------------------------------------------------------------+------------+------------+
| Total CPU time elapsed since start                                |      86.9s |            |
|                                                                   |            |            |
| Section                                               | no. calls |  CPU time  | % of total |
+-------------------------------------------------------+-----------+------------+------------+
| BiCGSTAB_strict_lower_matrix_plus_identity_by_vector  |        28 |      0.25s |       0.3% |
| EnergyBalance                                         |        24 |     17.38s |      20.0% |
| PointEnergyBalance                                    |    143064 |     17.25s |      19.9% |
| Richards3D                                            |        24 |     56.30s |      64.8% |
| SolvePointEnergyBalance                               |    143064 |      8.19s |       9.4% |
| find_dfdH_3D                                          |        28 |      3.48s |       4.0% |
| find_f_3D                                             |        52 |     12.27s |      14.1% |
| find_matrix_K_3D                                      |        24 |     36.78s |      42.3% |
| get_all_input                                         |         1 |      0.36s |       0.4% |
| product_matrix_using_lower_part_by_vector_plus_vector |        52 |      0.07s |      0.00% |
| product_using_only_strict_lower_diagonal_part         |        52 |      0.06s |      0.00% |
| time_loop                                             |         1 |     86.54s |      99.6% |
| tridiag                                               |        56 |      0.07s |      0.00% |
| tridiag2                                              |    476659 |      0.13s |       0.1% |
| water_balance                                         |        24 |     56.34s |      64.8% |
| write_output                                          |        24 |      0.54s |       0.6% |
+-------------------------------------------------------+-----------+------------+------------+

As you can see, the two most expensive functions called by Richards3D are find_matrix_K_3D and find_f_3D. The latter is well suited for a parallelization through openMP. After adding the pragma, and configuring with -DWITH_OMP=true, the output is the following (look find_f_3D)

+-------------------------------------------------------------------+------------+------------+
| Total CPU time elapsed since start                                |      78.2s |            |
|                                                                   |            |            |
| Section                                               | no. calls |  CPU time  | % of total |
+-------------------------------------------------------+-----------+------------+------------+
| BiCGSTAB_strict_lower_matrix_plus_identity_by_vector  |        28 |      0.25s |       0.3% |
| EnergyBalance                                         |        24 |     17.24s |      22.0% |
| PointEnergyBalance                                    |    143064 |     17.11s |      21.9% |
| Richards3D                                            |        24 |     47.55s |      60.8% |
| SolvePointEnergyBalance                               |    143064 |      8.19s |      10.5% |
| find_dfdH_3D                                          |        28 |      3.59s |       4.6% |
| find_f_3D                                             |        52 |      3.25s |       4.2% |
| find_matrix_K_3D                                      |        24 |     36.82s |      47.1% |
| get_all_input                                         |         1 |      0.37s |       0.5% |
| product_matrix_using_lower_part_by_vector_plus_vector |        52 |      0.10s |       0.1% |
| product_using_only_strict_lower_diagonal_part         |        52 |      0.10s |       0.1% |
| time_loop                                             |         1 |     77.84s |      99.5% |
| tridiag                                               |        56 |      0.07s |      0.00% |
| tridiag2                                              |    476659 |      0.13s |       0.2% |
| water_balance                                         |        24 |     47.59s |      60.9% |
| write_output                                          |        24 |      0.59s |       0.8% |
+-------------------------------------------------------+-----------+------------+------------+

lookin at the functions that are called I realized that many functions that were written in a .cc file should be declared inline inside an header since they are pretty small. With the inlining we gained few percent.

# clang-6
+---------------------------------------------+------------+------------+
| Total CPU time elapsed since start          |      76.2s |            |
|                                             |            |            |
| Section                         | no. calls |  CPU time  | % of total |
+---------------------------------+-----------+------------+------------+
| EnergyBalance                   |        24 |     16.22s |      21.3% |
| PointEnergyBalance              |    143064 |     16.09s |      21.1% |
| Richards3D                      |        24 |     46.69s |      61.3% |
| SolvePointEnergyBalance         |    143064 |      7.19s |       9.4% |
| find_matrix_K_3D                |        24 |     36.18s |      47.5% |
| get_all_input                   |         1 |      0.33s |       0.4% |
| water_balance                   |        24 |     46.75s |      61.3% |
| write_output                    |        24 |      0.50s |       0.7% |
+---------------------------------+-----------+------------+------------+

# gcc-8.2.0
+---------------------------------------------+------------+------------+
| Total CPU time elapsed since start          |      76.7s |            |
|                                             |            |            |
| Section                         | no. calls |  CPU time  | % of total |
+---------------------------------+-----------+------------+------------+
| EnergyBalance                   |        24 |     16.70s |      21.8% |
| PointEnergyBalance              |    143064 |     16.57s |      21.6% |
| Richards3D                      |        24 |     46.80s |      61.0% |
| SolvePointEnergyBalance         |    143064 |      7.80s |      10.2% |
| find_matrix_K_3D                |        24 |     36.24s |      47.3% |
| get_all_input                   |         1 |      0.37s |       0.5% |
| water_balance                   |        24 |     46.84s |      61.1% |
| write_output                    |        24 |      0.58s |       0.8% |
+---------------------------------+-----------+------------+------------+
asartori86 commented 5 years ago

I tried several strategies to speed up find_matrix_K_3D without success. @yakopuz, there is the variable cnt which is updated several times during an iteration of the for loop. It is not clear to me if at beginning of each iteration cnt = C * i, i.e. cnt is incremented by the same amount per iteration. If so, we may parallelize also this function quite straightforwardly

asartori86 commented 5 years ago

@yakopuz @ecor @cozzini @ElisaBortoli, I see that the cnt variable is basically updated 3 times per iteration, but sometimes just 2. Do you know why? can we predict for which iteration the previous one updated cnt just twice instead of three times?

I am talking about the find_matrix_K_3D function

yakopuz commented 5 years ago

@yakopuz @ecor @cozzini @ElisaBortoli, I see that the cnt variable is basically updated 3 times per iteration, but sometimes just 2. Do you know why? can we predict for which iteration the previous one updated cnt just twice instead of three times?

@asartori86 This is apart that I know little ... I see only:

/* cnt is the counter of Li Lp Lx (lower diagonal without diagonal) /

I see that for different situations (vertical flow, surface flow, and so on) there are loops in count. @ecor do you understand better how find_matrix_K_3D works?

asartori86 commented 5 years ago

thanks @yakopuz, and do you know if EnergyBalance and WaterBalance could be overlapped? or they modify the same variables and they have to run sequentially?

asartori86 commented 5 years ago

/* cnt is the counter of Li Lp Lx (lower diagonal without diagonal) /

I don't know what Li, Lp, and Lx are but only Lx are written inside the function

cnt++;
(*Lx)[cnt] = ... ;
yakopuz commented 5 years ago

Alberto,

The physics should work in this way: The model solves before the energy balance, then the water balance.

However, there is some coupling among the two. When you solve the energy balance you calculate the evapotranspiration on the basis of the available water, so you need to subtract this water from the water balance. The functions that are doing this are update_soil_land and update_soil_channel in energy.balance.cc. If the mass budget calculation is not satisfied, this could trigger some recalculation of energy balance and water balance.

I do not know the exact details of the implementation, however I would start to look on time_loop in geotop.cc. I would look how time stepping is implemented. From a quick look, I understand that, after some initializations, at line 311 you run first the energy balance, then if this has success en == 0 you run the water balance, any of them do not have success line 329 if (en != 0 || wt != 0), then you try to calculate them again with a shorter time steps. Notice that you can have different error values for en and wt, depending on what happens.

To better understand the code I would recommend (at least it would be very helpful also for me to better assist you and Elisa):

Giacomo

From: Alberto Sartori notifications@github.com Sent: Tuesday, 27 November, 2018 09:18 To: geotopmodel/geotop geotop@noreply.github.com Cc: Bertoldi Giacomo giacomo.bertoldi@eurac.edu; Mention mention@noreply.github.com Subject: Re: [geotopmodel/geotop] A bit of optimization for Richards3D (#96)

thanks @yakopuzhttps://github.com/yakopuz, and do you know if EnergyBalance and WaterBalance could be overlapped? or they modify the same variables and they have to run sequentially? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/geotopmodel/geotop/pull/96, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AP8hXFHWw50dDicTPKelQcEl_7Lo-7vXks5uzPVCgaJpZM4YxlMD.[https://github.com/notifications/beacon/AP8hXA9lP_4Xr0Is-7QEpgpf1aOCv66iks5uzPVCgaJpZM4YxlMD.gif]

yakopuz commented 5 years ago

/* cnt is the counter of Li Lp Lx (lower diagonal without diagonal) /

I don't know what Li, Lp, and Lx are but only Lx are written inside the function

cnt++;
(*Lx)[cnt] = ... ;

I see are defined in indices.cc as in void cont_nonzero_values_matrix3. They seems to be the indices of the cells to which another cell is communicating, the cell below or above or next to the calculation cell. Depending on the process, they could be different. I suppose for example for surface flow the next cell below, but for subsurface flow the the two next cell and so on ... If you are on the border, you mighth have less cells communicating ....

Understanding this topology could be very important for a effort of parallelization MPI - style.

ecor commented 5 years ago

Hi @yakopuz https://github.com/yakopuz @ecor https://github.com/ecor @cozzini https://github.com/cozzini @ElisaBortoli https://github.com/ElisaBortoli, @asartori86 https://github.com/asartori86 . find_K_3D_matrix looks to calculate a term, the Lx pointer (the 4th pointer) , which is function of hydraulic conductivity and is part of the matrix that is solved in the core ODE system within the Newton-like iterator.

Each term of this quantity Lx is refererred to each cell of the Richards3D integration domain, cnt is a counter .

In fact, it is called in case of update=TRUE. So I can see from the code.

cheers Emanuele

Il giorno lun 26 nov 2018 alle ore 18:07 Giacomo Bertoldi < notifications@github.com> ha scritto:

@yakopuz https://github.com/yakopuz @ecor https://github.com/ecor @cozzini https://github.com/cozzini @ElisaBortoli https://github.com/ElisaBortoli, I see that the cnt variable is basically updated 3 times per iteration, but sometimes just 2. Do you know why? can we predict for which iteration the previous one updated cnt just twice instead of three times?

@asartori86 https://github.com/asartori86 This is apart that I know little ... I see only:

/* cnt is the counter of Li Lp Lx (lower diagonal without diagonal) /

I see that for different situations (vertical flow, surface flow, and so on) there are loops in count. @ecor https://github.com/ecor do you understand better how find_matrix_K_3D works?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/geotopmodel/geotop/pull/96#issuecomment-441718275, or mute the thread https://github.com/notifications/unsubscribe-auth/AEQ3gSY3Eg8Yk1-cmbPXeWuNMmKQkqxIks5uzB_rgaJpZM4YxlMD .

-- Emanuele Cordano, PhD Environmental Engineer / Ingegnere per l' Ambiente e il territorio nr. 3587 (Albo A - Provincia di Trento) cell: +39 3282818564 email: emanuele.cordano@gmail.com,emanuele.cordano@rendena100.eu, emanuele.cordano@eurac.edu PEC: emanuele.cordano@ingpec.eu URL: www.rendena100.eu LinkedIn: https://www.linkedin.com/in/emanuele-cordano-31995333 GitHub: https://github.com/ecor

ecor commented 5 years ago

The Lx term might be referenced to each border between two cells. So the counter is defined per each border between 2 cells . You should check but it makes sense. I remember when I wrote a per for a 2D similar problem , it looks like the tensor T in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/wrcr.20072 , see tensor T eq 20 ans 21. In GEOtop it is more complicated (it is 3D) but this is similar. cheers @ecor

Il giorno mar 27 nov 2018 alle ore 23:19 Emanuele Cordano < emanuele.cordano@gmail.com> ha scritto:

Hi @yakopuz https://github.com/yakopuz @ecor https://github.com/ecor @cozzini https://github.com/cozzini @ElisaBortoli https://github.com/ElisaBortoli, @asartori86 https://github.com/asartori86 . find_K_3D_matrix looks to calculate a term, the Lx pointer (the 4th pointer) , which is function of hydraulic conductivity and is part of the matrix that is solved in the core ODE system within the Newton-like iterator.

Each term of this quantity Lx is refererred to each cell of the Richards3D integration domain, cnt is a counter .

In fact, it is called in case of update=TRUE. So I can see from the code.

cheers Emanuele

Il giorno lun 26 nov 2018 alle ore 18:07 Giacomo Bertoldi < notifications@github.com> ha scritto:

@yakopuz https://github.com/yakopuz @ecor https://github.com/ecor @cozzini https://github.com/cozzini @ElisaBortoli https://github.com/ElisaBortoli, I see that the cnt variable is basically updated 3 times per iteration, but sometimes just 2. Do you know why? can we predict for which iteration the previous one updated cnt just twice instead of three times?

@asartori86 https://github.com/asartori86 This is apart that I know little ... I see only:

/* cnt is the counter of Li Lp Lx (lower diagonal without diagonal) /

I see that for different situations (vertical flow, surface flow, and so on) there are loops in count. @ecor https://github.com/ecor do you understand better how find_matrix_K_3D works?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/geotopmodel/geotop/pull/96#issuecomment-441718275, or mute the thread https://github.com/notifications/unsubscribe-auth/AEQ3gSY3Eg8Yk1-cmbPXeWuNMmKQkqxIks5uzB_rgaJpZM4YxlMD .

-- Emanuele Cordano, PhD Environmental Engineer / Ingegnere per l' Ambiente e il territorio nr. 3587 (Albo A - Provincia di Trento) cell: +39 3282818564 email: emanuele.cordano@gmail.com,emanuele.cordano@rendena100.eu, emanuele.cordano@eurac.edu PEC: emanuele.cordano@ingpec.eu URL: www.rendena100.eu LinkedIn: https://www.linkedin.com/in/emanuele-cordano-31995333 GitHub: https://github.com/ecor

-- Emanuele Cordano, PhD Environmental Engineer / Ingegnere per l' Ambiente e il territorio nr. 3587 (Albo A - Provincia di Trento) cell: +39 3282818564 email: emanuele.cordano@gmail.com,emanuele.cordano@rendena100.eu, emanuele.cordano@eurac.edu PEC: emanuele.cordano@ingpec.eu URL: www.rendena100.eu LinkedIn: https://www.linkedin.com/in/emanuele-cordano-31995333 GitHub: https://github.com/ecor