EDmodel / ED2

Ecosystem Demography Model

SMP Release #30

Open rgknox opened 9 years ago

rgknox commented 9 years ago

Hi All,

I put the Shared Memory Parallelism commits on the master. This will allow for the splitting of radiation scattering, photosynthesis and thermodynamics of different patches to different CPU cores.

This has been tested using RK4 and Hybrid integration.
This has had limited testing on gridded runs.
This has had no testing on coupled runs (but I don't suspect any breakage).

If you don't want to use shared memory, just keep doing what you have done in the past and nothing should change.

If you do want to use it, follow these steps for a single polygon run:

1) Compile the code with shared memory directives. If you are using OpenMP, the flag is '-fopenmp'.
2) (Optional) Increase your stack size. On Linux: "ulimit -s unlimited"
3) Set run-time environment variables. If you are using OpenMP, the key variable is OMP_NUM_THREADS. This defines how many shared memory cores will be used. On Linux: "export OMP_NUM_THREADS=X", where X is the number of cores you wish to use. REMEMBER: These cores must share RAM, so you are limited by the number of cores that are on one node.
4) Execute the simulation as you would normally.
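For reference, steps 2 to 4 can be sketched as a small launch script. This is only a sketch under stated assumptions: the helper names `cores_on_node` and `capped_threads` and the `REQUESTED_THREADS` variable are made up for illustration, and the final run command is left as a placeholder comment because the executable name depends on your build.

```shell
#!/usr/bin/env bash
# Step 1 happens at build time and is not shown: add the OpenMP flag
# ('-fopenmp' in the post above) to your compile and link options.

# Step 2 (optional): raise the stack size limit for this shell.
ulimit -s unlimited 2>/dev/null || echo "warning: could not raise stack limit" >&2

# Hypothetical helper: number of cores available on this node.
cores_on_node() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Hypothetical helper: cap a requested thread count at the node's core count,
# since shared-memory threads cannot span nodes.
capped_threads() {
  local requested=$1 available
  available=$(cores_on_node)
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Step 3: tell the OpenMP runtime how many threads to use.
# REQUESTED_THREADS is an assumed variable; it defaults to 4 here.
OMP_NUM_THREADS=$(capped_threads "${REQUESTED_THREADS:-4}")
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Step 4: run the model exactly as you normally would (placeholder).
# ./your_ed_executable <your usual arguments>
```

Capping the thread count at the node's core count is a design choice that follows the REMEMBER note in step 3; asking for more threads than cores would oversubscribe the node rather than speed anything up.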

This release is experimental for the time being. If you have trouble, crashes, or poor reproducibility of previous work, revert to commit 2a5d68ebb291581c932a442e2701e553b24b1170

i.e.:

git checkout 2a5d68ebb291581c932a442e2701e553b24b1170

apourmok commented 9 years ago

Hi Ryan, I pulled your latest changes from the main line. I am able to compile and run the model on my main branch, but in the management branch that I am currently working on, I can't compile the model since the pull. Here is the error I get. Any thoughts?

Error: Unclassifiable statement at (1)

canopy_struct_dynamics.f90:392.53:

     csite%rough(ipa) = snow_rough * snowfac_can
                                                 1
Warning: Nonconforming tab character at (1)

canopy_struct_dynamics.f90:591.53:

     csite%rough(ipa) = snow_rough * snowfac_can
                                                 1
Warning: Nonconforming tab character at (1)

canopy_struct_dynamics.f90:600.85:

urf_rough = soil_rough * (1.0 - snowfac_can) &
1
Warning: Nonconforming tab character at (1)

canopy_struct_dynamics.f90:779.9:

     lad(:) = 0.0
     1
Error: Unclassifiable statement at (1)

canopy_struct_dynamics.f90:821.27:

              lad(k) = lad(k) + ladcohort
                       1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:832.35:

              lad(kapartial) = lad(kapartial)
                               1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:836.35:

              lad(kapartial) = lad(kapartial)
                               1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:839.35:

              lad(kzpartial) = lad(kzpartial)
                               1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:893.27:

              lad(k) = lad(k) + ladcohort
                       1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:904.35:

              lad(kapartial) = lad(kapartial)
                               1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:908.35:

              lad(kapartial) = lad(kapartial)
                               1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:911.35:

              lad(kzpartial) = lad(kzpartial)
                               1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:933.12:

        cdrag   (:) = cdrag1 + 0.5 * cdrag2
        1
Error: Unclassifiable statement at (1)

canopy_struct_dynamics.f90:949.67:

           cdrag   (k)  = cdrag1 + cdrag2 / (1.0 + exp(c3_lad))
                                                               1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:950.32:

           pshelter(k)  = 1.
                            1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:952.47:

           cumldrag(k)  = ldga_bk + lyrhalf
                                           1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:958.17:

        cdrag   (:) = cdrag0
             1
Error: 'cdrag' at (1) is not a variable

canopy_struct_dynamics.f90:971.53:

           pshelter(k)  = 1. + alpha_m97 * lad(k)
                                                 1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:973.47:

           cumldrag(k)  = ldga_bk + lyrhalf
                                           1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:1049.61:

        windlyr(k) = max(ugbmin, uh * exp(- nn * nddfun))
                                                         1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:1197.91:

                    ,csite%veg_displace(ipa),zzmid(k),csite%rough(ipa))
                                                                       1
Error: Unexpected STATEMENT FUNCTION statement at (1)

canopy_struct_dynamics.f90:1339.7:

end associate
   1
Error: Expecting END SUBROUTINE statement at (1)

canopy_struct_dynamics.f90:1588.6:

  associate(                               &
  1
Error: Unclassifiable statement at (1)

canopy_struct_dynamics.f90:2063.9:

     lad8(:) = 0.d0
     1
Error: Unclassifiable statement at (1)

canopy_struct_dynamics.f90:2106.28:

              lad8(k) = lad8(k) + ladcohort
                        1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:2117.36:

              lad8(kapartial) = lad8(kapartial)
                                1
Error: Statement function at (1) is recursive

canopy_struct_dynamics.f90:2121.36:

              lad8(kapartial) = lad8(kapartial)
                                1
Error: Statement function at (1) is recursive
Fatal Error: Error count reached limit of 25.
make[1]: *** [canopy_struct_dynamics.o] Error 1
make[1]: Leaving directory `/usr2/postdoc/apourmok/ED2-1/ED/build/bin'
make: *** [all] Error 2

Afshin Pourmokhtarian, Ph.D. Postdoctoral Research Associate Dietze Ecological Forecasting Lab Boston University Department of Earth & Environment, Rm 130 685 Commonwealth Avenue Boston, MA 02215

rgknox commented 9 years ago

thanks Afshin, looking into this now

apourmok commented 9 years ago

Thanks Ryan. I did some research on "Error: Unclassifiable statement at (1)" and it seems it could be related to compilation options/flags. On a separate note, my model crashes when I run it with the hybrid integrator; it crashes 2 months into the run. Should I post the error here or under the hybrid integrator issue on GitHub?

rgknox commented 9 years ago

Is it possible that your version of Fortran does not like the associate statement? This is a new type of statement we have not had in the code as of yet.

I think this might be part of a more recent Fortran standard. I only put it in there because it helps with readability, but it might be problematic when it comes to portability.
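One way to check this directly is to compile a tiny program that uses the construct. The ASSOCIATE construct was introduced in Fortran 2003, so older compilers may reject it. The sketch below is only illustrative: the probe file name, the `probe_result` variable, and the use of an `FC` environment variable (defaulting to gfortran) are assumptions, not part of the ED2 build system.

```shell
#!/usr/bin/env bash
# Write a minimal Fortran program that uses an ASSOCIATE construct.
probe_dir=$(mktemp -d)
cat > "$probe_dir/assoc_probe.f90" <<'EOF'
program assoc_probe
   implicit none
   real :: rough(3)
   rough = 0.0
   ! "r" is an alias for rough(2) inside the construct.
   associate (r => rough(2))
      r = 1.0
   end associate
   print *, rough
end program assoc_probe
EOF

# Try to compile it with the same compiler the model build uses.
# FC is an assumed variable name; it falls back to gfortran.
FC=${FC:-gfortran}
if ! command -v "$FC" >/dev/null 2>&1; then
  probe_result="compiler-not-found"
elif "$FC" -c "$probe_dir/assoc_probe.f90" -o "$probe_dir/assoc_probe.o" >/dev/null 2>&1; then
  probe_result="supported"
else
  probe_result="unsupported"
fi
echo "ASSOCIATE with $FC: $probe_result"
```

Running the probe in the same shell (same loaded modules) as each branch's build should show whether the two builds are really picking up the same compiler.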

apourmok commented 9 years ago

I thought that might be the case, but surprisingly, when I pull the SMP into my mainline, I am able to compile it. Having said that, the only things I changed in my management branch are adding new western PFTs and a few functions for logging and planting, so I am confused about why it throws an error.

rgknox commented 9 years ago

Maybe different compile flags in your branch?

Regarding the crash with the hybrid, I think this would be a good new issue. Keep in mind that while the hybrid integrator is fast, there are a few underlying issues:

1) The forward stepping in hybrid is achieved via a simple Euler step, which is not ideal, so it is susceptible to both error and instability.
2) The backward step on leaves uses temperature as the state variable and not enthalpy. Enthalpy is ideal because it allows for a smooth transition through phase changes; the alternative is to diagnose phase change from prognostic temperatures. The result is that you will get rapid step-like drops or gains in energy when crossing 0 degrees.

It is possible to rewrite the state variables to be enthalpy (or internal energy) instead of temperature, but this was an oversight during my thesis work, and since my thesis work was tropical, I never had time to get back to it.

apourmok commented 9 years ago

Hi Ryan, it seems that the problem with SMP on the BU geo cluster is the old version of Fortran, as you said. However, when I load the gcc/4.8.1 module and try to compile again, I get a new error (see below). I found on Stack Overflow that I can get around this problem by adding "-fno-whole-file" to my compilation flags ( http://stackoverflow.com/questions/21307765/gfortran-attribute-that-requires-an-explicit-interface-for-this-procedure ), but now I get another new error (see the bottom after -----). Any idea?

Error: Dummy argument 'cgrid' of procedure 'soil_default_fill' at (1) has an attribute that requires an explicit interface for this procedure

ed_init.f90:465.29:

     call print_soil_info(edgrid_g(igr),igr)
                         1
Error: Dummy argument 'cgrid' of procedure 'print_soil_info' at (1) has an attribute that requires an explicit interface for this procedure
make[1]: *** [ed_init.o] Error 1
make[1]: Leaving directory `/usr2/postdoc/apourmok/ED2-1/ED/build/bin'
make: *** [all] Error 2

-----

Fatal Error: Cannot read module file 'hdf5.mod' opened at (1), because it was created by a different version of GNU Fortran
make[1]: *** [hdf5_coms.o] Error 1
make[1]: Leaving directory `/usr2/postdoc/apourmok/ED2-1/ED/build/bin'
make: *** [all] Error 2

Thanks, Afshin
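The 'hdf5.mod' error generally means the HDF5 Fortran module was built with a different gfortran than the one now loaded, so the usual remedy is to point the build at an HDF5 installation compiled with the same compiler version. A rough way to inspect a module file is sketched below. It assumes, as far as I know is the case for gfortran, that the first line of a .mod file names the module format version and that newer releases gzip-compress the file; the function name `mod_header` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line of a gfortran module file.
# Assumption: newer gfortran releases gzip-compress .mod files, older ones
# write plain text, so try gzip first and fall back to reading it directly.
mod_header() {
  local mod_file=$1
  { gzip -dc "$mod_file" 2>/dev/null || cat "$mod_file"; } | head -n 1
}

# Example usage (the path is a placeholder for your HDF5 installation):
#   mod_header /path/to/hdf5/include/hdf5.mod
#   gfortran --version | head -n 1
```

Comparing the header of hdf5.mod with the header of a .mod file freshly produced by the currently loaded compiler should show whether the two format versions differ.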

rgknox commented 9 years ago

I went ahead and removed the associate statements, and thereby replaced the aliases with their original variables. This change should make the code compliant with your original compilers.

Perhaps we can discuss as a community during our next get-together whether we want to embrace the more recent fortran standards for future releases.

apourmok commented 9 years ago

Thanks Ryan. It is working now. We definitely need to talk about this issue in the next ED2 call/meeting.

crollinson commented 9 years ago

@rgknox

I just tried running the SMP version with the PalEON stuff and 5 out of 6 runs have crashed between 15 and 30 years into the simulations. The error I'm getting is pasted below. Sometimes the top function is [...]mmean_vars instead of dmean, but the rest is the same. I haven't tried digging into it yet and figured I'd ask you to see if you know what's going on first. I'm running things with the hybrid integrator and the new CBR_SCHEME = 0


Program received signal 8 (SIGFPE): Floating-point exception.

Backtrace for this error:

crollinson commented 9 years ago

Quick update @rgknox:

I don't know if this helps at all, but all 6 models have now crashed with the SIGFPE error. 4 out of the six reference a " * frqsum_o_daysec" line in average_utils. The other 2 are " * ndaysi"

Thoughts?

rgknox commented 9 years ago

I can confirm similar problems, although I can get stable results using my local branch, which has a small set of differences with master. I hope to identify the culprit soon. It's possible it has something to do with the removal of the associate statement in the last commit, or perhaps it is related to how the snowfac changes were applied/merged. Sorry all, we will get this patched soon.

crollinson commented 9 years ago

Interesting that you mention snowfac... I just had my non-SMP ED (with CBR changes) crash (SIGFPE error 8) with it tracing back to snowfac in the radiate driver (line 757). That's the first time in about 2,000 years of runs with the normal version, but maybe I'm the one to blame... (sorry!)

crollinson commented 9 years ago

Sorry to flood the comments, but it looks likely that the error is tied to snowfac. All of the SMP errors were tied to par_level variables.

At least this time, it doesn't seem to be a snow issue as most of my errors are being thrown in non-winter months.

apourmok commented 9 years ago

@crollinson did you change some part of the code in radiate_driver as part of your snow fix?

crollinson commented 9 years ago

@apourmok Nope. I steered clear of that one.

mpaiao commented 9 years ago

@crollinson I remember seeing problems with frqsum_o_daysec and ndaysi, and I think it was related to -Q- files (or -Q- files turned off) that would cause division by 0. I thought we had fixed it, but maybe we didn't fix everything... Could you share the ED2IN that caused the problem so I can check the configurations that created it? Did the problem occur right at the beginning, or at the beginning of a new month?

crollinson commented 9 years ago

@mpaiao It didn't crash right at the beginning. On the BU server there's a lag in what gets written to the out file, so I'm not exactly sure whether it crashed at the beginning of a new month. Q files are turned off. My ED2IN files as well as the crash logs can be found in one of my github repositories: https://github.com/crollinson/ED_Processing/tree/master/spin_finish_smp Keep in mind that the last date in the log is not necessarily the date of the crash.

Restarting from a history file gets me past the crash point, so maybe it's at least partially a problem with an uninitialized variable?

mpaiao commented 9 years ago

@crollinson It seems the crash is always happening when it's integrating these par_level_diffu/par_level_diffd variables. The radiation code has some substantial differences from the version I updated last time (there used to be par_level_diff only), but I checked the usual places where variables should be initialised and nothing stood out. Maybe the value is becoming too large and eventually overflows when average_utils accumulates it over the month? It may be worth checking the values of these variables in the -E- files the code generated before it crashed, I think they should be always between 0 and 1, unless their definition has changed.
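The range check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the array size and the sample values are made up, and in practice the values would come from the -E- files rather than being hard-coded. It flags anything that is non-finite or outside the expected 0 to 1 range.

```fortran
program check_par_level
   use, intrinsic :: ieee_arithmetic, only : ieee_is_finite
   implicit none
   integer, parameter :: ncoh = 4          ! hypothetical number of cohorts
   real(kind=8), dimension(ncoh) :: par_level_diffd
   integer :: i

   ! Made-up sample values; the third one is deliberately out of range.
   par_level_diffd = (/ 0.10d0, 0.35d0, 1.70d0, 0.90d0 /)

   do i = 1, ncoh
      if (.not. ieee_is_finite(par_level_diffd(i))) then
         ! Catches NaN and +/- infinity.
         write(*,'(a,i0)') 'non-finite value at cohort ', i
      else if (par_level_diffd(i) < 0.d0 .or. par_level_diffd(i) > 1.d0) then
         ! Finite, but outside the expected [0,1] interval.
         write(*,'(a,i0,a,es12.5)') 'out of range at cohort ', i, ': ', par_level_diffd(i)
      end if
   end do
end program check_par_level
```

The same loop could be applied to par_level_diffu and par_level_beam; whether 0 to 1 is still the right range depends on the current definition of these diagnostics.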

rgknox commented 9 years ago

I was able to remove the crash by reverting to the previous %snowfac formulation, but:

the trouble may specifically involve line 489 in rk4_derivs.f90:

  rk4aux(ibuff)%h_flux_g   (mzg+1) = - avg_th_cond                                        &
                                   * (initp%sfcwater_tempk(1) - initp%soil_tempk(mzg))    &
                                   / (5.d-1 * initp%sfcwater_depth(1) - slzt8(mzg) )

It was when I changed this line back to the original that things started working again.

crollinson commented 9 years ago

Thanks @rgknox. That's unfortunate, but it's not surprising that's where the problem is coming from. I'll take a closer look this afternoon, but the way the soil, snow, and air were interacting was causing major problems in the northeast. I think I'd tried reverting this spot back to the original and it was one of the key spots that made snow break. I'll admit though, I got turned around as to where the problem was coming from.

I can spend some time on it probably tomorrow morning (maybe this afternoon), but maybe @mpaiao could take a look and double check places I've changed and argue the case for reverting them?

crollinson commented 9 years ago

@rgknox I just tried setting this line back to the old version where h_flux_g(mzg+1) is scaled by snowfac & it made things worse, not better in my branch.

@mpaiao this is a spot where I could follow your logic & it makes sense, but it causes really weird fluxes in my snow layers and getting rid of snowfac in that statement made the hflux in the first snow layer sensible.

mpaiao commented 9 years ago

@crollinson this is rarely used in the tropics, just as short-lived puddles, so if removing it improves results in snowy areas, then I'm totally fine with getting rid of snowfac. I don't think it violates any energy conservation either, which would be my only concern.

crollinson commented 9 years ago

I've done a couple more tests with things that have made snow more stable in the past, and I really don't think the problem is rooted in snowfac. While tweaking snowfac, I've gotten a ton of other SIGFPE errors (with no change in frequency), and a couple were not in average_utils. Every time it ties back to a group of lines with par_level_diffu, par_level_diffd, or par_level_beam. I'm not sure when/why these came into the mainline, but they weren't in the version I was using for the CBR fixes, so I'm having a hard time tracking down what's going on with them. Any thoughts?

crollinson commented 9 years ago

And yet another update: I was going through all of the output that is printed during compiling and came up with 5 flags for potentially uninitialized variables. I haven't tracked each of them down thoroughly and would appreciate it if anybody who knows about these sections could chime in. The flags are (in order of what I think are potential breaking points):

1) rk4_misc.f90: In function 'adjust_sfcw_properties': rk4_misc.f90:1565: warning: 'depth_available' may be used uninitialized in this function. This might be tied to the snow problems I've been dealing with. This bug exists in much older versions of ED (c. 2013 at least), but may have been less of an issue until the more recent change in snowfac, depending on the order in which certain things are done. (That's just a hunch at this point, but could fit in with the par stuff as well.)

2) ed_state_vars.f90: In function 'copy_sitetype_mask': ed_state_vars.f90:8795: warning: 'i' may be used uninitialized in this function

3) ed_read_ed21_history.F90: In function 'read_ed21_history_file': ed_read_ed21_history.F90:447: warning: 'si_index' may be used uninitialized in this function

4) heun_driver.f90: In function 'heun_stepper': heun_driver.f90:807: warning: 'combh' may be used uninitialized in this function (note: I've been running with the hybrid driver, so this is almost certainly not the issue)

5) events.f90: In function 'event_irrigate': events.f90:649: warning: 'soil_temp' may be used uninitialized in this function (note: this shouldn't occur in my runs at all, so this is almost certainly not the issue)
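For readers unfamiliar with this class of warning: the compiler typically emits it when at least one path through a routine reaches a use of a variable without assigning it first. The sketch below is not ED2 code. The function name, the `wdnsi8_sketch` constant and the fallback value in the `else` branch are placeholders chosen only to show the pattern, with the variable assigned on every branch so the warning would not apply.

```fortran
program init_every_path
   implicit none
   real(kind=8) :: depth

   depth = depth_from_mass(2.0d0, .true. )
   write(*,'(a,es12.5)') 'enough vapour    : ', depth
   depth = depth_from_mass(2.0d0, .false.)
   write(*,'(a,es12.5)') 'not enough vapour: ', depth

contains

   !----- Hypothetical helper; assigns its result on every branch. ---------------!
   function depth_from_mass(wmass, enough_vapour) result(depth_available)
      real(kind=8), intent(in) :: wmass
      logical     , intent(in) :: enough_vapour
      real(kind=8)             :: depth_available
      real(kind=8), parameter  :: wdnsi8_sketch = 1.d-3  ! placeholder inverse density

      if (enough_vapour) then
         depth_available = wmass * wdnsi8_sketch
      else
         ! If this assignment were missing, depth_available would be undefined on
         ! this path, which is the situation the compiler warning points at.
         ! The value 0 is a placeholder, not a physically justified choice.
         depth_available = 0.d0
      end if
   end function depth_from_mass
end program init_every_path
```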

crollinson commented 9 years ago

I've tracked down the source of the depth_available uninitialization (#1 above). It actually dates back to @mpaiao in Jan 2012 (Jan 5). I've tried to adapt things to how things work now based on my best guess of what was going on in the version before that commit. What I have now is:

    !---------------------------------------------------------------------------------!
    !     There is not enough water vapour.  Dry down to the minimum, and hope for    !
    ! the best.                                                                       !
    !---------------------------------------------------------------------------------!
    energy_available = wmass_available * (alvi8 - fracliq_needed * alli8)
    depth_available = wmass_available * ( fracliq_needed * wdnsi8 &

I'm currently going to let energy_available be overwritten by what is currently in the code (energy_available = wmass_available * energy_needed / wmass_needed).

@mpaiao, since you're the one that made this change, could you double check the new depth_available initialization and see if it makes sense? The old version was:

    depth_available = wmass_available * ( initp%soil_fracliq(nzg) * wdnsi8 &

crollinson commented 9 years ago

While the uninitialized variables still need to be sorted out, fixing the snow_depth issue alone has not fixed the SMP crashes. Everything continues to point back to whatever was done to introduce par_level_diffu/par_level_diffd

rgknox commented 9 years ago

I will look into that, those are my diagnostics.

rgknox commented 9 years ago

Christy, what radiation scheme are you using (ICANRAD = ?)? It will help me track this down.

crollinson commented 9 years ago

I'm running icanrad=0.

All of my ED2INs with my settings can be found in one of my github repos: https://github.com/crollinson/ED_Processing/tree/master/spin_finish_smp

rgknox commented 9 years ago

I am having trouble reproducing errors regarding par_level variables, is there any crash report info you could provide, like tracebacks etc?

rgknox commented 9 years ago

I am thinking we should really deprecate ICANRAD=0 anyway, Marcos and I have both gone through the code and theory for two-stream with a fine-tooth comb, and get more sensible and consistent answers with the updated two-stream (ICANRAD=2). Christy, could you test with ICANRAD=2 and ICANRAD=1 if possible?

crollinson commented 9 years ago

There are examples of the first couple crashes in that folder I linked to above. Ryan, I'll start an ICANRAD=2 run now.

If you think it might be a problem with my starting conditions, I just uploaded my .pss & .css files that you could try running. https://github.com/crollinson/ED_Processing/tree/master/phase1a_spinup.v2

These were created from an SAS solution after 150 years of non-SMP run with disturb off (those ED2INs are also on my github), which shouldn't affect how things are running from an initial, but I suppose it's possible.

rgknox commented 9 years ago

I found and fixed a potential bug that may be your problem with the par_level variables. The history reads were not including those variables; I have updated this. I will branch off your master, commit, and send you a pull request.

crollinson commented 9 years ago

Fantastic! Thanks Ryan!

rgknox commented 9 years ago

For some reason I could not branch off your master. I put the changes into the mainline. Could you try merging the changes into your local, Christy?

rgknox commented 9 years ago

I went back to archives of 2012 and then 2011 when the Heun integrator implementation was in its infancy, never did I find any instance where the local variable "combh" was initialized before it was used.

I personally have no experience using the Heun integrator, and unless someone is invested in this option, I propose we just disable it until that person steps up.

crollinson commented 9 years ago

Thanks Ryan! I was able to pull the mainline into my branch and have things running now. I'm about 15 years in at 3 sites and so far so good. I'll let you know how things turn out.

rgknox commented 9 years ago

OK, I will pull a clone and test as well.

crollinson commented 9 years ago

@rgknox SMP is still a no-go for me. Currently with ICANRAD = 1, one run only got 4 years, with the backtrace as follows:

Program received signal 8 (SIGFPE): Floating-point exception.

Backtrace for this error:

another made it 30 years, but still got SIGFPE fails with par_level vars

Program received signal 8 (SIGFPE): Floating-point exception.

Backtrace for this error:

Is there maybe some sort of min/max bound to keep the number from getting too small or something of the sort?

mpaiao commented 9 years ago

The other items:

2) ed_state_vars.f90: I checked the code and it looks fine, maybe it is complaining of this line: allind = (/ (i,i=1,isize) /) Kind of ugly, but I don't think it is wrong

3) ed_read_ed21_history.F90: this is likely to be a bug. I went back to a version from 2012, and I think the block near line 447 should be:

           csite => cpoly%site(isi)
           !------ Calculate the index of this site's data in the HDF. ----------------!
           si_index = pysi_id(py_index) + isi - 1
           if (sipa_n(si_index) > 0) then

rgknox commented 9 years ago

Cohort fusion is not including the qmean, mmean and dmean averages of the par_level diagnostics, that is a problem that I am writing a fix for... but I would not expect that to be the cause for a crash. I will submit another mini commit to the master and keep looking.

crollinson commented 9 years ago

@rgknox I have an idea on why it's bonking and why it's a stochastic thing.

I keep coming back to lines like this (889-899 in multiple_scatter):

      !------ Integrate the visible light levels. --------------------------------------!
      ! NEEDS TO BE CHECKED (PARTICULARLY THE UPWARD)
      ! THIS SHOULD BE THE LEVEL (COHORT) CENTERED FLUX OF PAR
      do i=1,ncoh
         ip1 = i + 1
         im1 = i - 1
         par_level_diffd(i) = 5.d-1 * (swd(i) + swd(ip1)) / (par_diff_norm + par_beam_norm)
         par_level_diffu(i) = 5.d-1 * (swu(i) + swu(im1)) / (par_diff_norm + par_beam_norm)
         par_level_beam (i) = 5.d-1 * (beam_down(i) + beam_down(ip1)) / (par_diff_norm+par_beam_norm)
      end do
      !---------------------------------------------------------------------------------!

Could it have to do with something being off with i+1 or i-1? It looks like swd & swu are okay, but changing the numbers of those & not initializing the values right would explain the random nature of the crashes I'm seeing.

crollinson commented 9 years ago

FYI, I'm now running a test with the ED2 mainline version (no changes to CBR) to make sure it's something there and not a weird artifact in my branch that's causing all of these issues

rgknox commented 9 years ago

The ip1 and im1 thing looks like it should be ok. I double-checked and the code seems logical there. Note that variable swu is allocated to allow a zero index, so when i=1 and im1=0, this should be fine. I do wonder if some compilers have issues with this though.
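A minimal sketch of the zero-index point, for anyone following along. It is not the ED2 routine: the array bounds for swd, the sample fluxes and the single `par_norm` scalar (standing in for par_diff_norm + par_beam_norm) are assumptions made for illustration. It declares swu with a lower bound of 0, so swu(i-1) is a legal reference when i=1.

```fortran
program level_centred_flux
   implicit none
   integer, parameter :: ncoh = 3                 ! hypothetical number of cohorts
   ! swu carries an extra level at index 0; the swd bounds are an assumption.
   real(kind=8), dimension(0:ncoh) :: swu
   real(kind=8), dimension(ncoh+1) :: swd
   real(kind=8), dimension(ncoh)   :: par_level_diffu, par_level_diffd
   real(kind=8) :: par_norm                       ! stands in for par_diff_norm + par_beam_norm
   integer :: i

   ! Made-up interface fluxes, four values each.
   swu      = (/ 0.02d0, 0.04d0, 0.06d0, 0.08d0 /)
   swd      = (/ 0.20d0, 0.40d0, 0.60d0, 0.80d0 /)
   par_norm = 1.d0

   do i = 1, ncoh
      ! Cohort-centred value is the mean of the two bounding interface values.
      par_level_diffd(i) = 5.d-1 * (swd(i) + swd(i+1)) / par_norm
      par_level_diffu(i) = 5.d-1 * (swu(i) + swu(i-1)) / par_norm   ! i-1 = 0 is in bounds
   end do

   write(*,'(3f8.3)') par_level_diffd
   write(*,'(3f8.3)') par_level_diffu
end program level_centred_flux
```

Building a test like this with run-time bounds checking (for example gfortran's -fcheck=bounds) is one way to confirm that a given compiler handles the zero lower bound as expected.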

Do you get the same issues with ICANRAD=2?

We are close! I'm sorry the par_level variables are being such a pain. They are pretty useful though, because they are directly comparable to PAR flux sensors at different heights in a canopy.

crollinson commented 9 years ago

I guess one other thing to note is that my PFT settings are quite dramatically different from the ED defaults, and that could definitely be impacting PAR things. It doesn't explain the randomness of the errors though. If you want to check out those settings anyway, they're also on github: https://github.com/crollinson/ED_Processing/blob/master/PalEON_Phase1a.v2.xml

Thanks for being so responsive and helping me figure out what's going wrong! Once we get SMP fully working, you'll be the hero of the PalEON team for speeding up our millennial runs so much.

rgknox commented 9 years ago

I took a quick look at one of your ED2IN's Christy.

https://raw.githubusercontent.com/crollinson/ED_Processing/master/spin_finish_ED2IN/ED2IN.PBL

One thing I noticed is that you have a relatively large timestep set for DTLSM (900) when using the hybrid integration (INTEGRATION_SCHEME=3). This may be a cause of stability problems, although I don't think it would generate a seg fault or the things we are currently trouble-shooting. I would be prepared to reduce this timestep if the model generates any complaints from its various self checks. For my most recent research runs, for instance, I used a DTLSM of 180 in the tropics. Also try the RK4 integration if you have stability problems: that method (while potentially slower) forces the integration to meet an error criterion, whereas the hybrid does not (it can't).

crollinson commented 9 years ago

I just confirmed that I get the same errors with the github mainline branch ICANRAD = 2 and ICANRAD = 1 as well.

I haven't been having stability issues and my gh24 pre-SMP branch is working fine, but I'll keep in mind bumping the timestep down if I start encountering issues.

crollinson commented 9 years ago

Pretty sure I just found the problem!!! The par_level variables were missing from ed_type_init.f90

Made the changes and pulling into my line for testing, but this would explain everything.
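To illustrate why a missing initialization would fit the symptoms: an array that is allocated but never assigned holds undefined values, so averaging over it can behave differently from run to run. The sketch below is a much-reduced, hypothetical stand-in and not the contents of ed_type_init.f90. The type name, the routine names and the choice of zero as the starting value are all assumptions; only the three par_level member names come from this thread.

```fortran
module patch_sketch
   implicit none
   !----- Hypothetical reduced patch type holding only the PAR diagnostics. ------!
   type :: patchtype_sketch
      real, dimension(:), allocatable :: par_level_beam
      real, dimension(:), allocatable :: par_level_diffd
      real, dimension(:), allocatable :: par_level_diffu
   end type patchtype_sketch

contains

   !----- Allocation alone leaves the contents undefined. ------------------------!
   subroutine allocate_patch_sketch(cpatch, ncoh)
      type(patchtype_sketch), intent(inout) :: cpatch
      integer               , intent(in)    :: ncoh
      allocate(cpatch%par_level_beam (ncoh))
      allocate(cpatch%par_level_diffd(ncoh))
      allocate(cpatch%par_level_diffu(ncoh))
   end subroutine allocate_patch_sketch

   !----- Give every diagnostic a defined starting value (zero is assumed). ------!
   subroutine init_patch_sketch(cpatch)
      type(patchtype_sketch), intent(inout) :: cpatch
      cpatch%par_level_beam  = 0.0
      cpatch%par_level_diffd = 0.0
      cpatch%par_level_diffu = 0.0
   end subroutine init_patch_sketch
end module patch_sketch

program demo_patch_init
   use patch_sketch
   implicit none
   type(patchtype_sketch) :: cpatch

   call allocate_patch_sketch(cpatch, 5)
   call init_patch_sketch(cpatch)      ! skipping this call leaves the arrays undefined
   write(*,*) 'sum after init: ', sum(cpatch%par_level_beam)
end program demo_patch_init
```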

DanielNScott commented 9 years ago

Hey Christy, did it turn out that was the issue? Are your SMP runs stable now with the hybrid integrator? If so, can you run w/ DTLSM > 180?