EDmodel / ED2

Ecosystem Demography Model

Output files #179

Closed gklarenberg closed 8 years ago

gklarenberg commented 8 years ago

Hi - I am just getting started with ED2. I'd like to look at the model's sensitivity to disturbance and land use changes eventually, but for now I am just trying to get some test runs done. My study area is the Amazon, so I started off by adjusting the files in src/test_cases/amazon_soi. I have the feeling I'm running into a very basic issue, but I don't know how to solve it:

Am I referencing the output folder wrong (the NL%AFILOUT line is line 57)? I've tried a couple of things (full pathname, starting with ./, no prefix, setting IFOUTPUT to 1), but I keep getting the same error. How is the output path supposed to be specified?

mdietze commented 8 years ago

Output file paths are specified in this section:

   !---------------------------------------------------------------------------------------!
   ! FFILOUT -- Path and prefix for analysis files (all but history/restart).              !
   ! SFILOUT -- Path and prefix for history files.                                         !
   !---------------------------------------------------------------------------------------!

   NL%FFILOUT = '/mypath/generic-prefix'
   NL%SFILOUT = '/mypath/generic-prefix'

I'm not familiar with AFILOUT -- that's probably a typo, and may be the root of your error. File paths should be absolute.
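The advice that file paths should be absolute can be checked before launching a run. Below is a minimal sketch, assuming the ED2IN text uses single-quoted values on uncommented `NL%FFILOUT` and `NL%SFILOUT` lines as in the snippet above; the function names are hypothetical helpers and not part of ED2.

```python
import re

# Matches uncommented lines such as "   NL%FFILOUT = '/mypath/generic-prefix'".
PREFIX_PATTERN = re.compile(r"^\s*NL%(FFILOUT|SFILOUT)\s*=\s*'([^']*)'", re.MULTILINE)

def output_prefixes(ed2in_text):
    """Return {name: value} for the FFILOUT and SFILOUT entries."""
    return {name: value for name, value in PREFIX_PATTERN.findall(ed2in_text)}

def relative_prefixes(ed2in_text):
    """List the entries whose value does not start at the filesystem root."""
    return [name for name, value in output_prefixes(ed2in_text).items()
            if not value.startswith("/")]

sample = """
   NL%FFILOUT = '/mypath/generic-prefix'
   NL%SFILOUT = './generic-prefix'
"""
print(relative_prefixes(sample))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to `relative_prefixes`; an empty list means both prefixes are absolute.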

Second, output file types are controlled here:

   !---------------------------------------------------------------------------------------!
   ! ED2 File output.  For all the variables 0 means no output and 3 means HDF5 output.    !
   !                                                                                       !
   ! IFOUTPUT -- Fast analysis.  These are mostly polygon-level averages, and the time     !
   !             interval between files is determined by FRQANL                            !
   ! IDOUTPUT -- Daily means (one file per day)                                            !
   ! IMOUTPUT -- Monthly means (one file per month)                                        !
   ! IQOUTPUT -- Monthly means of the diurnal cycle (one file per month).  The number      !
   !             of points for the diurnal cycle is 86400 / FRQANL                         !
   ! IYOUTPUT -- Annual output.                                                            !
   ! ITOUTPUT -- Instantaneous fluxes, mostly polygon-level variables, one file per year.  !
   ! ISOUTPUT -- restart file, for HISTORY runs.  The time interval between files is       !
   !             determined by FRQHIS                                                      !
   !---------------------------------------------------------------------------------------!
   NL%IFOUTPUT  =  0
   NL%IDOUTPUT  =  0
   NL%IMOUTPUT  =  3
   NL%IQOUTPUT  =  0
   NL%IYOUTPUT  =  0
   NL%ITOUTPUT  =  3
   NL%ISOUTPUT  =  3

At least one of these should be set to 3 -- if they're all 0, ED won't write any outputs. None should be set to 1 or 2; those are older, deprecated file formats.
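The two rules above (at least one flag set to 3, none set to 1 or 2) are easy to check mechanically. A minimal sketch, assuming each flag sits on its own uncommented line in the ED2IN text; `check_output_flags` is a hypothetical helper, not an ED2 utility.

```python
import re

# Matches uncommented lines such as "   NL%IMOUTPUT  =  3".
FLAG_PATTERN = re.compile(r"^\s*NL%(I[FDMQYTS]OUTPUT)\s*=\s*(\d+)", re.MULTILINE)

def check_output_flags(ed2in_text):
    """Return a list of human-readable problems with the output flags."""
    flags = {name: int(value) for name, value in FLAG_PATTERN.findall(ed2in_text)}
    problems = []
    if 3 not in flags.values():
        problems.append("no output flag is set to 3")
    for name, value in flags.items():
        if value in (1, 2):
            problems.append(f"{name} = {value} is a deprecated format")
    return problems

sample = """
   NL%IFOUTPUT  =  1
   NL%IDOUTPUT  =  0
   NL%IMOUTPUT  =  0
"""
for problem in check_output_flags(sample):
    print(problem)
```

An empty list from `check_output_flags` means the flags satisfy both rules.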

gklarenberg commented 8 years ago

@mdietze Thanks! That worked. I also thought it might be a typo, but I came across https://github.com/EDmodel/ED2/wiki/Misc-parameters in which AFILOUT is referenced, so I wasn't sure. I also realized that the ED2IN files in src/testcases have a bunch of deprecated namelist inputs. I am updating everything now using the ED2IN file in /run.

gklarenberg commented 8 years ago

If it is okay, I'll continue on this thread with further test run issues. I now get these errors:

>>>> opspec_grid error! in your namelist! ---> Reason: Too few soil layers. Set it to at least 2. Your nzg is currently set to -999...

>>>> opspec_grid error! in your namelist! ---> Reason: Too few maximum # of snow layers. Set it to at least 1. Your nzs is currently set to -999.

However, in ED2IN, these are definitely specified:

   NL%NZG = 9
   NL%NZS = 1

Are there any other settings that would affect the way these are read?

crollinson commented 8 years ago

SLZ, SLMSTR, and STGOFF must also be of length NZG

I think you can also get this error if you're initializing the model from the wrong state (i.e. trying to restart as a HISTORY run when the history file doesn't have all the proper fields).
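The length requirement on SLZ, SLMSTR, and STGOFF can be checked with a few lines of Python. This is a minimal sketch, assuming each assignment fits on a single uncommented line and that NL%NZG is present; the helper names are hypothetical and this is not how ED2 itself parses the namelist.

```python
import re

def namelist_values(ed2in_text, name):
    """Return the comma-separated values assigned to NL%<name>, or None if absent.

    Assumes the whole assignment sits on one line; anything after '!' is ignored.
    """
    pattern = rf"^\s*NL%{re.escape(name)}\s*=\s*([^!\n]*)"
    match = re.search(pattern, ed2in_text, re.MULTILINE)
    if match is None:
        return None
    return [v.strip() for v in match.group(1).split(",") if v.strip()]

def soil_length_mismatches(ed2in_text):
    """Return {name: actual_length} for soil arrays whose length differs from NZG."""
    nzg = int(namelist_values(ed2in_text, "NZG")[0])
    mismatches = {}
    for name in ("SLZ", "SLMSTR", "STGOFF"):
        values = namelist_values(ed2in_text, name) or []
        if len(values) != nzg:
            mismatches[name] = len(values)
    return mismatches

sample = """
   NL%NZG    = 3
   NL%SLZ    = -0.400, -0.215, -0.089
   NL%SLMSTR = 1.00, 1.00, 1.00
   NL%STGOFF = 0.00, 0.00
"""
print(soil_length_mismatches(sample))
```

An empty dictionary means all three arrays have NZG entries.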

gklarenberg commented 8 years ago

@crollinson Those were my thoughts too. But SLZ, SLMSTR and STGOFF are

   NL%SLZ = -2.307, -1.789, -1.340, -0.961, -0.648, -0.400, -0.215, -0.089, -0.020
   NL%SLMSTR = 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00
   NL%STGOFF = 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00

And since I don't have history files (I just want to do a very basic first run to make sure everything is being read properly), I have set

   NL%RUNTYPE = 'INITIAL'
   NL%IED_INIT_MODE = 0

I haven't commented out NL%SFILIN, NL%ITIMEH, NL%IDATEH, NL%IMONTHH, NL%IYEARH, thinking these won't be used anyway given NL%IED_INIT_MODE = 0. I have set everything else to the simplest settings, like NL%ISOILFLG = 2. NL%NSLCON has only one value (11), but I didn't think that was associated with the number of soil layers?

Otherwise, are there any history files available somewhere (for the Amazon region) that I could use to start a simulation?

crollinson commented 8 years ago

hmmmmm. I don't work in the Amazon, so I can't help you there.

Something else to check: are ISOILSTATEINIT and ISOILDEPTHFLG both set to 0? If not, you're trying to read from the soil database, which could be buggy. This database should be declared by SOIL_DATABASE, SOIL_STATE_DB, and SOILDEPTH_DB.

gklarenberg commented 8 years ago

@crollinson Yes, although I specified all those databases, I still set ISOILSTATEINIT and ISOILDEPTHFLG both to 0

crollinson commented 8 years ago

Do you have your full ED2IN uploaded somewhere? It might be easier if I (or someone else) could glance through the full thing for something we may be overlooking. I could then also try it with my older version of ED to help figure out if this is a bug introduced by recent changes. I've had quite a few issues with bare ground spinup with the mainline version.

gklarenberg commented 8 years ago

https://drive.google.com/file/d/0B480U5TEKRJmUEU2a3lNeDdMejA/view?usp=sharing

crollinson commented 8 years ago

Okay, I think the problem may be coming from you trying to define two regions of interest:

   NL%ED_REG_LATMIN = -15.0, 10.0   ! list of minimum latitudes of the ED regions
   NL%ED_REG_LATMAX = 0.0, 20.0
   NL%ED_REG_LONMIN = -85.0, 50.0
   NL%ED_REG_LONMAX = -50.0, 60.0

You have N_ED_REGION = 0 and N_POI=1, so I think all of the above should only have a single value. Try removing the second number after each of those and see what happens.

gklarenberg commented 8 years ago

@crollinson I also just tried that... No luck... Could it have something to do with using MPI? I don't have much experience with that... I have compiled ED2 on our HPC (intel/2016.0.109 openmpi/1.10.2 hdf5/1.8.17) and I've noticed that trying to do a serial run with ./ed_2.1-opt gives me an error:

--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PMI2_Job_GetId failed failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value (null) (14) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (14) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[i21a-s2.ufhpc:31133] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!

I checked with our HPC support staff but didn't get much of a response. They said "For a parallel run, you do not need to specify the nodes. As long as you request the appropriate amount of resources in your job script everything should run as intended." So I am running it as mpiexec ed_2.1-opt in a dev session right now, but since edmain.F90 says "Read the namelist and initialize the variables in the nodes if needed", I realized maybe I should specify nodes? Though mpirun -np 1 ./ed_2.1-opt gives me the same error...

crollinson commented 8 years ago

It could be a problem with MPI -- if the processes aren't being linked and spawned properly, you can have this sort of initialization error. Unfortunately, how the parallelization is set up and executed can be very system-specific, and you'll probably have to talk with someone with more knowledge of your system.

I'd try re-compiling ED as a serial process and see if that solves your problem. If it does, then you'll have to work with your IT staff or someone else local who understands ED/the system to figure out the appropriate compiling and running flags.

I've also been using SMP, which I think is different from the original MPI setup in terms of how sites and memory are shared. If you only have 1 point and want to run in parallel, you need to use SMP (shared memory processing?) because it does need to share the settings and latest time step across nodes.

gklarenberg commented 8 years ago

@crollinson Thanks for your help - I've reached out to our HPC support staff to see if they have any ideas. I didn't even realize ED can be compiled differently for serial processes!

fabeit commented 8 years ago

Does this help with the soil data check problems? https://github.com/EDmodel/ED2/issues/170

gklarenberg commented 8 years ago

@fabeit Thanks, I gave that a try: I set it up to read the FAO soil database etc., but got the same error (I tried it both with NZG and NZS commented out, and not commented out). I can't work out why it is not reading the actual values. I worry there may be something wrong with the settings of the ED2IN file? It's been a long time since I worked with Fortran...

fabeit commented 8 years ago

Have a look at my ed2in. You will have to change the coordinates of the simulation, but you can check the other settings. I am doing a bare ground run and reading soil info from the FAO db. ed2in_ew1_i.txt

gklarenberg commented 8 years ago

@fabeit Thanks, I tried your ED2IN file, and strangely I get errors again, not for NZG and NZS but for virtually everything else, starting from ISOILSTATEINIT. I still think there might be something wrong with reading the data: does anyone know if these are the first error messages that show up if the text file is not read in properly, or should I get an error message about earlier variables (such as RUNTYPE and regional/POI runs, which is what I deduce from going through ed_1st.F90 and ed_opspec.F90)? As an FYI, I have a Mac and edit text files in TextEdit or TextMate (I opened @fabeit's file on my computer too). I upload files to our HPC, which is a Linux system with Intel Fortran. I thought issues mostly arise between Unix and Windows systems, not between Unix and Linux systems, but it's the only thing I can think of. (Also, I cleaned and deleted all ED2 files, downloaded them anew from this GitHub repository, and recompiled, but the issue remains.)

gklarenberg commented 8 years ago

I apologize for creating such a long thread for what turns out to be a simple solution... I suspected the / characters in the file paths in ED2IN were creating the problem, and upon closer inspection, it turns out some of the paths were in 'curly' apostrophes ('smart quotes')... which I guess made Fortran stop reading the input file (I had copy-pasted my paths into @fabeit's input file, of course). Evidently TextEdit on Mavericks has smart quotes turned on by default!
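For anyone hitting the same problem, smart quotes can be spotted by scanning the file for non-ASCII characters. A minimal sketch, assuming the ED2IN file is otherwise plain ASCII; `find_non_ascii` is a hypothetical helper name.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:
                hits.append((line_number, column, char))
    return hits

# The second line uses curly quotes (U+2018 and U+2019) instead of plain apostrophes.
sample = "NL%RUNTYPE = 'INITIAL'\nNL%FFILOUT = \u2018/mypath/generic-prefix\u2019\n"

for line_number, column, char in find_non_ascii(sample):
    print(f"line {line_number}, column {column}: {char!r} (U+{ord(char):04X})")
```

To scan a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `find_non_ascii`; any hit inside a quoted path is worth retyping with a plain apostrophe.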

fabeit commented 8 years ago

Let me suggest Sublime Text ;-)