Code coverage and potential redundancies

manfredo89 commented 6 years ago

Since starting to work with ED I always felt like a priority to try simplifying it (especially to make it more accessible to colleagues that are not willing to dedicate their entire PhD to modelling). Some time ago @DanielNScott wrote a wiki page about a possible avenue to modernize the code https://github.com/EDmodel/ED2/wiki/Updating-to-F03-F08,-Iterators-for-Data-Structures. My feeling is that if we want to port the code to a more modular (perhaps even object oriented) design, the first step is to clean what we know is redundant. This could be useful even without doing such a refactoring.

So I made a little list of things that we could potentially get rid of. This is just a tentative trial to start a discussion. Also I am appending an analysis of code coverage for the most common runs (a NBG INITIAL and a HISTORY restart). To open it download the archive and open the CODE_COVERAGE.HTML file. For the ones unfamiliar with the concept of code coverage it's an analysis of the portions of code that where used or not (Yellow color indicates uncovered code block and red color indicates uncovered function). Master is the current master branch while Redux is my development branch where most of the following proposed removals have been implemented.

The master branch has an overall code coverage of 37% for my simulation subset. My feeling is that even including all special cases (like the eq0 run, the different integrators, the bigleaf ecc. the code coverage would still be quite low, indicating a lot of redundancies).

Master_coverage.zip

Redux_coverage.zip

Proposed removal	notes
Heun integrator	~1000 lines, Deprecated in the ED2IN.
Original two stream	~1000 lines, Deprecated in the ED2IN.
Fire model 1	Deprecated in the ED2IN, I am not familiar with this part of the code though.
ASCII meteo input (IMETTYPE)	Deprecated in the ED2IN, I am not familiar with this part of the code though.
Big Leaf	Is anyone using Big Leaf? Do we expect it to be useful? I guess between 500 and 1000 lines
Canopy turbulence duplication	The single precision version of the canopy_struct_dynamic file / subroutines is only called during initialisation. We could simply use the double precision version without any noticeable difference in performance. ~2000 lines of duplication.
Parts of libxml2f90.f90_pp.f90	This file was copied from Alexander Poddey. I think we should only keep the parts that are effectively used and not the whole file. ~1500 lines could go.
therm_lib	Some parts look like they have been copied from somewhere else but are never used. ~4000 lines could go.
therm_lib8	Some parts look like they have been copied from somewhere else but are never used. ~4000 lines could go.
numutils, dateutils, charutils	Some parts look like they have been copied from somewhere else but are never used. ~200 lines.
ed_mem_grid_dim_defs	This looks completely useless. In the PR #200 I removed most of the stuff that was never called, not sure why this survived. Was it added later? Is it useful for some reason?
The EQ0 stuff	This part should probably stay as it is sometimes used for testing, however it's a lot of duplication. Maybe there are more clever ways of dealing with this.
read_10 vs read21	I guess this should also stay because in certain situations it's easier to read in files in the old format. Though we could somehow try to merge the two.
EDTS/exec_test_carvers_serial.sh	This file has also a lot of duplication. I rewrote it to have one single loop that writes all the ED2IN from one template. As such it also avoids to have 3 ED2IN (MAIN, TEST, DBG) with almost the same parameters.
RAPP	I have sometimes used RAPP but I guess it's probably outdated and Marcos suggested to remove it.
BRAMS	Should BRAMS be part of the ED repository? Maybe it could stay as an independent repository

mdietze commented 6 years ago

Could you be more specific about what you mean by 'ASCII dataset'? More generally, I feel like it would be much easier to evaluate these suggesetions if you provided links to the relevant code (Github makes it easy to reference not just the code, but specific lines or blocks).

manfredo89 commented 6 years ago

Hi Mike, I am more or less familiar with every issue on the table except for the 3rd and 4th line. I still added them because they are signaled as deprecated in the ED2IN. I have linked with the file / line number where relevant

mpaiao commented 6 years ago

@mdietze on the ASCII dataset, I think @manfredo89 is referring to variable NL%IMETTYPE in ED2IN. I think a really long time ago we had meteorological drivers in ascii format, but this has been phased out about 10 years ago. I think IMETTYPE could definitely go away, I just checked the code and it is not used anywhere.

A few comments on the other chunks of the code.

Heun integrator: I'm fine removing it, I had implemented this to see if a RK2 would speed up the code, but it turned out to be as slow as RK4 because it ends up taking shorter time steps.
Big leaf: I agree with Paul that we should keep it. It was an important component of Naomi's paper, and it can be useful in theoretical assessments.
Canopy turbulence duplication: I agree this could go away. In reality, it may not be even necessary to call canopy turbulence at all during the initialization, as long as we provide reasonable initial values to all variables involved.
therm_lib/therm_lib8. Some functions may be from EDBRAMS, but they don't need to be in the ED standalone version (coupled EDBRAMS uses the BRAMS therm_lib).
numutils/dateutils/charutils. Same story as therm_lib.
ed_mem_grid_dim_defs: this is used by EDBRAMS. We should move it to BRAMS/src/ed2 instead.
"eq0 stuff" will be removed in my next pull request. Instead of separate subroutines, the code will simply skip the updates of carbon/dbh/height/nplant when vegetation dynamics is turned off.
read_10 vs. read21. These two should definitely remain in the code, and I don't see any easy way of merging them. "read_10" used the pss/css files (ascii) formats, and this is typically how we initialize ED2 from forest inventory (or airborne lidar). "read21" uses hdf5, and unlike full history, it only reads the forest structure.
RAPP: I don't know if anyone else is using it, but I'm fine dropping it. I implemented this a long time ago to generate NCEP-based met drivers, but I would much rather use R-based programs nowadays.
BRAMS: I'd rather keep it for the same reasons why I think we should keep big leaf ED.

mdietze commented 6 years ago

@mpaiao yep, I was around for the pain of the ASCII to HDF5 migration. I agree that the old ASCII met can be dropped. I mostly asked because I recall that there were parsers for other input databases (e.g. soil texture maps) that were never fully translated during the C to Fortran transition and those bits of code might be more valuable to retain (since, for example, there may still be value in being able to load from those).

I agree with all of Marcos' other suggestions

EDmodel / ED2

Code coverage and potential redundancies #226