EDmodel / ED2

Ecosystem Demography Model
78 stars 112 forks source link

Code coverage and potential redundancies #226

Open manfredo89 opened 6 years ago

manfredo89 commented 6 years ago

Since starting to work with ED I always felt like a priority to try simplifying it (especially to make it more accessible to colleagues that are not willing to dedicate their entire PhD to modelling). Some time ago @DanielNScott wrote a wiki page about a possible avenue to modernize the code https://github.com/EDmodel/ED2/wiki/Updating-to-F03-F08,-Iterators-for-Data-Structures. My feeling is that if we want to port the code to a more modular (perhaps even object oriented) design, the first step is to clean what we know is redundant. This could be useful even without doing such a refactoring.

So I made a little list of things that we could potentially get rid of. This is just a tentative trial to start a discussion. Also I am appending an analysis of code coverage for the most common runs (a NBG INITIAL and a HISTORY restart). To open it download the archive and open the CODE_COVERAGE.HTML file. For the ones unfamiliar with the concept of code coverage it's an analysis of the portions of code that where used or not (Yellow color indicates uncovered code block and red color indicates uncovered function). Master is the current master branch while Redux is my development branch where most of the following proposed removals have been implemented.

The master branch has an overall code coverage of 37% for my simulation subset. My feeling is that even including all special cases (like the eq0 run, the different integrators, the bigleaf ecc. the code coverage would still be quite low, indicating a lot of redundancies).

Master_coverage.zip

Redux_coverage.zip

Proposed removal notes
Heun integrator ~1000 lines, Deprecated in the ED2IN.
Original two stream ~1000 lines, Deprecated in the ED2IN.
Fire model 1 Deprecated in the ED2IN, I am not familiar with this part of the code though.
ASCII meteo input (IMETTYPE) Deprecated in the ED2IN, I am not familiar with this part of the code though.
Big Leaf Is anyone using Big Leaf? Do we expect it to be useful? I guess between 500 and 1000 lines
Canopy turbulence duplication The single precision version of the canopy_struct_dynamic file / subroutines is only called during initialisation. We could simply use the double precision version without any noticeable difference in performance. ~2000 lines of duplication.
Parts of libxml2f90.f90_pp.f90 This file was copied from Alexander Poddey. I think we should only keep the parts that are effectively used and not the whole file. ~1500 lines could go.
therm_lib Some parts look like they have been copied from somewhere else but are never used. ~4000 lines could go.
therm_lib8 Some parts look like they have been copied from somewhere else but are never used. ~4000 lines could go.
numutils, dateutils, charutils Some parts look like they have been copied from somewhere else but are never used. ~200 lines.
ed_mem_grid_dim_defs This looks completely useless. In the PR #200 I removed most of the stuff that was never called, not sure why this survived. Was it added later? Is it useful for some reason?
The EQ0 stuff This part should probably stay as it is sometimes used for testing, however it's a lot of duplication. Maybe there are more clever ways of dealing with this.
read_10 vs read21 I guess this should also stay because in certain situations it's easier to read in files in the old format. Though we could somehow try to merge the two.
EDTS/exec_test_carvers_serial.sh This file has also a lot of duplication. I rewrote it to have one single loop that writes all the ED2IN from one template. As such it also avoids to have 3 ED2IN (MAIN, TEST, DBG) with almost the same parameters.
RAPP I have sometimes used RAPP but I guess it's probably outdated and Marcos suggested to remove it.
BRAMS Should BRAMS be part of the ED repository? Maybe it could stay as an independent repository
mdietze commented 6 years ago

Could you be more specific about what you mean by 'ASCII dataset'? More generally, I feel like it would be much easier to evaluate these suggesetions if you provided links to the relevant code (Github makes it easy to reference not just the code, but specific lines or blocks).

manfredo89 commented 6 years ago

Hi Mike, I am more or less familiar with every issue on the table except for the 3rd and 4th line. I still added them because they are signaled as deprecated in the ED2IN. I have linked with the file / line number where relevant

mpaiao commented 6 years ago

@mdietze on the ASCII dataset, I think @manfredo89 is referring to variable NL%IMETTYPE in ED2IN. I think a really long time ago we had meteorological drivers in ascii format, but this has been phased out about 10 years ago. I think IMETTYPE could definitely go away, I just checked the code and it is not used anywhere.

A few comments on the other chunks of the code.

mdietze commented 6 years ago

@mpaiao yep, I was around for the pain of the ASCII to HDF5 migration. I agree that the old ASCII met can be dropped. I mostly asked because I recall that there were parsers for other input databases (e.g. soil texture maps) that were never fully translated during the C to Fortran transition and those bits of code might be more valuable to retain (since, for example, there may still be value in being able to load from those).

I agree with all of Marcos' other suggestions