PacificCommunity / seapodym-codebase

SEAPODYM source code, tools, documentations and example configurations
Other
5 stars 3 forks source link

Conversion to CMake project #3

Open Ash12H opened 1 year ago

Ash12H commented 1 year ago

Hi, I am currently working on an adaptation of the seapodym-codebase project which uses Make to allow the use of CMake. Can I send you a PullRequest when it's ready?

The main changes are:

What is needed:

If you agree, please let me know. :smile:

Jules Lehodey

Ash12H commented 1 year ago

I just created a branch named dev-cmake where I pushed all the changes. You can see how it looks like. For the moment the documentation has not changed but in my own fork I have already written a file INSTALL.md.

Quick install with CMake :

Follow the build+install process of ADMB (into /usr/local/admb) then change the name of the library you want to use to "libadmbo.so" or "libadmbo.a". In the seapodym-codebase folder execute mkdir build | cd build | cmake .. | make

If you want to specify an ADMB build directory use cmake .. -DCMAKE_PREFIX_PATH=/path/to/admb/build/ instead.

innasenina commented 1 year ago

Hi Jules,

Thanks for working on this contribution. I propose to have a small discussion before proceeding to the pull request. Indeed, building all binaries at once will be very convenient. Also, I agree that once it is possible to modify the ADMB library name in one file, there is no need to create environmental variable in user’s bashrc. Although I don’t see why it is better to avoid this practice.

With respect to the use of CMake, changing the project structure and adding dependency aren’t small changes. In the short term, they’ll require me changing the typical and convenient workflow. Thus, I wonder -- what other benefits this change brings except an ability of building all executables at once? Important to note that cmake is not installed with Linux systems by default, so the user without sudo rights or having some other environment restrictions may have difficulty to compile the code. For example, I tried two linux machines here (condor submission and mahi-mahi), and in both had the same problem: ‘command cmake not found’. On the other hand, current setup (as described in README.md) allowed me quickly installing seapodym from github on three different Linux platforms here, NeSI included. As such, perhaps keeping the possibility to use Makefile would provide more flexibility.

Also, a small question on the use of cmake. After the installation of cmake, and doing as advised I had the compilation problem:

: mkdir build : cd build : cmake ~/seapodym-codebase/src/ : make [ 0%] Building CXX object class/CParam/CMakeFiles/cparam_dv.dir/Param.o ~/seapodym-codebase/src/class/CParam/Param.cpp:5:10: fatal error: Param.h: No such file or directory 5 | #include "Param.h"

The error persisted after I tried to provide 'src' and 'include' locations to the cmake command. In the automatically created Makefile the source directory was correct, but there were no headers. What is wrong with the above commands?

Cheers, Inna

Ash12H commented 1 year ago

Hi Inna, Here are some answers from my opinion:

Also, I agree that once it is possible to modify the ADMB library name in one file, there is no need to create environmental variable in user’s bashrc.

Since C++ has many variations in its usage, I don't know if the use of a bashrc variable is a standard. To avoid conflicts and to be as clear as possible, using a case-by-case process will allow SEAPODYM to be used by as many people as possible. Indeed, the use of a bashrc file, which is exclusive to the linux system, makes it non-portable to other OS without prior modification. The real conflict comes, in my opinion, from the way ADMB compiles and installs its library. The installation process should move the lib and include files to /usr/local/ rather than /usr/local/admb/. Also, the libraries should have a standard name rather than a name that contains system information, compiler name and version. Anyway, once all this is known, the user can compile ADMB and use it for his own purposes.

With respect to the use of CMake, changing the project structure and adding dependency aren’t small changes. In the short term, they’ll require me changing the typical and convenient workflow. Thus, I wonder -- what other benefits this change brings except an ability of building all executables at once?

I understand that this is a major change to the current SEAPODYM workflow. The first goal was to group the classes and add meta information to the new users. This way it is easier to understand how the whole project is structured. Secondly, there was a conflict when the user wanted to generate "flux" in addition to other executable files due to ambiguous behavior (naming of functions). Anyway, these changes are not specific to CMake. Among Make, CMake and Autotools, I think CMake is the most popular tool for C++ compilation. The reason is that it compiles on multiple platforms and is also integrated with most popular IDEs. While Make requires the user to fully describe the environment variables and the project architecture, CMake provides many automated processes.

Important to note that cmake is not installed with Linux systems by default, so the user without sudo rights or having some other environment restrictions may have difficulty to compile the code.

You are right, CMake is not installed by default (note that sometimes make is not installed either). And in that case virtual environment can be used when it is possible (cf. anaconda, conan, VCPKG, Docker etc...).

As such, perhaps keeping the possibility to use Makefile would provide more flexibility.

Yes, this could be a good way to stay portable for all use cases. However, both versions would have to be maintained at the same time. Although I'm not sure it will take much effort in the long run, at first the new changes will heavily impact the compilation by Make.

Also, a small question on the use of cmake. After the installation of cmake, and doing as advised I had the compilation problem:

mkdir build
cd build
cmake ~/seapodym-codebase/src/
make

You should use :

mkdir build
cd build
cmake ~/seapodym-codebase
make

Because you must point to initial CMakeList.txt file which is at the root of the directory. This one is setting all project parameters while the one in src directory is only compiling libraries and binaries.


Remarks :

Cheers, Jules

Ash12H commented 1 year ago

Quick update

innasenina commented 1 year ago

Hi Jules,

In response to your suggestions of changes, here are the answers and some counter-propositions that aim to integrate your work on the codebase enhancement while focusing on improvements of this code to facilitate its further evolution towards a better numerical model and a quantitative tool for tuna management. These further evolutions include multi-threading, integrating early-life stage data, introducing variable growth.

  1. Change of the SEAPODYM source code structure

If we want to change the current code structure, we should first see whether it is necessary. There are three issues with the proposed structure: 1) it leaves multiple files unclassified, so without any meta information, 2) the folder structure by class does not help understanding how the numerical model code is structured, and 3) it is not necessary for this reason as the view by classes can be achieved through existing development environments.

Here is a version of the codebase structure that is more relevant to the numerical model software, allows classification of entire codebase, provides meta-information, and has a potential for “growing”. Basically, it splits the code into

1) the models – all code that computes implemented models (full population dynamics, tag movement model, habitats and fluxes) with further structure into model functions, parameters, solver and control (main loops with calls of executing functions); 2) the data (model state vector and other variables, input-output, fisheries and tagging data) is handled, and 3) various utilities that enable dimensions handling, spatial and temporal aggregations, the use of basic mathematical functions, macros and homemade types.

.src/ ├── data │   ├── like.cpp │   ├── Matrices.cpp │   ├── ReadWrite_DYM.cpp │   ├── ReadWrite_fisheries.cpp │   ├── ReadWrite_TXT.cpp │   ├── SeapodymCoupled_OnReadForcing.cpp │   ├── SeapodymCoupled_OnWriteOutput.cpp │   └── SeapodymCoupled_ReadTags.cpp ├── models │   ├── control │   │   ├── ad_buffers.cpp │   │   ├── hessian.cpp │   │   ├── main.cpp │   │   ├── main_densities.cpp │   │   ├── main_habitats.cpp │   │   ├── main_simulation.cpp │   │   ├── main_threaded.cpp │   │   ├── seapodym_coupled.cpp │   │   ├── SeapodymCoupled_EditRunCoupled.cpp │   │   ├── SeapodymCoupled_Forage.cpp │   │   ├── SeapodymCoupled_Funcs.cpp │   │   ├── SeapodymCoupled_OnCompFluxes.cpp │   │   ├── SeapodymCoupled_OnRunCoupled.cpp │   │   ├── SeapodymCoupled_OnRunFirstStep.cpp │   │   ├── SeapodymCoupled_SaveBeforeFishing.cpp │   │   ├── seapodym_density.cpp │   │   ├── seapodym_habitat.cpp │   │   ├── SeapodymHabitat_OnRunFirstStep.cpp │   │   ├── Seapodym_OnRunDensity.cpp │   │   ├── Seapodym_OnRunHabitat.cpp │   │   └── seapodym_threaded.cpp │   ├── functions │   │   ├── accessibility.cpp │   │   ├── dv_accessibility.cpp │   │   ├── dv_feeding_habitat.cpp │   │   ├── dv_food_requirement_index.cpp │   │   ├── dv_juvenile_habitat.cpp │   │   ├── dv_mortality_sp.cpp │   │   ├── dv_predicted_catch.cpp │   │   ├── dv_predicted_catch_without_effort.cpp │   │   ├── dv_seasonal_switch.cpp │   │   ├── dv_selectivity.cpp │   │   ├── dv_spawning.cpp │   │   ├── dv_spawning_habitat.cpp │   │   ├── dv_survival.cpp │   │   ├── dv_total_exploited_biomass.cpp │   │   ├── dv_total_mortality_comp.cpp │   │   ├── dv_total_obs_catch_age.cpp │   │   ├── dv_total_pop.cpp │   │   ├── fd_accessibility.cpp │   │   ├── fd_feeding_habitat.cpp │   │   ├── fd_food_requirement_index.cpp │   │   ├── fd_juvenile_habitat.cpp │   │   ├── fd_mortality_sp.cpp │   │   ├── fd_predicted_catch.cpp │   │   ├── fd_predicted_catch_without_effort.cpp │   │   ├── fd_seasonal_switch.cpp │   │   ├── fd_spawning.cpp │   │   ├── fd_spawning_habitat.cpp │   │   ├── fd_survival.cpp │   │   ├── fd_total_exploited_biomass.cpp │   │   ├── fd_total_mortality_comp.cpp │   │   ├── fd_total_obs_catch_age.cpp │   │   ├── fd_total_pop.cpp │   │   ├── feeding_habitat.cpp │   │   ├── food_requirement_index.cpp │   │   ├── juvenile_habitat.cpp │   │   ├── mortality_sp.cpp │   │   ├── predicted_catch.cpp │   │   ├── predicted_catch_without_effort.cpp │   │   ├── seasonal_switch.cpp │   │   ├── SimtunaFunc.cpp │   │   ├── spawning.cpp │   │   ├── spawning_habitat.cpp │   │   ├── total_exploited_biomass.cpp │   │   ├── total_mortality_comp.cpp │   │   └── total_obs_catch_age.cpp │   ├── parameters │   │   ├── Param.cpp │   │   ├── VarParamCoupled.cpp │   │   ├── VarParamCoupled_reset.cpp │   │   └── VarParamCoupled_xinit.cpp │   └── solver │   ├── caldia.cpp │   ├── Calpop_caldia.cpp │   ├── Calpop_calrec.cpp │   ├── Calpop_InitCalPop.cpp │   ├── Calpop_precaldia.cpp │   ├── Calpop_precalrec.cpp │   ├── Calpop_recompute_coefs.cpp │   ├── Calpop_tridag.cpp │   ├── calrec_adre.cpp │   ├── calrec_precalrec.cpp │   ├── dv_caldia.cpp │   ├── dv_calrec_adre.cpp │   ├── dv_calrec_precalrec.cpp │   ├── dv_precalrec_juv.cpp │   ├── dv_tridag_bet.cpp │   ├── fd_caldia.cpp │   ├── fd_calrec_adre.cpp │   ├── fd_calrec_precalrec.cpp │   ├── fd_precalrec_juv.cpp │   ├── fd_tridag_bet.cpp │   ├── Map.cpp │   ├── precalrec_juv.cpp │   └── tridag_bet.cpp └── utilities ├── Date.cpp ├── Numfunc.cpp ├── SaveTimeArea.cpp ├── SeapodymDocConsole_UpdateDisplay.cpp └── XMLDocument.cpp

7 directories, 106 files

Note, all headers are moved to the ‘include’ in line with your suggestion. I set up the proposed structure in the ‘development’ branch and suggest that we all test it and agree if it is more convenient before committing these changes to ‘master’.

  1. The use of Cmake

Given existing issues with unavailability of cmake on our development computers, I suggest not replacing make by cmake. We can compare and decide later whether one or another solution is the best. Could you please add Cmakefile(s) to the ‘development’ branch as well?

  1. The seapodym_fluxes application

First, the simulation study to compute biomass flow rates between regions should run the same reference model, numerical resolution of which is controlled by OnRunCoupled function. It is our main function with the model input-output control, time control, life stage and age loops with calls for numerical solver. It is true that the file SeapodymCoupled_OnCompFluxes.cpp has the function with the same name ‘OnRunCoupled’. However, it was done so on purpose in order to re-use the ‘main’ and ‘seapodym_coupled’ functions unchanged, so it does not lead to the code duplication. And it does not create a conflict on sequential code compilation.

Of course, it is possible to implement this differently, but better doing it by minimizing the code duplication and not by duplicating it further. Currently, the two functions are different only in two places: in the IO and in additional calls for regional models resolutions. Rewriting of the former, so we can end up with only one OnRunCoupled, is straightforward, but the latter requires a deeper code revision. So, for now I suggest leaving it as is.

  1. For the INSTALL.md and the code formatting, let’s get back to it later when we’ll be updating the ‘master’ branch with the above changes.

Cheers, Inna

fabricebouye commented 1 year ago

My quick 2 cents on a couple of points:

  1. While it’s true that very large C++ project do use multi-directory structure for source storage code and separation, it’s not quite often that I see that for projects of smaller scale such as this model. This is not impossible though.

This leads to 2 other alternatives or additional considerations (as those are not mutually exclusive and they all can be combined together):

 *   Better file naming maybe. Though there is already some kind of naming scheme judging from the file list sent by Inna, there are still some files (older code?) that appear to be exempted from it. Again this might not help understanding the structure of the numerical model.

Note: MFCL is way worse than Seapodym in this case.

 *   Using namespace! If having “physical” source code separation is an issue, what about having a logical one instead? Also it might make the doxygen doc easier to navigate as I just had a quick look at the source code and it seems to me that all the classes and functions are laying out flat in the nameless naming space… Unlike Java, C++ does not force you to have specific folders with the exact same name for each of your packages/naming spaces, just saying…

Also though it may not necessarily be right for numerical / scientific purpose there are some kind of standard/often used ways/names to name sub-naming spaces. But it usually varies from one vendor/”programming world” to another. You may want to have a look at the C++ API, .NET API, or even the Java API to get inspired. Is there any large OpenSource science C++ project around you may get inspiration from (may also help if you decide to go the directory separation route), and I am not talking about ADMB (something more fresh, using recent versions of C++ and programming concepts).

  1. If cmake needs to be installed on the machine I administrate, do not hesitate to ask me about it.