BioMedIA / irtk-refactored-deprecated

Apache License 2.0
8 stars 7 forks source link

Find modules should mainly require <Pkg>_DIR to be set #36

Open schuhschuh opened 8 years ago

schuhschuh commented 8 years ago

When the CMake find_package command runs in CONFIG mode, it adds a cache variable named <Pkg>_DIR which points to the location of the package configuration file. This works of course only for packages that provide such CMake configuration file.

Same should be the case for custom Find modules. Usually it would be sufficient if a user sets this variable to the prefix where the package is installed. All other paths should be derived from it and when cached, be declared as advanced variables unless a file/directory was not found. As soon as it is set, the variable can be hidden (again) by making it advanced. This reduces the number of non-advanced variables a user is required to set for all dependencies to be properly discovered. It also leaves a more common "experience" for a user, no matter if a custom Find module is used or not. All they have to do is make sure that the "_DIR" variables are set properly.

schuhschuh commented 8 years ago

Environment variables such as MATLABDIR (commonly set on systems where MATLAB is installed) or MATLAB_DIR could also be considered to initialise the <Pkg>_DIR variables.

schuhschuh commented 8 years ago

Here's my comment to #33:

The purpose is to make use of a good practice in how to write Find package modules and to reduce the number of variables and standardise the experience for a user for all packages. As CMake is very flexible and provides little guidelines on how to write a good Find package module, many people have done it differently.

I claim that many Find modules you will find are not very user friendly for those not proficient with the internals/use of CMake. Even CMake's Find modules and find_package itself lack in this respect.

Why do I have to set a bunch of variables for each project (e.g., include path AND location of each single library), just because I installed the package in a non-custom directory ? Why is it not enough to just tell CMake where my installation is located instead ? Next, when I want to change from one installed version to another installed in a different location, why do I have to change all those variables manually. Why is it not enough to change the location of the installation ?

To standardise the finding of dependencies was part of the CMake BASIS project. In my experience with the research software we published, this has made it easier for non-CMake experts to build our code.

For every dependency, I want the user to only have to specify the installation directory via a <Pkg>_DIR variable. All other CMake or the custom Find modules should be able to derive from it. Only if that fails, the users needs to modify the advanced settings. Also, unlike CMake's default behavior, users often fail to realise they have to specify the path to the package Config file, which indeed is much less intuitive then specifying the installation prefix (top-level directory such as /usr/local). This can be solved by setting some CMake variables properly. Done by CMake BASIS.

Using a <Pkg>_DIR variable as main setting is in line with CMake's find_package CONFIG mode. It should be transparent to the user whether a dependency is discovered via Config file or Find module.

Eigen now ships with a CMake config file

I have mentioned before that this doesn't help because I don't even want to configure and install Eigen. It is a header-only library. I doubt the source tree of Eigen already contains a usable CMake package Config file.

I am not convinced that shadowing the behaviour of standard CMake find_package is a good solution.

I don't think it has anything to do with shadowing. It is about having users not to be familiar with CMake and it's find_package command behavior. And as mentioned above, with a simple variable setting, the <Pkg>_DIR variable can be set to the installation directory.

jopasserat commented 8 years ago

Maybe it would be good to take advantage of this refactoring period to use Basis in the IRTK build process.

@schuhschuh do you think you could give us a crashcourse on that / help with the migration? I guess you're already using it in your fork/branches?

schuhschuh commented 8 years ago

I have started to work on this with Stefan in his GitLab repository, but it has never reached a point where all my development could take place. I haven't done it myself because it is only worth the effort if there is commitment towards using CMake BASIS for the official IRTK version.

As from what I see, it would have a few advantages:

And, features I would more likely take time to finally realise when actually used:

Another feature of CMake BASIS which may be controversial and need not be adopted:

ghisvail commented 8 years ago

@jopasserat @schuhschuh I am happy to discuss usage of BASIS at some point but I feel we are drifting away from the initial issue that Andreas is raising, which is to use <Pkg>_DIR consistently in our find modules.

My answer is no for the following reasons:

As CMake is very flexible and provides little guidelines on how to write a good Find package module, many people have done it differently.

  • And I don't intend to maintain a diff for our external files due to a short-coming of CMake. Plus, if you can install a library in a non-standard location, there is a high chance you can figure out how to pass that location to the relevant cmake find package. Not worth maintaining a diff with the upstream files for that matter.

Why is it not enough to just tell CMake where my installation is located instead ?

  • Because the location is subject to interpretation. A header-only library like Eigen? header+libs like NifTICLib? header + optional libs like Boost? Plus the range of different prefixes for the header and libs locations...

I can understand why CMake did not bother trying to formalize this, they can't go on a quest to fix the install scripts of the whole dev community. And neither would I. pkgconfig was supposed to be the solution to this and its adoption is still low.

I apologize for the dry tone but I really feel we are bikeshedding here.

schuhschuh commented 8 years ago

Because the location is subject to interpretation. A header-only library like Eigen? header+libs like NifTICLib? header + optional libs like Boost? Plus the range of different prefixes for the header and libs locations...

You're missing the point. It is not necessary that all packages follow the same installation scheme. It is about the Find modules written specifically for a package that should encapsulate that knowledge.

The most common location every software has in common is the installation prefix (i.e. top-level directory of the installation).

The variable _DIR is used by config-mode [only]...

At least for CMake before version 3, this definitely was not the case. Many of the Find modules used such variable themselves.

...and is expected to point to the location of the cmake config files. Changing its behaviour for non config-mode is counter-intuitive.

I couldn't disagree more. First of all, you assume that a user knows that such variable is only meant to be a path where a Config.cmake file is located. A user shouldn't know such thing, which isn't even true. CMake has no such mechanism of requiring that a variable of such name is only used like this.

I apologize for the dry tone but I really feel we are bikeshedding here.

Sure, one can consider it petty details. But it's the petty details for something as a project structure and build configuration how easy and reliable it is to build it from source code for non-software developers. IMHO the important aspect of packaging (the refactored) IRTK was it's build configuration (i.e., CMake stuff). I know it's not the most exciting for a software developer, but it can make a real difference for the user.

Look at the current setup. When I configure IRTK, I am overwhelmed with dozens of CMake variables in my CMake GUI, hiding all the actually relevant things. Instead of a single variable for each package that is intuitive (not counter-intuitive as you say). Why would you think as somebody not using CMake for your own development that "IRTK_DIR" is supposed to point to "/usr/local/lib/cmake/IRTK" instead of "/usr/local", the CMAKE_INSTALL_PREFIX I had used to install it ? Why do I, as a user that tries to install some software have to know where each dependency hides it's CMake package Config file ? Or let's say I install software in /opt/<pkg>-<version> which is in accordance with the Linux FHS. Then wouldn't it be more intuitive to set <Pkg>_DIR to /opt/<pkg>-<version> ? I don't even want to know where the <Pkg>Config.cmake file is located within this directory tree. This I consider knowledge only the developer of that package or the software that uses it needs to know.

schuhschuh commented 8 years ago

The variable _DIR is used by config-mode and is expected to point to the location of the cmake config files. Changing its behaviour for non config-mode is counter-intuitive.

That this statement is inexact and CMake itself is not consistent with respect to this can be easily seen from the following text you can look up in the documentation of find_package:

The set of installation prefixes is constructed using the following steps. [...]

1. [...]

2. Search paths specified in cmake-specific environment variables. [...]

<package>_DIR
[...]

From this you should conclude that CMake interprets an environment variable such as IRTK_DIR to be the installation prefix (e.g., /opt/irtk-2.0.0) of the IRTK package, not the location of it's CMake Config file (e.g., /opt/irtk-2.0.0/lib/cmake/IRTK).

In CMake BASIS, the use is consistent and intuitive for non-CMake users. Still, those familiar with CMake can as well point <Pkg>_DIR directly to the location of the package Config file.

ghisvail commented 8 years ago

You have me lost complete attention to your points on the following occasions:

Sure, one can consider it petty details.

My point, priority should be put elsewhere.

A user shouldn't know such thing.

He or she still has to know where this variable should point to regardless of the naming convention you are discussing.

I know it's not the most exciting for a software developer

I am not sure what this implies and I am not certain I want to know. Moving on.

When I configure IRTK, I am overwhelmed with dozens of CMake variables in my CMake GUI

So this is really what this issue is all about ? The number of variables is not pleasing to your eyes ?

And last but definitely not least the piece of documentation you decided to partially quote. This whole section is meant to highlight usage of inhibition variables to let the user tweak the search paths for CMake config files / find modules:

The set of installation prefixes is constructed using the following steps. If NO_DEFAULT_PATH is specified all NO_* options are enabled.

An then each step highlighting which specific variable should be enabled:

1. Search paths specified in cmake-specific cache variables. These are intended to be used on the command line with a -DVAR=value. This can be skipped if NO_CMAKE_PATH is passed:

[...]

2. Search paths specified in cmake-specific environment variables. These are intended to be set in the user’s shell configuration. This can be skipped if NO_CMAKE_ENVIRONMENT_PATH is passed:

[...]

The relevant piece of documentation you should be quoting is at the beginning and is the following:

Config mode attempts to locate a configuration file provided by the package to be found. A cache entry called <package>_DIR is created to hold the directory containing the file.

I have no further interest in discussing this issue and would appreciate if contribution efforts were directed elsewhere. Thanks.

schuhschuh commented 8 years ago

So this is really what this issue is all about ? The number of variables is not pleasing to your eyes ?

Don't come like this. It's not about me or how pleasing I find it. Keep the user in mind. I have made some experience with users with publicly available research software as well as other researchers in two labs now. I am certain they would greatly appreciate if we would take such issue more serious.

And last but definitely not least the piece of documentation you decided to partially quote.

Of course I post only the relevant part, to show you that <Pkg>_DIR variables are not exclusive to point to the location of a Config file. I've shown this and that's all I wanted.

if contribution efforts were directed elsewhere

Fine. I might else well direct them towards my own fork.