cmake-basis / legacy

Legacy CMake BASIS project for versions 3.2 and older. For newer versions, go to
https://github.com/cmake-basis/BASIS

Add external directory in filesystem standard #327

Status: Closed (by ahundt, 8 years ago)

ahundt commented 10 years ago

There is not an "external" directory in the filesystem standard. The modules directory is for BASIS components, but we should probably consider a directory specifically for third-party components.

http://opensource.andreasschuh.com/cmake-basis/standard/fhs.html

schuhschuh commented 10 years ago

My intent for the standard was to disallow any such "external" directory, because I generally think it is a bad practice.

When you include the source code of external dependencies inside your repository, you have to update them whenever newer versions come out, make sure to mention the proper license and that you are allowed to include the source code at all, and you unnecessarily increase the size of your repository. You also end up making a distinction between "light" dependencies, which you add to the external directory, and "heavy" dependencies such as OpenCV, ITK, and VTK, which are in fact bigger than your actual project and which you would therefore not copy into the external directory. Thus you still would not really have all the dependencies in your package, because the overhead of that convenience would not justify the duplication and the size increase of your repository/package.

I believe there are better ways to make the build of your software convenient and ensure that the needed dependencies are available to the user than including their source code in your project repository.

Any external dependency should only be specified as a dependency of a package; the source code of the external package should not be part of the project's repository. Instead, if you want to make sure that the source package of an external dependency will always be available, especially if you require a particular older version, these external packages can be kept track of in a separate repository.

The super-build approach would then be used to ease the build of a package, building any dependency that is not present on the target system. If you don't want the external dependencies to be downloaded during the build/installation, you can include the .tar.gz files of the external dependencies in the source distribution package and have the super-build script use them if present. An example is the super-build script of the DRAMMS package of SBIA. The directory into which you would copy the .tar.gz (or other archive format) files of the external dependencies could of course be called external, and it would exist only in the distribution package of your package (i.e., created by CPack's make package_source) but not be part of the repository.

With Git submodules, you could of course consider keeping these within an external directory, given that the dependency maintainer also uses Git and so forth. I would still prefer simply using the official distribution packages of any external dependency, which I then just use as part of the super-build.
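
A minimal sketch of what such a super-build fragment might look like (the package name foo, its version, and the download URL are hypothetical): it prefers an archive bundled in an external directory of the distribution package and falls back to downloading otherwise.

```cmake
include(ExternalProject)

# Prefer the archive shipped with the source distribution, if present.
set(FOO_ARCHIVE "${CMAKE_CURRENT_SOURCE_DIR}/external/foo-1.2.3.tar.gz")
if(EXISTS "${FOO_ARCHIVE}")
  set(FOO_URL "${FOO_ARCHIVE}")
else()
  # Fall back to downloading the official release at build time.
  set(FOO_URL "https://example.com/releases/foo-1.2.3.tar.gz")
endif()

ExternalProject_Add(foo
  URL        "${FOO_URL}"
  # URL_HASH SHA256=...  would pin the exact release in a real script
  CMAKE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>
)
```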

ahundt commented 10 years ago

Yes, I agree with that, with a few caveats.

In the cases I've seen, we make our top-level repository a light tracking repository with one or two files, and everything else a Mercurial subrepository, including committed externals. Nonetheless, those externals still go in an external directory, as in the sketch below.
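
For illustration, the top-level .hgsub file of such a layout might look like this (paths and repository URLs are hypothetical):

```
external/tclap   = https://example.com/hg/tclap
external/jsoncpp = https://example.com/hg/jsoncpp
src/mymodule     = https://example.com/hg/mymodule
```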

In CMakeMesh I also make it possible to explicitly select the system or included versions.

I've also avoided downloads in super-build scripts because they slow the build down a lot and break when the internet is not available.

Slight version differences also break user code in subtle and unexpected ways, so including a tested and working version also prevents that issue.

Side question: Are there any caveats that come with specifying super-build dependencies in the BasisProject.cmake file?

schuhschuh commented 10 years ago

> Slight version differences also break user code in subtle and unexpected ways, so including a tested and working version also prevents that issue.

I understand, but as mentioned before, these should only be included in a distribution package, not in the source code repository. You can then provide two different source distribution packages, e.g., a <package>-<version>.tar.gz and a <package>-<version>-with-prerequisites.tar.gz. The first would always download the dependencies if they are not already available, while the latter also includes the (default versions of the) distribution packages of the dependencies; it is thus bigger in size but includes everything needed to build the software.
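
One hedged way to produce the two packages with CPack (the option name and the external directory layout are assumptions, not BASIS conventions) is to configure twice with a switch in the top-level CMakeLists.txt:

```cmake
# Sketch: generate either a lean source package or one that also ships
# the dependency archives stored under external/.
option(BUNDLE_PREREQUISITES
  "Include dependency archives in the source package" OFF)

if(BUNDLE_PREREQUISITES)
  set(CPACK_SOURCE_PACKAGE_FILE_NAME
      "${PROJECT_NAME}-${PROJECT_VERSION}-with-prerequisites")
else()
  # Lean package: leave the bundled archives out of the source tarball.
  list(APPEND CPACK_SOURCE_IGNORE_FILES "/external/")
endif()

include(CPack)
```

Configuring once with each setting and running make package_source would then yield the two archives.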

ahundt commented 10 years ago

> The first would always download the dependencies

Unfortunately, I cannot rely on this being possible.

I also need to support distributing the original repository itself, without zipped packages.

schuhschuh commented 10 years ago

> > The first would always download the dependencies
>
> Unfortunately, I cannot rely on this being possible.

And that is alright; that is what the second option would be for. The first, smaller package would be for those who already have most of the dependencies installed and would otherwise just waste time, space, and bandwidth downloading the whole bundle that also includes the dependencies.

> distributing the original repository

I would never consider this "distributing". It's more like making the repository publicly accessible. In the same way, you could make the repository with the dependencies accessible, and people would still always be able to download all the code. "Distributing repositories" is another bad practice that seems to have emerged with Git. If someone wants to use the development versions directly, they obviously must be more familiar with software development and thus know how to make sure that all dependencies are installed.

ahundt commented 10 years ago

Those are both reasonable opinions.

It seems others believe distributing repositories isn't such a bad practice. For instance, my group has adopted such a practice since we want to ensure setups are exactly alike, without the possibility of versions being out of sync. One of the huge benefits of including subrepositories is that external versions are automatically updated for everyone when one person does the update. We also have a controlled environment in which "distribution" consists of developers and other experts installing software directly onto the product hardware. Thus repository distribution is extremely useful and simplifies matters dramatically in our case.

Since there are reasonable use cases for both distributing packages and repositories, I believe it is reasonable to support both methods.

ahundt commented 10 years ago

Also note that this is common practice for end-user projects, as opposed to libraries. For example, it is followed by the Chromium and Firefox repositories.

schuhschuh commented 10 years ago

> we want to ensure setups are exactly alike, without the possibility of versions being out of sync

How would this not be the case when including a particular version of an external dependency as a .tar.gz in your distribution package?

> automatically updated for everyone when one person does the update

The equivalent in my workflow would be either one of the following: update the version (and download URL) of the dependency in the super-build script, or replace the dependency's .tar.gz archive that is included in the distribution package.

Hence, I see absolutely no benefit. All the benefits you named are also possible without Git submodules. Another benefit of my preferred way of dealing with external dependencies is, however, that you don't waste resources and time copying everything from the external repositories, such as previous versions and a history of changes that you simply don't need. Also, you (and everyone else) depend on the external repository for as long as you haven't cloned it yourself. Instead, you can just copy the distribution packages and store them on your download server, or include them in your complete "bundle" (as I termed it, i.e., a super-build which also includes all or most of the external source packages).

> Since there are reasonable use cases for both distributing packages and repositories, I believe it is reasonable to support both methods.

Alright, but BASIS should still encourage people to do the right thing and make it as convenient as their bad habits, so they have no excuse for doing it wrong just for the sake of simplicity. I am fine with using Git submodules for external dependencies as long as the project itself is only a meta-project, i.e., one that does not itself contain any source code but only sub-projects. That meta-project represents the distributed software package and all the configuration/prerequisites needed to build it. I think you mentioned such a setup in your group, and that is ok. I just don't want to add the concept of external directories to just any project. It should be clear that these should only be part of a meta-project / bundle / package, or whatever we want to call it, but not of the actual software project which contains the important source code.

A software project (repository, source tree) which actually implements (part of) a software package should never include any external dependencies. Instead, it just has to clearly state what these are. The meta-project can then integrate the different components of the software along with the dependencies.

ahundt commented 10 years ago

> How would this not be the case when including a particular version of an external dependency as a .tar.gz in your distribution package?

A tar.gz leaves open the possibility of picking up an old version and is not directly managed by version control. When we do hg checkout XXXX, we want to be certain that the entire platform and all dependencies have moved to the exact versions in use when that commit was made. We are using version control as a one-step, complete versioning solution.

Installing a tar.gz also means there are two versions of the headers on a machine: the installed version and the development version. If the wrong one is picked up, the build may succeed but show crazy and unexpected behavior. Here are more details:

https://stackoverflow.com/questions/21489999/how-to-prevent-accidentally-including-old-headers

This is a key request of my group and I need to fulfill it.

> A software project (repository, source tree) which actually implements (part of) a software package should never include any external dependencies. Instead, it just has to clearly state what these are. The meta-project can then integrate the different components of the software along with the dependencies.

I agree with the sentiment, but I've found from experience that these absolutes just aren't practical, and all they do is cause other developers to get upset and forge their own path. We have several extensions committed directly to the BASIS tree, so we are violating this principle ourselves. As I mentioned, counterexamples like Chromium and Firefox demonstrate that this is extremely common and well-accepted practice, because the risk of subtle version conflicts is quite high: the sources of packages can vary greatly even among Linux platforms due to distribution patches, in addition to Windows- and OS X-specific patches and others.

schuhschuh commented 10 years ago

But all of this would be taken care of by a super-build script, which knows which particular version/checkout of a remote repository to use and which patches need to be applied. The problem with the include paths is also solved by the super-build script making sure that the include path is set up properly. Let this super-build be the build configuration of your top-level project, and you really don't lose any of the benefits that you mentioned.
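
A sketch of that arrangement (the directory layout, target names, and URL are assumptions): the dependency installs into a prefix inside the build tree, and the actual project is configured to search that prefix before any system paths, so stale installed headers are not picked up by accident.

```cmake
include(ExternalProject)

set(DEPS_PREFIX "${CMAKE_BINARY_DIR}/deps")

# Build and install the dependency into the local prefix.
ExternalProject_Add(foo
  URL        "https://example.com/releases/foo-1.2.3.tar.gz"
  CMAKE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${DEPS_PREFIX}
)

# Configure the actual project against the local prefix, and only after
# the dependency has been built and installed there.
ExternalProject_Add(myproject
  SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src"
  DEPENDS    foo
  CMAKE_ARGS -DCMAKE_PREFIX_PATH:PATH=${DEPS_PREFIX}
  INSTALL_COMMAND ""
)
```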

BASIS does include core third-party libraries which are meant to be part of the standard, and it modifies and extends some of these, like the TCLAP library. Every BASIS project that follows this standard makes use of them, and therefore all such projects depend on them. Moreover, these libraries are not included because they are dependencies of BASIS, but because they should be considered components of BASIS which it wants you to use in your software.

ahundt commented 10 years ago

> BASIS does include core third-party libraries which are meant to be part of the standard, and it modifies and extends some of these, like the TCLAP library. Every BASIS project that follows this standard makes use of them, and therefore all such projects depend on them. Moreover, these libraries are not included because they are dependencies of BASIS, but because they should be considered components of BASIS which it wants you to use in your software.

Yes, and this is exactly what we need to do with some of our projects.

> But all of this would be taken care of by a super-build script, which knows which particular version/checkout of a remote repository to use and which patches need to be applied. The problem with the include paths is also solved by the super-build script making sure that the include path is set up properly. Let this super-build be the build configuration of your top-level project, and you really don't lose any of the benefits that you mentioned.

Super-builds have problems, like making sure all of the compilation flags are correct and consistent throughout. Once we have integrated support in BASIS and clear instructions, I could see that being a viable route, but we aren't there yet. Our application is extremely performance-sensitive, so these details matter a lot more than on a typical project. I expect that medical imaging jobs, for example, can just be run overnight to get the results. We can't do that because we have realtime requirements so we have to be very careful about our build flags and configuration.

schuhschuh commented 10 years ago

> We can't do that because we have realtime requirements so we have to be very careful about our build flags and configuration.

But would that not just be a matter of ensuring that all dependencies are built with the same flags? I suppose that is what you get without additional effort when you just include the targets in the same build configuration. If you needed individual flags for the packages, then the super-build would make the separation cleaner. If not, passing build flags on to ExternalProject_Add should be straightforward, since you assume anyway that CMake is used by all of the dependencies; otherwise you could not include them as "modules" in the first place.
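
Forwarding the top-level flags might look like this (a sketch; the target name and URL are placeholders):

```cmake
include(ExternalProject)

# Pass the top-level compiler settings through to the dependency so that
# all targets end up built with consistent flags.
ExternalProject_Add(foo
  URL        "https://example.com/releases/foo-1.2.3.tar.gz"
  CMAKE_ARGS
    -DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE}
    -DCMAKE_C_COMPILER:FILEPATH=${CMAKE_C_COMPILER}
    -DCMAKE_CXX_COMPILER:FILEPATH=${CMAKE_CXX_COMPILER}
    -DCMAKE_C_FLAGS:STRING=${CMAKE_C_FLAGS}
    -DCMAKE_CXX_FLAGS:STRING=${CMAKE_CXX_FLAGS}
)
```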

ahundt commented 10 years ago

That cuts to the heart of the super-build matter: adding a super-build module takes a lot of lines of code and isn't as easy to keep cross-platform and usable in an IDE, whereas adding a module takes one or two lines of code and everything shows up nicely in an IDE. Super-builds also cannot be parallelized as easily. These are legitimate hurdles that must be overcome before it makes sense to use a super-build over modules.
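
In plain CMake terms, the module side of that comparison is roughly a one-liner (the directory name is illustrative), in contrast to the multi-line ExternalProject_Add blocks sketched above:

```cmake
# Module-style inclusion: the dependency's targets become part of the same
# build, show up in the IDE, and participate in the parallel build.
add_subdirectory(external/foo)
```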

Back to the topic of this issue, an external directory: there are good reasons to have one even if it is compiled via a super-build, and since we can now easily specify custom directories, there is a workaround, per issue #311, if you don't want it to be directly built into BASIS.

schuhschuh commented 10 years ago

> if you don't want it to be directly built into BASIS

I want it to be (officially) allowed only for projects which do not have any PROJECT_CODE_DIR. This requirement may just be part of the "standard" documentation rather than a strict requirement enforced by the CMake functions. The external directory would then be a place where basis_find_package looks for external dependencies first. This can be supported by BASIS.
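
In plain CMake terms (the package name Foo is a placeholder, and the actual behavior would live inside basis_find_package), that lookup order might be sketched as:

```cmake
# Look for a copy shipped in the project's external/ directory first ...
find_package(Foo QUIET
  PATHS "${PROJECT_SOURCE_DIR}/external/Foo"
  NO_DEFAULT_PATH
)
# ... and only then fall back to the usual system search paths.
if(NOT Foo_FOUND)
  find_package(Foo REQUIRED)
endif()
```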

ahundt commented 10 years ago

It sounds reasonable to document that as the officially correct and advised way to do things, as long as it does not programmatically prevent an alternative choice.