easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0

Introduce a notion of toolchain-neutral software #570

Open gribozavr opened 11 years ago

gribozavr commented 11 years ago

It does not make much sense to build many variants of tools that don't have a public ABI. For example: CMake, subversion, git, mercurial. These packages can be built once and used with any toolchain.

Currently there is a 'dummy-dummy' toolchain that allows doing this, as long as no other package depends on such toolchain-neutral software. If there is a dependency, a special easyconfig has to be created and built.

A simple solution is to allow easyconfigs to be marked toolchain-neutral, and to allow these packages to satisfy dependencies of software that is being built with a non-dummy toolchain.
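The proposal above can be sketched as a small dependency lookup. This is only an illustration under stated assumptions: the `Build` class, the `toolchain_neutral` flag and `find_dependency` are hypothetical names, not part of the EasyBuild framework, and the package and toolchain labels are sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Build:
    """One installed build of a package (hypothetical model, not an EasyBuild class)."""
    name: str
    version: str
    toolchain: str                   # e.g. "goolf-1.4.10" or "dummy-dummy"
    toolchain_neutral: bool = False  # hypothetical flag, as proposed in this issue


def find_dependency(installed, name, version, toolchain) -> Optional[Build]:
    """Prefer a build made with the requested toolchain, else accept a neutral build."""
    candidates = [b for b in installed if b.name == name and b.version == version]
    for build in candidates:
        if build.toolchain == toolchain:
            return build
    for build in candidates:
        if build.toolchain_neutral:
            return build
    return None


# Sample data for illustration only.
installed = [
    Build("CMake", "2.8.12", "dummy-dummy", toolchain_neutral=True),
    Build("zlib", "1.2.8", "goolf-1.4.10"),
]
```

With this model, a package built with any toolchain can pick up the single neutral CMake build, while zlib is still only found for the toolchain it was built with.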

boegel commented 11 years ago

There are a couple of caveats here, mainly the compilers/libs used to build this toolchain-neutral (tc-n) software.

To avoid problems when a certain toolchain is used in combination with the tc-n software, the latter has to be fully statically linked.

The request seems valuable though...

gribozavr commented 11 years ago

Or it can be built with rpath.

fgeorgatos commented 11 years ago

fyi. xbesseron has been suggesting/promoting the idea to go into the direction of this issue, too.

If we do so, I think I'd favor promoting static linking.

Why? dynamic linking can be a complicated business in itself, even when using rpath: http://gcc.gnu.org/ml/gcc-help/2008-06/msg00118.html # "8 ways to leave your linker" (after all, rpath just confines directories, not individual library versions with functions!).

ie. my concern is that the dynamic aspect may become an annoyance...

gribozavr commented 11 years ago

We are talking about linking against OS-provided versions of libraries. These libraries should have a stable ABI, so I don't see an issue here yet.

stdweird commented 11 years ago

imho toolchain-neutral software is only needed for build requirements of (sub)toolchains, nothing else. the big issue with those is that OS dependencies are a pain to define / determine, and regardless of the ABI, if the OS version is too old, you are screwed. i do not see it as an issue to rebuild those with a toolchain, even if that is maybe not strictly speaking required. the fact that you don't have to care about static/dynamic linking etc etc is more than enough reason to just rebuild them. after all, it's just a few extra modules and some extra storage, in return you have worry free tools to offer to your users.

fgeorgatos commented 11 years ago

+1 to stdweird's latest comment: I favor the "worry-free" "debugless" approach -in relation to persons' manhours-, full rebuild imposes only a slightly higher expense of a system's time/space. (and with future tools like lmod it makes plenty of sense to do it anyhow)

fgeorgatos commented 11 years ago

what is valuable to keep promoting as part of this issue, is the notion that we don't really bother with providing, say, a2ps for 4 different toolchains; although I'm the guy creating all those, I'd be the first to admit that it is quite overkill.

We may still have reasons to provide the different builds, but we should consolidate the easyconfigs, at least.

fgeorgatos commented 10 years ago

this is just to confirm that there is good merit in this issue, and I expect it to pop up during the next hackathon, if debuggers & some performance tools are to be discussed. Namely, members of this bundle may be done otherwise: https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs/h/HPCBIOS_Debuggers

the reproducibility argument is still there, which could be something like a hash in a versionsuffix!

Having said that, I am not very convinced of the purpose of commit 05deccc (ie. introduce toolchains for Debuggers and Profilers), since only the tools that fiddle with the MPI stack itself (e.g. Scalasca) should have dynamic library dependencies on toolchains etc. I've never been convinced, really ;-)

As a proof of this, notice that DDT and TotalView are statically linked (and probably that is the true one objective).

geimer commented 10 years ago

I really support the idea of providing at least the option to create toolchain-neutral software. There are certainly good reasons for going one or the other route, but in the end it should be the admin's choice...

Bart-Ver commented 10 years ago

That is true if there is only one admin. We have more than 5 people installing software. EB is sometimes a little strict, but it is consistent. The more choice me and my colleagues have, the more it will become a mess (again).

Just my humble opinion, Bart



boegel commented 10 years ago

@Bart-CER: How about using a system-wide EasyBuild configuration file, and agreeing on the policy not to fiddle with the configuration otherwise (e.g. env vars or command line options to override the config file)? EasyBuild should allow people to achieve the things they want, with reasonable defaults. EasyBuild is not a team manager. ;-)

stdweird commented 10 years ago

some more feedback after the Jülich hackathon:

in my opinion toolchain neutral software implies that static binaries are produced, with no dependencies on anything external. the "no dependencies" can be loosened a bit (eg assume bash is available), but this quickly becomes another (arbitrary) distinction between what can and cannot be assumed to be present. and, for completeness, the version suffix for toolchain neutral software should probably include the build dependencies (eg Doxygen/0.1-dummy-dummy-GCC-4.8.2, but this becomes a naming issue quickly).
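A minimal sketch of how such a version suffix could be composed, assuming a naming scheme that does not exist in EasyBuild as such: `neutral_versionsuffix` is a hypothetical helper. The hashed variant follows the "hash in a versionsuffix" idea raised earlier in the thread, and is one possible way to keep names short when there are many build dependencies.

```python
import hashlib


def neutral_versionsuffix(build_deps, use_hash=False):
    """Return a version suffix that records the build dependencies.

    build_deps is a list of (name, version) pairs. The naming scheme is
    hypothetical and only mirrors the example given in the comment above.
    """
    # Sort so that the suffix does not depend on the order of the list.
    labels = sorted("%s-%s" % (name, version) for name, version in build_deps)
    joined = "-".join(labels)
    if use_hash:
        # A short digest keeps module names manageable for long dependency lists.
        return "-" + hashlib.sha1(joined.encode("utf-8")).hexdigest()[:8]
    return "-" + joined


suffix = neutral_versionsuffix([("GCC", "4.8.2")])
module_name = "Doxygen/0.1-dummy-dummy" + suffix
```

Here `module_name` reproduces the Doxygen/0.1-dummy-dummy-GCC-4.8.2 example; with `use_hash=True` the suffix becomes a fixed-length digest, at the cost of no longer being human-readable.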

toolchain neutral software however is not "software provided by the OS wrapped by easybuild" (the whole "why can't i use the gcc on my system instead forcing me to compile one from scratch" discussion). we should use proper terminology here (i'd call it system software). EB should provide an easy way to generate EB-compliant modules around existing 3rd party modules or system software, but this is to be avoided at all cost. in particular, before EB generates this module, it should run the sanitychecks. problems with these system software modules are beyond easybuild control, and this should also be made clear to whoever wants to go this way. (and i really hope we can avoid support for the case that the system provides gcc (the c compiler) and not gfortran, and that users want to fake the GCC subtoolchain, or have EB only install gfortran)

and a 3rd remark, easybuild should make an effort to specify minimal (sub)toolchain requirements in the easyconfigs. this effort could be part of moving to the new format2.0 (i don't think it's wise to modify the current format 1.0 files). eg if a software package is truly only dependent on GCC, the toolchain requirements should specify this; and easybuild should provide a way to either install it with the subtoolchain or figure out a matching toolchain and use that toolchain (as is done now). in an extreme case, the current toolchain info is dropped and everything becomes a (build)dependency. EB can figure out based on the dependencies what toolchain would have been specified. the naming will become a mess (the main reason why toolchains are used), but this might be solved by hierarchical modules.

boegel commented 10 years ago

One (other) use case of this is PerfExpert (cfr. https://github.com/hpcugent/easybuild-easyconfigs/pull/839).

The installed PerfExpert module should be toolchain-neutral, in the sense that you should be able to use it together with any other module, regardless of which compiler toolchain that module was built with. In this particular case, you can't get away with a statement like "just build PerfExpert with whatever toolchain was used to build the software package you're analysing". Building PerfExpert with e.g. ictce is a no-go, mostly because of its dependencies (e.g. ROSE even requires GCC 4.4.x or an older GCC 4.x (but not too old)), but locking down the build dependencies (e.g. the compiler) used for building PerfExpert and all deps is important for reproducibility.

Another aspect of this is that ideally, PerfExpert should be provided via a single module (not a module that loads a bunch of other modules as dependencies), to avoid problems with dependencies that are common for other applications (e.g. Boost). Only one application version (even regardless of toolchain) can be loaded at a time via a module, however the linker is able to correctly handle multiple versions of the Boost library to be available at runtime (in $LD_LIBRARY_PATH), so even without static linking 'collapsing' dependencies together has proper use cases...

fgeorgatos commented 10 years ago

Hi Kenneth, all,

fyi. the argument you made is exactly of the same type I have been promoting all along last year, as regards debuggers' easyconfigs!

For the same reason, I have never been really convinced about this commit on HPCBIOS_Debuggers: https://github.com/hpcugent/easybuild-easyconfigs/commit/05deccca2076185e98409db7345bd69e31b4a3a7 We may not really want multiple builds of such tools, when exactly one does the job equally well. If you agree with that statement, that commit could be reverted (pending an aligned treatment of GDB)!

Of course, the reproducibility argument still applies, so nothing wrong with the concept of actually confining the build with specific compiler versions, libraries etc. We have discussed this again before and during the JSC hackathon and see merit: https://github.com/hpcugent/easybuild-framework/issues/570#issuecomment-32176974

To summarize:

There are quite a few of us (@xbesseron, @gribozavr, @geimer, @georgets, @fgeorgatos) interested to see how this PR will evolve, because it has good impact on future work!

citibeth commented 8 years ago

Since every piece of software has to be built with SOME toolchain, I don't understand what marking an .eb as "toolchain-neutral" would mean.

Maybe the right approach is to allow a wildcard toolchain when specifying dependencies. For example, netCDF has a build dependency on CMake. It doesn't matter which toolchain was used to build CMake, as long as bin/cmake runs. One could then build these basic tools with some kind of "plain vanilla" toolchain.

boegel commented 8 years ago

@citibob: marking something as 'toolchain-neutral' basically means that the end result doesn't depend on the toolchain at runtime in any way

We already support the wildcard you suggest, in the sense that we have support for resolving dependencies while taking subtoolchains into account, cfr. http://easybuild.readthedocs.org/en/latest/Manipulating_dependencies.html#minimal-toolchains.

The toolchain-neutral idea goes a bit further though, in some sense. If something is toolchain-neutral, you only need a single build of it, and then it can be used in a build with any (other) toolchain.

citibeth commented 8 years ago

> We already support the wildcard you suggest

An interesting feature, but not quite the same. Suppose I built CMake with Clang, and now I'm compiling netCDF with GCC, which has a build dependency on CMake. As far as I can tell, --minimal-toolchains will not be able to use the Clang version of CMake, since Clang is not an ancestor of the GCC toolchain. The wildcard I'm suggesting would work for ANY toolchain.
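The difference between the two matching rules can be sketched as follows. This is a simplified model under stated assumptions, not the actual `--minimal-toolchains` implementation: toolchain versions are ignored, the `SUBTOOLCHAIN` table is illustrative, and `match_minimal` / `match_wildcard` are hypothetical names. The choice to prefer a subtoolchain match before falling back to any other build is also an assumption.

```python
# Illustrative hierarchy: each toolchain maps to its direct subtoolchain.
SUBTOOLCHAIN = {
    "goolf": "gompi",
    "gompi": "GCC",
    "GCC": None,
    "Clang": None,
}


def ancestors(toolchain):
    """Return the toolchain followed by its chain of subtoolchains."""
    chain = []
    while toolchain is not None:
        chain.append(toolchain)
        toolchain = SUBTOOLCHAIN[toolchain]
    return chain


def match_minimal(available, building_with):
    """Accept a build only if its toolchain is the current one or one of its subtoolchains."""
    for tc in reversed(ancestors(building_with)):  # most minimal subtoolchain first
        if tc in available:
            return tc
    return None


def match_wildcard(available, building_with):
    """Accept a build made with any toolchain, preferring a subtoolchain match."""
    return match_minimal(available, building_with) or next(iter(sorted(available)), None)


# CMake is only available as a Clang build, and we are building with goolf (GCC-based).
cmake_builds = {"Clang"}
minimal_result = match_minimal(cmake_builds, "goolf")    # no usable build found
wildcard_result = match_wildcard(cmake_builds, "goolf")  # the Clang build is accepted
```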

> If something is toolchain-neutral, you only need a single build of it, and then it can be used in a build with any (other) toolchain.

I think the core issue here is WHO declares a property of a piece of software:

    a) If it's a wildcard, then the USER declares that the toolchain used doesn't matter.

    b) If it's a toolchain-neutral feature, then the PRODUCER declares the toolchain used doesn't matter.

I'm fearful that EasyBuild might be running down the path of an ill-conceived collection of ad-hoc dependency-matching mechanisms. One big problem with --minimal-toolchains is it's an EB config parameter, not a property of an .eb file itself. If its use breaks ANY .eb files (likely), then people will turn it off. Add on three or four such "features," and it becomes hard to tell what will match with what.

I would recommend the following:

  1. Figure out if it is feasible to stop using LD_LIBRARY_PATH and use RPATH instead, as is used with Spack. This would seem to be a prerequisite for any serious mix-n-match between toolchains. (For example, what if our Clang-compiled CMake doesn't work with the LD_LIBRARY_PATH required to compile NetCDF with our toolchain?)
  2. Think through the dependency matching problem carefully and come up with a consistent, systematic way to match dependencies. The system would need to provide:

    a) In specifying a dependency (toolchain, lib, build or otherwise), a standard way to specify what you're asking for. The standard needs to allow to specify optional toolchains, version ranges, version blackouts, wildcards, etc.

    b) In writing an easyblock or easyconfig, a standard way to specify what we THINK might be able to use us as a dependency in the future. CMake might be able to specify "anyone can use us." A Clang-compiled C library might specify "anyone using Clang or GCC can use us." A Fortran library compiled with GCC 4.9.3 would specify "only projects compiled with GCC-4.9.3 can use us."

    c) A way to match specs of (a) to specs of (b) to provide dependency match. For example, MyApp-GCC-4.9.3.eb might specify "I need version 1.5 or greater of MyLib, I don't care what toolchain it's compiled with." From the user side, this can match with just about anything. But this would NOT match with MyLib-1.5-GCC-4.8.1.eb because THAT config specified that it will NOT work with other versions of GCC. This algorithm needs to be simple enough to intuitively understand and debug. We should not be left scratching our heads wondering "why did EasyBuild match to THAT dependency, and how can I convince it to match to the one I really wanted?"
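Points (a) to (c) can be sketched as a two-sided match in which both the consumer and the provider have to agree. This is only an illustration of the idea, not a proposed design: `Request`, `Offer`, `usable_by` and `matches` are hypothetical names, version handling is reduced to a minimum version, and toolchains are compared as plain strings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.5' into (1, 5) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Request:
    """(a) What the consumer asks for; toolchain=None means 'any toolchain'."""
    name: str
    min_version: str
    toolchain: Optional[str] = None


@dataclass(frozen=True)
class Offer:
    """(b) What a build provides, and who it believes may use it."""
    name: str
    version: str
    toolchain: str
    usable_by: Optional[frozenset] = None  # None means 'anyone can use us'


def matches(request: Request, offer: Offer, consumer_toolchain: str) -> bool:
    """(c) A match requires that neither side objects."""
    if request.name != offer.name:
        return False
    if parse_version(offer.version) < parse_version(request.min_version):
        return False
    if request.toolchain is not None and request.toolchain != offer.toolchain:
        return False
    if offer.usable_by is not None and consumer_toolchain not in offer.usable_by:
        return False
    return True


# MyApp, built with GCC-4.9.3, needs MyLib >= 1.5 and does not care about the toolchain.
request = Request("MyLib", "1.5")
strict_offer = Offer("MyLib", "1.5", "GCC-4.8.1", frozenset({"GCC-4.8.1"}))
open_offer = Offer("MyLib", "1.6", "Clang-3.7.0")

strict_ok = matches(request, strict_offer, "GCC-4.9.3")  # provider restricts itself to GCC-4.8.1
open_ok = matches(request, open_offer, "GCC-4.9.3")      # provider accepts any consumer
```

Keeping the rule to four independent checks is one way to meet the requirement that a match (or a failure to match) stays easy to explain and debug.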

I would suggest someone think through this and propose a system, after reviewing similar dependency-matching systems out there (i.e. the one in Spack). Then we weigh in and get a design everyone can live with. Then we implement it.
