TriBITSPub / TriBITS

TriBITS: Tribal Build, Integrate, and Test System,
http://tribits.org

Generate Environment / Configuration header #221

Open jjellio opened 7 years ago

jjellio commented 7 years ago

Hi,

A feature I've desired from CMake for a while is the ability to capture the user's input commandline, as well as source code version (version control versioning if available), and environment used to compile the source. I primarily work inside the Trilinos framework, but this feature request is more of a CMake/Tribits issue.

Obtaining the actual commandline options to cmake appears to be very difficult, as once inside a CMake script, you see variables, some that are set from the command line and some that are default CMake variables.

I've concocted a rather crude way to obtain a C++ header with the shell environment + all cmake variables. The solution entails using execute_process plus a bash script to gather the environment. This process currently returns a large string of C++ lines that place these variables + values into a std::map (the format/output is fairly easy to change). Escaping and quoting is a little tricky, but this works.

Second, I loop over all CMake variables, and again format these variables into C++ code that stores the values in a std::map. Again, I have to escape/quote carefully.

Finally, I write the strings obtained to a CMake 'configure' template file, e.g. My_config_source.cpp.in, and then populate the template with the generated source code + appropriate variable names/declarations.
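The escaping step described above can be sketched in C++ as follows. This is only an illustration, not the prototype's actual code: the function names `escape_for_cpp_literal` and `emit_map_entry` and the map name passed in are hypothetical, and the sketch assumes the generated source stores each variable with a plain `map[key] = value;` statement.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escape a raw value so it can be placed inside a
// double-quoted C++ string literal in generated source.
std::string escape_for_cpp_literal(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        switch (c) {
            case '\\': out += "\\\\"; break;  // backslash
            case '"':  out += "\\\""; break;  // double quote
            case '\n': out += "\\n";  break;  // newline
            case '\t': out += "\\t";  break;  // tab
            case '\r': out += "\\r";  break;  // carriage return
            default:
                if (c < 0x20 || c == 0x7f) {
                    // Other control characters: fixed-width 3-digit octal
                    // escape, so a following digit cannot extend it.
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\%03o",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);  // printable or UTF-8 byte
                }
        }
    }
    return out;
}

// Hypothetical helper: build one line of generated code, for example
//   cmake_vars["CMAKE_BUILD_TYPE"] = "Release";
std::string emit_map_entry(const std::string& map_name,
                           const std::string& key,
                           const std::string& value) {
    return map_name + "[\"" + escape_for_cpp_literal(key) + "\"] = \"" +
           escape_for_cpp_literal(value) + "\";";
}
```

A generator would call `emit_map_entry` once per variable and paste the resulting lines into the configured template. Escaping both the key and the value keeps compiler flags that contain quotes or backslashes from breaking the generated file.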

The entire process I follow seems fragile, but it does serve as a prototype. It seems the expertise in Tribits may be able to help.

In the context of Trilinos, what I would like to accomplish is providing this information to the Teuchos package. The commandline parser could have a default flag (like help) that dumps the cmake/env information. I work primarily with apps, and on most machines static linking is used. With the above functionality, I would be able to take any binary I have and extract a wealth of information about the libraries/settings used in its creation, e.g., the BLAS libraries and PATHs, compiler path + version, basically anything CMake can provide. I would also be able to determine the exact Trilinos version used. Since I often build off the git Develop branch, the Trilinos_Major/Minor version is not informative.

Another question is how to insert this into a Tribits project. That is, how to trigger the generation of this script after all packages have been processed. I see Tribits commands that can be called at the end of a package, but is there a hook that can be called at the end of a project's configuration process?

This request/discussion is not Trilinos or C++ specific. Being able to store/retrieve the environment + configuration parameters is extremely useful. Ideally, you want this information baked into the library (and then into the apps if statically linked), not in text header files that can be manipulated, which could leave the information in the header inconsistent with the library file (.so/.a). A reasonable example of this is PETSc's error handler. If something bad happens, the output from their signal handler prints the exact ./configure arguments used as well as the PETSc version information. They also retain information about TPLs in a header, for example, the vendor/version of the BLAS/LAPACK library used.

bartlettroscoe commented 7 years ago

I just saw this (I have not been getting all of my GitHub email notifications for a while).

@jjellio, we should likely discuss this for a few minutes over the phone or Skype. There is a lot here.

I think some of this info might be in the produced <Package>Config.cmake files. The purpose of these is to make it easy for downstream customers to build and link against upstream (static or dynamic) libraries. Your application code could simply read the generated <Package>Config.cmake files and bake that into your executables if you wanted. Again, let's talk (and sorry for not seeing this sooner).

jjellio commented 7 years ago

@bartlettroscoe I've been out of town and am just now seeing this.

Sure, we can chat if you want.

The TLDR version is that it would be beneficial to developers and users if Trilinos would bake its build and configure settings into the actual built binaries/libraries.

Imagine being able to ask a user to simply run their binary with --trilinos-dump-config, or to have Teuchos_*_Exception() append this information in a summarized form. This would also allow a user to recreate a CMakeCache.txt file from any installed Trilinos build (I think that would be extremely useful)
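A rough sketch of what such a flag could look like on the application side is shown here. Only the flag name `--trilinos-dump-config` comes from the comment above. The helper names, the `ConfigMap` alias, and the `NAME=VALUE` output layout are assumptions made for illustration, and this is not an existing Teuchos or TriBITS API.

```cpp
#include <cstring>
#include <map>
#include <string>

// Hypothetical alias for the baked-in key/value data.
using ConfigMap = std::map<std::string, std::string>;

// Returns true if the dump flag appears anywhere after the program name.
bool dump_requested(int argc, const char* const* argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--trilinos-dump-config") == 0) {
            return true;
        }
    }
    return false;
}

// Formats both maps as simple NAME=VALUE lines under two comment headers.
// The layout is illustrative only; it is not the CMakeCache.txt format.
std::string format_config_dump(const ConfigMap& cmake_vars,
                               const ConfigMap& shell_env) {
    std::string out = "# CMake variables\n";
    for (const auto& kv : cmake_vars) {
        out += kv.first + "=" + kv.second + "\n";
    }
    out += "# Shell environment\n";
    for (const auto& kv : shell_env) {
        out += kv.first + "=" + kv.second + "\n";
    }
    return out;
}
```

An application's `main` could call `dump_requested(argc, argv)` early, print the result of `format_config_dump`, and exit. Recreating a real CMakeCache.txt would additionally need each entry's type, since cache entries are written as `NAME:TYPE=VALUE`.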

My motivation is that when trying to track performance, the user's environment (bash variables), CMake variables, and TPLs govern performance. This is annoyingly true on Cray machines.

bartlettroscoe commented 7 years ago

@jjellio, how urgent is this? Could we wait to discuss this in detail until FY18 begins (just a little over 2 weeks)?

What you are asking for is more than would likely be provided in a <Package>Config.cmake file (just due to the massive size). Providing reproducibility is a challenging task. To be fully reproducible on a given machine, you need to capture the full env (i.e. set > env.log), the input arguments to CMake (which CMake amazingly does not log like autotools-created configure scripts do), and the exact version of the source code that CMake is building. And you need to assume that the OS and any of the programs or upstream libraries used on that machine have not changed since that software package was configured, built, and installed. We might need Kitware's help for some of this (like logging the commandline arguments for CMake), but such a system could be created.

It might be good to step back and examine the underlying motivations and requirements more to see if this is best solution for the target customer. There may be easier ways to provide what is needed for that customer. But with that said, I understand the general desire for this type of feature.

jjellio commented 7 years ago

Yes, I achieved roughly what you outlined. Effectively, you need to grab bash's 'env', and you need to grab CMake's variables.
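As an alternative to shelling out to bash, the environment could also be read by a small helper program run at configure time. The sketch below is an assumption-laden illustration and not the approach used in the prototype: it relies on the POSIX `environ` variable, and the names `parse_env_entry` and `snapshot_environment` are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

// POSIX: null-terminated array of "NAME=VALUE" strings for this process.
extern char** environ;

// Split one "NAME=VALUE" entry at the first '='. The value may itself
// contain '=' characters. An entry without '=' yields an empty name.
std::pair<std::string, std::string> parse_env_entry(const std::string& entry) {
    const std::string::size_type eq = entry.find('=');
    if (eq == std::string::npos) {
        return {std::string(), std::string()};
    }
    return {entry.substr(0, eq), entry.substr(eq + 1)};
}

// Copy the current process environment into a map, skipping any entry
// that has no name.
std::map<std::string, std::string> snapshot_environment() {
    std::map<std::string, std::string> env;
    if (environ == nullptr) {
        return env;
    }
    for (char** p = environ; *p != nullptr; ++p) {
        auto kv = parse_env_entry(*p);
        if (!kv.first.empty()) {
            env[kv.first] = kv.second;
        }
    }
    return env;
}
```

Such a helper sees the environment of the process that launches it, so it would have to be started from the same shell session as the cmake invocation for the snapshot to match the configure-time environment.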

The feature is really something that CMake should provide, e.g., you cannot iterate over the CMake special variable $ENV{}

Logging the command line arguments is also something that is in CMake's court. It seems virtually impossible to achieve this from within a CMake script.

From a performance tracking standpoint, knowing which TPLs are used (BLAS/LAPACK) is crucial. The real crux is that you don't know what information you may need, so I have a tendency to say track the entire shell environment + CMake variables.

I did have to do some shenanigans with escaping, but the basic functions are:

1) CMake variables to a list (already built in)
2) Shell environment to a list (not built in, and fragile - I used execute_process + bash)
3) CMake list to C++ data structure (doesn't have to be C++)
4) Generate header / cpp with the above data (I used CMake's file creation via substitution routines)
5) Integration into a Tribits project - this feature needs to run after CMake has processed the TPLs

It's a hassle, but I suspect it would enhance nightly testing, performance tracking, and enable a better user/dev experience when bugs happen.

When I wrote the Cmake list to C++ stuff, I use a std::map, and preserve the CMake names as the keys. To avoid clashes with the shell environment, I simply create two maps. You can then provide a header file that provides access to the containers and that's it. When those functions are called, if the container has not been initialized, then it calls the initialize routines, which is a huge blob of code that CMake generates to assign values to the container.
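The two-map layout with on-demand initialization might look roughly like the sketch below. It uses function-local statics, which C++11 initializes once on first use, as one possible way to get the "initialize on first call" behavior described above. The namespace `build_info`, the accessor names, and the two `fill_*` functions are hypothetical stand-ins, and the values in them are placeholders for what CMake would generate.

```cpp
#include <map>
#include <string>

namespace build_info {  // hypothetical namespace

using VarMap = std::map<std::string, std::string>;

namespace detail {
// Stand-ins for the large generated blob: in a real build these bodies
// would be produced by CMake, one assignment per captured variable.
inline void fill_cmake_vars(VarMap& m) {
    m["CMAKE_BUILD_TYPE"] = "Release";  // placeholder value
}
inline void fill_shell_env(VarMap& m) {
    m["CC"] = "gcc";                    // placeholder value
}
}  // namespace detail

// CMake variables, keyed by their original CMake names.
inline const VarMap& cmake_vars() {
    // Built once, on the first call; later calls return the same map.
    static const VarMap vars = [] {
        VarMap m;
        detail::fill_cmake_vars(m);
        return m;
    }();
    return vars;
}

// Shell environment, kept in a separate map so names cannot clash
// with CMake variable names.
inline const VarMap& shell_env() {
    static const VarMap vars = [] {
        VarMap m;
        detail::fill_shell_env(m);
        return m;
    }();
    return vars;
}

}  // namespace build_info
```

Callers only see the two accessors, for example `build_info::cmake_vars().at("CMAKE_BUILD_TYPE")`, while the generated initialization code stays in one translation unit's worth of `fill_*` bodies.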

We should talk, but I think much of this request is really a CMake request. Tribits may be able to provide the functionality.

bartlettroscoe commented 7 years ago

> We should talk, but I think much of this request is really a CMake request. Tribits may be able to provide the functionality.

Okay, let's try and talk in a few weeks. I will be in ABQ the week of the TUG. We could talk then. And we can scope out the additions needed to CMake to better support this (logging the commandline to CMake would be a big help for starters).

bartlettroscoe commented 5 years ago

@jjellio, is this still a valid issue? Given the progress we have made on ATDM Trilinos testing and integration, I don't think reproducibility is a problem at this point for Trilinos.

Can we close this? Just trying to clean up TriBITS issues.

jjellio commented 5 years ago

It is still functionality that is not available with vanilla CMake, to my knowledge. The intent was to be able to embed the build-time parameters into a Trilinos install. Autotools did most of this by default (e.g., config.log). Feel free to close though.