conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org

Should we use gcc from the default channel for Linux (and maybe OS X)? #29

Closed ocefpaf closed 8 years ago

ocefpaf commented 8 years ago

Most of the time we are OK using the compilers installed in the CIs because we all have similar build tools pre-installed on our machines. However, every now and then someone tries to use the packages in a docker image without those tools. (For example https://github.com/ioos/conda-recipes/issues/723 and https://github.com/ioos/conda-recipes/issues/700.) A few questions:

ChrisBarker-NOAA commented 8 years ago

I have no idea about most of this, but:

We should, as much as possible, build with the same toolchain that Anaconda is built with, which I think is clang on OS X.

And the "manylnux" folks have been working on a Docker image for building manylinux wheels, which is derived from the Anaconda experience -- so that might be a good place to go for Linux:

https://github.com/pypa/manylinux

pelson commented 8 years ago

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I know there is no time like the present to get this right, but I don't really have any experience of it going wrong. My hunch therefore would be to stick with what we have until we find a problem with it. :+1: / :-1:?

PythonCHB commented 8 years ago

👍

ocefpaf commented 8 years ago

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I know there is no time like the present to get this right, but I don't really have any experience of it going wrong.

Well, it is not a matter of right and wrong. I am pretty happy too. The issue arises when people use the conda package in minimalistic docker images.

My hunch therefore would be to stick with what we have until we find a problem with it. :+1: / :-1:?

+1 Let's just document that people should install build_essentials, etc.

pelson commented 8 years ago

let's just document that people should install build_essentials, etc.

The issue arises when people use the conda package in minimalistic docker images.

Ah OK. I've not seen these. I'm happy to tighten that requirement down somewhat - it sounds like quite a big ask to install build_essentials...

ocefpaf commented 8 years ago

My bad, I am on the phone and the # refs above should point to the ioos conda recipe repo.

ocefpaf commented 8 years ago

Ah OK. I've not seen these. I'm happy to tighten that requirement down somewhat - it sounds like quite a big ask to install build_essentials...

build_essentials was a lazy solution on my part. Some cases need only libgomp, others libgfortran.

jakirkham commented 8 years ago

In the few cases where I have had issues elsewhere, I find I can use install_name_tool or patchelf to link things to something like libgcc from conda to resolve these sorts of issues. A little inelegant, I suppose, but I do like using the system compilers if I can.
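
For illustration (the paths and binary names here are made up, not from an actual recipe), the kind of fix I mean looks roughly like:

# Linux: repoint a binary's rpath at an env's lib directory so it picks up conda's libgcc/libstdc++
patchelf --set-rpath '$ORIGIN/../lib' myenv/bin/some_binary

# OS X: swap one linked library for another in an existing binary
install_name_tool -change /usr/lib/libstdc++.6.dylib @rpath/libstdc++.6.dylib myenv/bin/some_binary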

jakirkham commented 8 years ago

So, this ( https://github.com/conda-forge/staged-recipes/pull/164 ) might be such a case where we would want to use conda's gcc.

msarahan commented 8 years ago

Here's what I understand:

If you ship libgcc (more importantly, libstdc++, which comes with it) and shadow the system libstdc++, and the system libstdc++ is newer than the one you ship, you'll run into unresolved symbol errors at runtime and crash or fail to run.
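
For instance (illustrative paths, assuming a conda-shipped copy lives under /opt/conda), you can see the mismatch by comparing the newest GLIBCXX symbol version each copy exports:

# Highest GLIBCXX version provided by the system copy vs. the shipped copy
strings /usr/lib64/libstdc++.so.6 | grep '^GLIBCXX_3\.' | sort -V | tail -1
strings /opt/conda/lib/libstdc++.so.6 | grep '^GLIBCXX_3\.' | sort -V | tail -1
# If the shipped copy reports a lower version but shadows the system one on the
# library path, binaries built against the newer system libstdc++ fail to load.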

This has been a huge motivator for me to get GCC 5.2 running in our docker build image.

I have argued very strongly internally against using the gcc that is in defaults. My main argument against even having this package is that people will use it on unknown platforms - and this means their packages will have an unknown version dependency on GLibC.

IMHO, Continuum should just ship all the runtimes, the same way we do on Windows. They are much more nicely backwards/forwards compatible on Linux, but I don't see any harm in keeping them controlled there as well.

jjhelmus commented 8 years ago

I think an argument can also be made that you should ship no gcc runtimes (libstdc++ and the like) and instead always depend on the system-provided ones. This seems to be what the manylinux folks are doing with wheel files. I'm not sure which option is better, but I think both should be on the table.

jakirkham commented 8 years ago

One of the other ideas I was playing with in that PR is bundling only a few essential components, like libgfortran or libgomp, from the VMs we are building in. These are things that may not already be included on the system, but that we are (or will be) linking against. I am just worried they will get clobbered when someone installs defaults' libgcc package, and I am unclear on whether (or when) that leads to bad behavior. Also, I know a little less about how these fringe components interact with libgcc, which they are all linked to.

jakirkham commented 8 years ago

Alternatively static linkage remains a valid option here.

jjhelmus commented 8 years ago

I have run into issues where a Fortran compiled extension linked against symbols in my system provided libgfortran that were not in the Anaconda provided one which caused the extension to fail to import. Using conda uninstall -f libgfortran fixed the issue but it is not ideal.

If runtimes are shipped on Linux it seems they must be the most up-to-date versions. Keeping these up to date may require significant maintenance.

jakirkham commented 8 years ago

Yeah, I am liking the static option more and more.

msarahan commented 8 years ago

I'm not clear on how the Manylinux stuff works to depend on libstdc++ on the system. I'm sure they have something figured out, but I just don't understand it.

This is the article that convinced me to pursue the approach I'm behind: http://www.crankuptheamps.com/blog/posts/2014/03/04/Break-The-Chains-of-Version-Dependency/

Note that this is the same approach taken by the Julia team.

msarahan commented 8 years ago

Found it. They place tight restrictions on ABI version:

Therefore, as a consequence of requirement (b), any wheel that depends on versioned symbols from the above shared libraries may depend only on symbols with the following versions:

GLIBC <= 2.5
CXXABI <= 3.4.8
GLIBCXX <= 3.4.9
GCC <= 4.2.0
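
(For reference, and not part of the manylinux text itself, one way to check which versioned symbols a binary you built actually requires is:)

# List the versioned symbol requirements of an ELF binary
objdump -T a.out | grep -oE '(GLIBC|GLIBCXX|CXXABI)_[0-9.]+' | sort -Vu
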
jjhelmus commented 8 years ago

Yup, from my understanding they are defining a base Linux system that has a set of "core" libraries which they expect to 1) exist and 2) match a minimum version. But pip does not have an effective method for providing more up-to-date runtimes like conda does.

jjhelmus commented 8 years ago

I'm warming more to the idea of providing the latest runtimes. Would this allow us to compile packages with the GCC 5 libstdc++ ABI and run them on systems using the GCC 4 ABI?

msarahan commented 8 years ago

I feel like Conda has a better approach here, making the assumption that we should provide it. People can conceivably pip install something without having libstdc++ installed, and end up confused. My wife had that happen with Steam on her Linux computer, for example. Good times. I never thought work would be so useful at home.

FWIW, I'm pretty sure Continuum is taking this route, and you can be certain that it will be maintained as long as we're pushing it, because we'll have customers screaming otherwise.

@jjhelmus yes. Here's my understanding with GCC5:

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" => GCC4 compatible; runs fine with libstdc++ from GCC5 (it is dual-ABI). Does not link against libs compiled with the new (GCC5) ABI.

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=1" => GCC5 ABI; runs fine with libstdc++ from GCC5 (it is dual-ABI). Does not link against libs compiled with the GCC4 ABI.
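
As a rough check (hypothetical library path), the new-ABI symbols live in the std::__cxx11 namespace, so you can count them to see which ABI a given library was built with:

# Zero matches suggests the old (GCC4-compatible) ABI; nonzero means the new C++11 ABI
nm -D -C /path/to/libfoo.so | grep -c '__cxx11'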

Continuum is planning on the former setting for now, with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

I have tried to make that ABI info readily accessible with startup scripts in the build docker image: https://github.com/ContinuumIO/docker-images/pull/20/files#diff-8320ce46adf2819c0900060bd6c14c43R16

(also see the start_c++??.sh scripts, which are meant to be simple front-ends)

jakirkham commented 8 years ago

Continuum is planning on the former setting for now...

Alright, this clarifies the Linux stuff for me.

...with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

That's going to be fun. Hopefully, conda-forge has everything and is super fast then. :smile:

I have tried to make that ABI info readily accessible with startup scripts in the build docker image...

Thanks. This is really useful.

jjhelmus commented 8 years ago

I'm on board too. Thanks for the great explanation @msarahan. It took a bit but I'm seeing the light. Of course now I'm going to have to build GCC 5 tonight.

Sorry for the long tangent on this PR @jakirkham, did this answer your original concern?

jakirkham commented 8 years ago

This is what I am still unclear about: are we shipping gcc on Mac too?

msarahan commented 8 years ago

My current opinion is yes. I'd like to avoid it if possible, but I see the need for OpenMP and Fortran. I'll keep you all in the loop on any discussions we have here.

jakirkham commented 8 years ago

Ok. With OpenMP, maybe we can get around it by doing something similar to the Linux strategy, namely building the newest clang on our oldest Mac (10.7). Though Fortran remains a different problem.

msarahan commented 8 years ago

Thanks for being receptive, both of you! Now let's go rule the world! (or maybe just build great software)

jakirkham commented 8 years ago

Thanks for keeping us in the loop.

Now let's go rule the world! (or maybe just build great software)

Are they mutually exclusive? :smiling_imp:

jakirkham commented 8 years ago

So, we might be able to pursue a similar approach as with gcc on Linux, but using clang on Mac, as Apple makes clang's source available. For instance, here is the most recent version of clang for Mac. Based on the llvm version (3.7) in the code, this should support OpenMP. If we build this on 10.7 and/or pin the minimum deployment target to 10.7, maybe we could use this to build code that needs OpenMP. As it is the Mac system compiler, it should still support all the special arguments that the actual system compiler would. Using this would keep us free from the gcc mess.

Unfortunately, Fortran always needs to go through gfortran or some other compiler. I don't believe there is any actively maintained, stable Fortran frontend for clang. DragonEgg was the closest thing, but that has been unmaintained for ~1.5 yrs. However, it might be ok to partition the Fortran stuff into a special box using gcc. Perhaps we could even use a version from gcc 4.2 so it remains compatible with the old Mac gcc compiler. This way we would only ship libgfortran.dylib with things and not need to ship the other gcc libraries.

jakirkham commented 8 years ago

Another interim/partial solution for the Linux compiler problems (mainly missing C++11 support) might be just to add another CentOS repo for 6.x that has an acceptable version of gcc for us.

My past experience doing this has been pretty good. I don't find myself needing to distribute any libgcc package ever, even though I dynamically link to the compiler's libraries. This includes situations where the system compiler and libraries are older. I have installed conda packages built this way in very minimal Docker containers and not run into any issues using them. I have also installed them on more recent Linux systems and they have worked quite well. There is no need to worry about how the compiler is built, as it is designed to integrate smoothly into the existing OS. Plus, the installation is fast, so it can easily be included in any existing Docker Hub build that we are doing.

According to current information, CentOS provides a copy of gcc 4.9.1, as can be seen in their listing. This will, of course, require adding the appropriate SCL (devtoolset-3). This is not 5.x, but it is pretty new and has full C++11 support and some C++14 support. Thus it will easily be sufficient in cases where programs expect C++0x support, which seems to be an increasingly common case. Though it appears SCL devtoolset-4 does have gcc 5.2.1, which supports all of C++11 and C++14. So it is possible to use that as well.
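
For reference, enabling one of these on CentOS 6 looks roughly like the following (exact package names may differ slightly):

# Install the SCL repo and a devtoolset, then activate it for the current shell
yum install -y centos-release-scl
yum install -y devtoolset-3-gcc devtoolset-3-gcc-c++ devtoolset-3-gcc-gfortran
source /opt/rh/devtoolset-3/enable    # or: scl enable devtoolset-3 bash
gcc --version                         # now reports the devtoolset gcc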

In short, I think installing a newer gcc from another CentOS 6 repo would be a step in the right direction (even if it is not the final step). It would provide the functionality that we need from the compiler without otherwise hindering our packaging process. Thoughts?

jakirkham commented 8 years ago

Just for fun, I put a very simple mock up of this in a Docker container that is hosted on Docker Hub. It took ~5min to build and push. It has a link back to the code (again very simple). I added the gcc 5.2.1 compiler from devtoolset-4 and set it up so that at container startup one immediately has access to this.

To take it for a spin, I tried building a simple program that used C++11 features (shown below) with this compiler.

// hello.cpp

#include <iostream>
#include <functional>

int main()
{
    std::function<void()> f = [](){ std::cout << "Hello World" << std::endl; };
    f();
    return(0);
}

If I build this program ( g++ --std=c++14 hello.cpp ), the a.out file can be easily run (just prints Hello World), but it can also have its linkages inspected as shown below. Note these are all pointing to standard system libraries. Also, note that the libraries associated with our compiler live in this path ( /opt/rh/devtoolset-4/root/usr/lib/gcc/x86_64-redhat-linux/5.2.1/ ).

$ ldd a.out 
    linux-vdso.so.1 =>  (0x00007fff731e8000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f8313358000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f83130d4000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8312ebd000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f8312b29000)
    /lib64/ld-linux-x86-64.so.2 (0x00005570af8f0000)

If I get really creative, I can export the a.out file (using Docker's mounting features) and load it into a vanilla CentOS 6.6 Docker container. The program still runs just the same. No errors.
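
Roughly what I mean (the devtoolset image name here is made up):

# Build inside the devtoolset container, writing a.out to a mounted host directory
docker run --rm -v "$(pwd)":/work -w /work my-devtoolset-image g++ --std=c++14 hello.cpp

# Run the result inside a vanilla CentOS 6.6 container
docker run --rm -v "$(pwd)":/work -w /work centos:6.6 ./a.out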

What is the take-away message here? We can use the gcc provided by devtoolset at whatever version we want. All compiled programs will be linked in a way that uses standard system libraries, so there is nothing for us to ship (from the compiler). As it is CentOS 6, these system libraries will be really old, probably older than on any other Linux distro this program will be installed on. Thus we need not worry about portability in the same way we did with the gcc package, yet we retain the benefits.

If there is something I am missing or some problem you see here, please feel free to share. I am sure I can learn more about this too. :smile:

jakirkham commented 8 years ago

This is a bit of a tangent about OpenMP and Fortran that doesn't really pertain to the stuff above. I did try similar things with OpenMP and Fortran, which resulted in binaries linked against the system versions of these libraries. However, as many systems do not have OpenMP and Fortran libraries pre-installed, we can't ship something that will work out of the box. Still, this situation is no worse than any of the other compiler solutions available. In other words, how can we let a user install packages built with Fortran and/or OpenMP support without expecting them to already have these libraries?

The case of libgfortran is pretty straightforward if we statically link to it. This avoids the user having to install our libgfortran, which, as anyone who has been following issues on conda and conda-build is aware, causes countless problems. All we need to do is set a single flag; it is just a matter of getting each package's build to recognize this option.
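
Concretely, the flag I mean is gfortran's -static-libgfortran; a minimal sketch (file names made up):

# Fold libgfortran into the binary so there is no runtime dependency on libgfortran.so
gfortran -O2 -static-libgfortran -o mytool mycode.f90
ldd mytool | grep gfortran    # prints nothing if the static link worked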

As for OpenMP, the situation is unfortunately not as clear cut. It is not recommended to statically link OpenMP with gcc. Some have found distributing packages built with OpenMP so problematic that they have dropped OpenMP altogether (though they mention Windows, MOSEK is cross-platform and I am sure they have seen this issue on Linux). Where possible, taking this strategy should work, namely preferring POSIX threads to OpenMP. This way we can guarantee the needed libraries will be available. However, if that is not an option and OpenMP is the only threading option, I am not sure what the right answer is. If we tried to statically link against libgomp (GNU OpenMP), we would also have to statically link against libpthread (POSIX threads). Statically linking to libpthread just sounds like a bad idea to me, and I am not alone. So, maybe we could distribute the system libgomp. It will still be tied to an old glibc version, so it should work on older systems. I suppose this could apply to libgfortran too if we really needed it, but I don't think it is necessary in that case.
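
For example (made-up file name), a binary built with gcc's -fopenmp picks up both libraries dynamically:

gcc -fopenmp -o omp_demo omp_demo.c
ldd omp_demo | grep -E 'libgomp|libpthread'    # both resolved from the system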

Honestly, I'm not really sure of the right way forward with OpenMP. If we did package it, I would want to make sure we don't trample on the user's system. In other words, we should check to see if OpenMP support is already present. If it is, we simply don't install our packaged version of OpenMP. Only if there is no copy of OpenMP already should we try to install ours.

Does anyone have better ideas on the OpenMP problem? Thoughts on the Fortran problem?

msarahan commented 8 years ago

@jakirkham this is good information - especially about GCC 5.2 in devtoolset 4 - but why are you doing it? I don't really see this as an intermediate, more as something completely different.

I think we were already shipping libgfortran and libgomp in the libgcc package (but those need to be split out). Is that not a good idea? Should we reconsider? I think you'll still end up needing to ship libstdc++.so - your example is a good test, but I think it's likely that gcc was able to emit all ABI-4-compatible code here. I don't think that is always the case. Compile with -Wabi=2 to have it warn you about incompatible stuff.

The troubles that people had with conda and libgfortran were down to some sloppy metadata. We updated scipy to build with GCC 4.4 recently because 4.1 was causing issues. The version spec for libgfortran was blank, so users were not necessarily getting the latest version. To make things worse, since the libgcc and libgfortran packages both include libgfortran, there was an ambiguity in which one might clobber the other (primarily just the symlink, since the .so versions were different). By both specifying libgfortran explicitly and applying the appropriate version specifier, I don't think there would be issues.

Let's talk about this in tomorrow's call. I appreciate that we need to do something about a C++11 compiler, and I want to make sure that we're not spending effort now that will end up wasted.

jakirkham commented 8 years ago

Thanks @msarahan for giving this some thought.

this is good information - especially about GCC 5.2 in devtoolset 4 - but why are you doing it? I don't really see this as an intermediate, more as something completely different.

Not sure which question to answer here. :smile:

I thought of this as a potential intermediate solution for the following reasons:

However, you are right in that this could be a complete and final alternative. Here are some reasons we might want to stick with it.

I think we were already shipping libgfortran and libgomp in the libgcc package (but those need to be split out). Is that not a good idea? Should we reconsider?

On the libgfortran side, I really think static linking is simple and effective. Here is an example where some package expected to use the system libgfortran, but got really messed up. These are my thoughts with regards to linking libgfortran.

In short, I think statically linking libgfortran is just less risky and the cost paid is so minimal that I am just willing to accept it. Plus, it frees up developer time from this otherwise hairy issue.

As for libgomp, I think we have to package it (static linking is not an option). We could expect users to have it, but that really goes against the spirit of conda. We should just proceed with caution.

I think you'll still end up needing to ship libstdc++.so - your example is a good test, but I think it's likely that gcc was able to emit all ABI 4-compatible code here. I don't think that is always the case. Compile with -wabi=2 to have it warn you about incompatible stuff.

So admittedly this is a very trivial example (it might be a painful demo if it wasn't :wink:), but I have used this to compile things like VIGRA, Boost, and a variety of other complex C++ code with C++11 support, and to use it out of the box on machines that don't have this support, without shipping any libraries from the newer compiler. My understanding is that the devtoolset provides the functionality of a newer gcc while still deploying against the same OS with the standard system libraries. This seems to be a convenience for developers trying to deploy code to a cluster with an old OS. My understanding is it does this by doing some static linking (hence we see no linkages to libraries shipped with the devtoolset compiler). Red Hat provides some good info on how devtoolsets work, where the binaries they build are expected to work, and how C++ compatibility works. If we really need CentOS 5.x support, we could guarantee that support with an older devtoolset (2.1) that still has C++11 support, but we would need to switch to 5.x too. If we decide to use the devtoolset proposal, we basically need to pick one devtoolset version and stick with it.

The troubles that people had with conda and libgfortran was some sloppy metadata. We updated scipy to build with GCC 4.4 recently because 4.1 was causing issues. The version spec for libgfortran was blank - so users were not necessarily getting the latest version. To make things worse, since both libgcc and libgfortran packages both include libgfortran, there was an ambiguity in which one might clobber the other (primarily just the symlink, since the so versions were different.) By both specifying libgfortran specifically, and by applying the appropriate version specifier, I don't think there would be issues.

Yes, there were these issues, and I believe they have been mostly resolved, though some, like the example above, remain. This raises a really good point that we should think about.

Having to ship libgcc puts us in a situation where we have one package that everything depends on. Any break in it weakens the whole ecosystem.

I think you are right to say we should move away from the gcc package. It had lots of weaknesses and took a lot of magic to work correctly. Some of it is a bit hand-wavy, unfortunately. For the most part it works, but when it doesn't (except for a few common cases) it is almost impossible to tell what went wrong. Maybe moving away from libgcc mostly (if not entirely) is a good idea too. I don't think any of us enjoys having to deal with these problems, so let's find a way to avoid letting them happen at all. Perhaps that is not possible, but if it is, our lives would be that much simpler. :smile:

Let's talk about this in tomorrow's call. I appreciate that we need to do something about a C++11 compiler, and I want to make sure that we're not spending effort now that will end up wasted.

Absolutely! Thanks for your feedback on this. Certainly there is still room to discuss and think about all of this. It is a fairly challenging problem and there are many reasonable approaches. I do appreciate the thought and work that everyone has already put into this problem.

jakirkham commented 8 years ago

Just for more food for thought, I added a CentOS 5.11 container to Docker Hub as well. It was automatically built from the same repo using the centos_5 branch, and took ~7min. Still pretty simple. The latest version of devtoolset for CentOS 5 is 2.1. This provides gcc 4.8.2, which does support C++11. Feel free to play with it if you are curious. I built the sample example from above without issues, and it worked great on CentOS 5.11 (without devtoolset installed) and CentOS 6.6 (also without devtoolset installed).

jakirkham commented 8 years ago

So, after a nice discussion with @msarahan, @pelson, and @ocefpaf, one of the important concerns that @msarahan raised was the need to be able to build 32-bit and 64-bit binaries. While there are 32-bit RPMs for devtoolset, they don't seem to want to install by default. Others have had this same issue as well. After a bit of searching, I found a workaround in a gist, which influenced a re-write of the CentOS 5 docker image. At this point, it can compile 32-bit and 64-bit using devtoolset-2.1 with gcc 4.8.2. With this I built Fortran, OpenMP, C++, and C++11 code. I pushed it to Docker Hub and it took only 4 min to build. Please try it out and give me your thoughts. The same should be possible with devtoolset-2.1 on CentOS 6, though I may give this a try for demonstrative purposes.
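
Roughly, inside the container this amounts to something like (hello.cpp being the earlier test program):

# 64-bit and 32-bit builds with the devtoolset compiler; `file` confirms the target
g++ -m64 hello.cpp -o hello64 && file hello64    # ELF 64-bit
g++ -m32 hello.cpp -o hello32 && file hello32    # ELF 32-bit (needs the 32-bit glibc/libstdc++ devel packages)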

For further independent verification, I have shared the test code and run it as part of the latest Docker Hub build.

Please let me know your thoughts.

msarahan commented 8 years ago

@jakirkham thanks for continuing to work on this. I am going to spend some time this weekend carefully reading the Red Hat articles you linked, and also testing your test program, as well as seeing if there are any other more stringent tests for us to understand our boundaries.

If possible, I would like to keep GCC 5.2. 4.8 is attractive because it is in line with the manylinux effort, and because it is readily obtainable from the devtoolset, but I see little else going for it. It does not support as many modern optimization options (https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/4/html/User_Guide/sect-Red_Hat_Developer_Toolset-Features.html), and although it does compile C++11, there must be some reason why it was necessary to have the new ABI. I want to understand what we might be missing by using an older approach to C++11.

My gut says it's a good idea to stick with Red Hat's compiler toolset. They have vastly more experience and a much larger user base. The only flaw here is that we're then at their mercy in terms of being able to update in the future. Decoupling the compiler from the OS version is a very good thing, and I put a lot of weight into that consideration because Continuum has been stuck with very old GCC versions for this reason (and because no one explored the devtoolset route earlier). I think it is worth our while to pick up these skills. Use Red Hat as an example, but let's maintain this capability.

Finally, there's already significant momentum at Continuum using the new image. The Linux build workers for Anaconda.org are using my image (building on top of it). Some groups are starting to use it as well - the ones I know of are DyND and the Anaconda group (myself and @groutr). For me to go back and say "oops, this older way was actually right" - I need to know that the older way is actually right, and I think the reason there needs to be much more than "the custom compiled gcc takes longer to set up." Especially because this all started with me trying to pitch use of the Holy Build Box (https://github.com/phusion/holy-build-box), which is pretty much exactly what you're proposing - and people asked me to pursue a newer compiler. The kinds of faults that I think might force this issue would be if we can or can't find a way to avoid shipping libstdc++ with 5.2 (with whatever partial static linking Red Hat does), or some large flaw in GCC 5.2.

More thoughts soon.

jakirkham commented 8 years ago

So, this is another reason to consider switching to something newer.

msarahan commented 8 years ago

What I've found so far: using the files posted on their build system, I have discovered how they do their partial static linking. It's clever. For example, from their spec:

echo '/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
%{oformat2}
INPUT ( %{?scl:%{_root_prefix}}%{!?scl:%{_prefix}}/lib/libstdc++.so.6 -lstdc++_nonshared )' > 32/libstdc++.so

The nonshared libraries come from stuff they add in their patches. See the gcc5-libstdc++-compat.patch file in that src rpm. This is stuff that is common to all devtoolsets, AFAICT.

We can totally replicate that behavior without using RH's devtoolset. Moreover, we can do so with arbitrary compiler versions, rather than being stuck with whatever devtoolset is current on the lowest possible CentOS we can tolerate. I'm happy to take that on, but I don't want to waste my effort if we're not going to commit to maintaining GCC.

DyND uses C++14, and they have stated their minimum requirement as GCC 4.9 for that reason. If we do not either update to CentOS 6 (which I'd like, but may not be possible due to customer support requirements), or employ the compiled compiler (yo dawg), then DyND will have to use a different toolset from us, and that feels like a major loss here, in terms of defining a standard build platform. They're way ahead of the curve here in terms of employing C++14, but this is also (I hope) a reasonably long-term decision.

jakirkham commented 8 years ago

@mingwandroid recommends not statically linking anything (fortran or otherwise). From his point of view, without a way of knowing what you need to rebuild when you update something, it introduces more work than it saves.

With other things, I think I would agree. With libgfortran (given the problems we have had and still have), I disagree. Our goal is to have a standard compiler in a standard docker image that we always use. This provides clear expectations of what libgfortran we are using to statically link against. So, I don't see this as a problem in that case.

This is an important point: we should be recording information about the build environment into our packages.

Agreed. Though this starts to feel like changes need to go into conda-build to make this work correctly.

If it's a docker image, perhaps the tag or the hash.

Agreed.

We should really find a way to version the docker images, though. It seems we are providing a fixed system and compiler toolset, so it should be versionable -- or are there reasons this would not work?

Again this will likely require changes to conda-build to work correctly.

We also need to record library versions that we link against (and how - statically/dynamically) - it may help understand why something is going wrong at some point in the future.

Yes, we want this information. Again conda-build changes are likely required to make this work correctly.

I'm about 95% sure that the OS-level compatibility grid that @jakirkham posted on the RH site is essentially just stating GLibC compatibility. This is the sticking point with CentOS 5. If we build on any newer platform, GLibC is not backwards compatible.

That may be true. I tried to find something more explicit, but did not see anything like this when I had looked.

However, we are already using CentOS 6 and so we already have this issue.

I have not found devtoolset-4 available for CentOS.

They put it in a different location. ( http://buildlogs.centos.org/centos/6/sclo/x86_64/rh/devtoolset-4/ )

...but I can't get a CentOS 6 system to install it with yum from any source.

Not sure why not. Definitely was able to get this to build and Docker Hub can confirm. Did you look at my Dockerfile?

Using the files posted on their build system, I have discovered how they do their partial static linking. It's clever....The nonshared libraries come from stuff they add in their patches. See the gcc5-libstdc++-compat.patch file in that src rpm. This is stuff that is common to all devtoolsets, AFAICT.

Nifty. Would that work for forcing static linking of the system libgfortran too?

We can totally replicate that behavior without using RH's devtoolset. Moreover, we can do so with arbitrary compiler versions, rather than being stuck with whatever devtoolset is current on the lowest possible CentOS we can tolerate.

Are we sure that is all we are missing?

I'm happy to take that on, but I don't want to waste my effort if we're not going to commit to maintaining GCC.

Completely understand. This is why I want to sort this out here.

DyND uses C++14, and they have stated their minimum requirement as GCC 4.9 for that reason. If we do not either update to CentOS 6 (which I'd like, but may not be possible due to customer support requirements), or employ the compiled compiler (yo dawg), then DyND will have to use a different toolset from us, and that feels like a major loss here, in terms of defining a standard build platform. They're way ahead of the curve here in terms of employing C++14, but this is also (I hope) a reasonably long-term decision.

Unfortunately, there are more issues with adopting CentOS 5 than this one. We basically can never have GPU support ( https://github.com/conda-forge/conda-forge.github.io/issues/63 ) AFAICT. NVIDIA only provides support for CentOS 6 and 7, not 5. So, having to go back to CentOS 5 (as we are using CentOS 6 now) is a huge problem IMHO.

To be completely clear, without some hard evidence (stats) as to why the switch to CentOS 5 makes sense, I am against it. Even with this information, we may still find ourselves in a situation where we have two Docker images because of CentOS 5's limitations. Sorry to be so strong on this point, but I do hope you understand my reasoning.

msarahan commented 8 years ago

@jakirkham I missed your GCC 5 devtoolset image, and saw only your CentOS 5 devtoolset 2 image. Thanks for the example. I wasn't finding it because it only shows up in searches after you do

yum install -y centos-release-scl

I think there are a lot of complicated issues tied up here, and I want to try to untangle them. I see:

NVIDIA's last support of CentOS 5 was CUDA 6.5: https://developer.nvidia.com/cuda-toolkit-65

Also, I'm not going to be happy with any configuration until everything works - whether it's my attempts at CentOS5, or anything more modern. If GPU stuff doesn't work, then the image is not the image we'll go with. Ultimately if Continuum has to bifurcate its build systems to keep supporting older customers, that's what we'll do, but that complicates the package ecosystem and makes neither Continuum nor conda-forge look good, and will cause some headaches no matter what.

Let's approach this from a list of features that we need, then figure out how to meet those needs. For me:

jakirkham commented 8 years ago

Sorry I am responding to these in reverse order and now it will be posted after another comment, but I think it is valuable for the discussion. Will try to address more recent comments after posting this.

@jakirkham thanks for continuing to work on this. I am going to spend some time this weekend carefully reading the Red Hat articles you linked, and also testing your test program, as well as seeing if there are any other more stringent tests for us to understand our boundaries.

Thanks again for looking into this.

Just to reiterate, I have always seen my proposal to be a partial or intermediate solution. That being said, we do need to do something about this problem. The full change proposed is a bit hard to swallow at present. Personally, just getting rid of the Continuum gcc package and maintaining an equivalent amount of support from the system is a good first step. This seems like something we all want. Let's see if we can find a way to get that.

If possible, I would like to keep GCC 5.2.

Just so that we are clear: we already do not have gcc 5.2 support. We have gcc 4.4 support, with the option to install the Continuum gcc package, which provides gcc 4.8. So, this support does not exist now anyway.

That being said, I am willing to skip handling this problem for now, as this is a partial fix after all. Forcing 5.2 would make a switch to CentOS 5 with devtoolset-2.1 harder. Even though I don't like that switch, we need to keep that path open. So, not supporting gcc 5.2 in the first iteration is ok with me.

4.8 is attractive because it is in line with the manylinux effort, and because it is readily obtainable from the devtoolset, but I see little else going for it. It does not support as many modern optimization options (https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/4/html/User_Guide/sect-Red_Hat_Developer_Toolset-Features.html)...

I disagree. It is nice because it has full C++11 support. A sizeable number of packages here rely on either C++0x or outright C++11. Being able to address that alone is huge. Not to mention the Continuum gcc package is 4.8. Keeping consistency with that during the transition is quite nice.

While it would be nice to have full C++14 support, I have yet to see a package proposed here that needs that. True, DyND needs this. However, we already can't support C++14, as the Continuum gcc package won't do this either. As stated before, I think requiring this in a first pass is too strong a constraint.

Some other newer features are always nice, but I think we have already improved the situation drastically if we have 4.8 support in the container. It brings us closer to what we both want even if it is not the full solution and it is a change that is easier to accept.

...although it does compile C++11, there must be some reason why it was necessary to have the new ABI. I want to understand what we might be missing by using an older approach at C++11.

While I would like to understand that more too, I don't find this to be a blocker to keeping a compiler version that is already consistent with the one most packages are built with at present.

My gut says it's a good idea to stick with Red Hat's compiler toolset. They have vastly more experience and a much larger user base.

Agreed.

The only flaw here is that we're then at their mercy in terms of being able to update in the future.

Maybe not so much. Red Hat has been moving to dockerize the whole compiler toolset. I think they understand that there is friction between developers and system administrators that they need to address. While this is probably not the full solution yet, and may not be usable by us at this point, we should keep an eye on it and see what we can learn from them.

Decoupling the compiler from the OS version is a very good thing, and I put a lot of weight into that consideration because Continuum has been stuck with very old GCC versions for this reason (and because no one explored the devtoolset route earlier). I think it is worth our while to pick up these skills. Use Red Hat as an example, but let's maintain this capability.

Not sure on this point. The general philosophy thus far has been to use system compilers, and that still feels like a sound philosophy. As we learned from trying to improve the gcc package, there are all sorts of specialized OS patches just amongst different versions of Linux. Even after all of this work, I still occasionally have had issues with that compiler and have largely gone back to system compilers where possible. Now certainly part of the problem was that we packaged the compiler at all; that much I think we agree on. It is still not clear to me that maintaining this compiler will not be really painful. This leads me to one of my biggest concerns.

It is still not clear to me that there is a good method for community maintenance of this image. While there does seem to be a concerted effort to keep this open source (which I really do appreciate), it still is not practical for the community to build fixes. I would really want to see the following things:

  1. Community members are able to make changes.
  2. A rebuild and push is triggered automatically on merge to update the latest version. (ideally with Docker Hub or some other web-based system)
  3. A transparent versioning scheme.
  4. The source code lives at conda-forge.

I understand these are hard things to get and will require a fair bit of discussion, thought, and effort. This is exactly why I am trying to propose a stopgap that will get us some of what we want now without having to wait for a long time for the ideal fix.

Finally, there's already significant momentum at Continuum using the new image. The Linux build workers for Anaconda.org are using my image (building on top of it). Some groups are starting to use it as well - the ones I know of are DyND and the Anaconda group (myself and @groutr). For me to go back and say "oops, this older way was actually right" - I need to know that the older way is actually right, and I think the reason there needs to be much more than "the custom compiled gcc takes longer to set up."

To be clear, I still don't feel like my proposal is the whole solution. However, the complete solution is still a bit hard to support yet. In other words, I don't think there is an "oops". Though it is perfectly reasonable to put a fair bit of thought into how we proceed. At present, I think a step (even a small and not completely satisfying one) in the right direction is an improvement. We really should embrace that step as it brings us closer (while not all the way) in the direction we need to go.

Especially because this all started with me trying to pitch use of the Holy Build Box (https://github.com/phusion/holy-build-box), which is pretty much exactly what you're proposing - and people asked me to pursue a newer compiler. The kinds of faults that I think might force this issue would be if we can or can't find a way to avoid shipping libstdc++ with 5.2 (with whatever partial static linking Red Hat does), or some large flaw in GCC 5.2.

Hardly. All I am proposing (to restate) is we start using devtoolset-2 in our existing image. The reasons we would want to do this are as follows.

  1. Support C++11 (without new dependencies).
  2. Basically drop the Continuum gcc package.
  3. Basically drop libgcc package.
  4. Provide newer tool chain utilities.
  5. Preserve the existing ABI.
  6. Avoid a complete rebuild for now.

This is a stopgap, of course, I won't deny it. However, it is a simple change that gets us much closer to what we want. It is too hard to make this perfect in one go IMHO. Though if we are willing to take steps in that direction, I believe we can get there.

njsmith commented 8 years ago

although it does compile C++11, there must be some reason why it was necessary to have the new ABI

The two changes in the C++11 spec that I've seen cited as driving the ABI breakage are (1) std::string is no longer allowed to be copy-on-write (so this means some operations are slower and some are faster), (2) std::list::size is now required to be O(1) instead of O(n). So neither affects code correctness, only the complexity of different operations (ref).
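
(If it helps, you can check which default a given g++ uses by dumping the _GLIBCXX_USE_CXX11_ABI macro; no output means a pre-5 GCC that has no dual ABI at all:)

echo '#include <string>' | g++ -x c++ -E -dM - | grep _GLIBCXX_USE_CXX11_ABI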

njsmith commented 8 years ago

@msarahan: Also, if you do decide to experiment with bumping Anaconda's minimum required version to RH6, then I at least will be extremely interested in what you find :-). This transition is going to happen soon-ish one way or another, since RH5's final EOL is <12 months away now (see also a RH engineer commenting on this here: https://github.com/pypa/manylinux/issues/46#issuecomment-206702714 -- "EL5 overall is now well into its wind-down days and we are working with folks still running it to move off").

(I guess in your position I would also explore the viability of convincing the dynd folks to cut it out with this whole "writing software that can't be distributed" thing...)

jakirkham commented 8 years ago

The two changes in the C++11 spec that I've seen cited as driving the ABI breakage are (1) std::string is no longer allowed to be copy-on-write (so this means some operations are slower and some are faster), (2) std::list::size is now required to be O(1) instead of O(n). So neither affects code correctness, only the complexity of different operations (ref).

Thanks for the info, @njsmith. I remembered hearing about some change to std::string, but was fuzzy on the details.

jakirkham commented 8 years ago

I guess in your position I would also explore the viability of convincing the dynd folks to cut it out with this whole "writing software that can't be distributed" thing...

Just to give you a bit more context, @njsmith (not knowing what you may or may not have read), the proposal that @msarahan is putting forth builds gcc 5.2.1 in a docker container with CentOS 5.11. This would allow C++14 to be supported and would allow compatibility with an old glibc; however, it comes at the cost of having a docker image that cannot be easily maintained.

njsmith commented 8 years ago

builds gcc 5.2.1 in a docker container with CentOS 5.11.

Oh, I see -- so you'd still be able to use the system glibc, but give up on using the system libgcc, libstdc++, libgfortran. I guess that works OK if you're willing to ship those and are willing to accept that existing conda environments get broken every time a new GCC release comes out :-/. (I feel like the folks who want to distribute conda manifests as a mechanism for long-term software reproducibility might have opinions on this...)

msarahan commented 8 years ago

Red Hat has been moving to dockerize the whole compiler toolset.

This looks good, but I don't think it's going to solve any fundamental problems. GLibC lives inside any given docker container. That is effectively the exact same argument we're having about choosing a particular CentOS version. Compilers will be tied to their product lifecycle, for better or worse.

Support C++11 (without new dependencies). Basically drop the Continuum gcc package. Basically drop libgcc package. Provide newer tool chain utilities. Preserve the existing ABI. Avoid a complete rebuild for now.

With the exception of dropping libgcc (better called "gcc runtimes": libstdc++, libgcc, libgomp, libquadmath, libgfortran), these were my explicit goals with the docker image that I have created. I had not seen how Red Hat's partial static linking worked, and had assumed that taking Julia's route would be best. I still think it might be. It works pretty well - the default ABI is compatible with GCC 4. It avoids a complete rebuild.

Can you explain why build time of the docker image is a concern to you? It is presently layered, so that the GCC compilation is a totally separate image with a totally separate docker file. It could just as easily be a conda package or an RPM. I don't see how this is any different from using Red Hat's package - just a question of who builds it.

Regarding your wishlist:

Community members are able to make changes.

I think a key idea here is that not everyone needs to use exactly the same build image. The base of it probably all needs to be the same (compiler, binutils, etc), but what you put on top is totally free. Also, community members are welcome to make PRs on the Continuum docker recipes. You'd be talking to me on those PRs.

A rebuild and push is triggered automatically on merge to update the latest version. (ideally with Docker Hub or some other web-based system)

You can do this on top of the GCC image readily. That's why I split them up. I'd like to do it with GCC, but as you know, build time is prohibitive. I could potentially set up Jenkins on an internal build server.

A transparent versioning scheme.

My versioning is presently (CentOS version)-(GCC version)-(docker image build number) for the GCC base image, and the same, but one additional build number for the conda layer on top. How would you improve on that? I certainly need to document it.

The source code lives at conda-forge.

I perceive this as some distrust towards Continuum. I'm sorry, I'm not sure what I can do here. Continuum must maintain its own build tools as core infrastructure. I would ask that you show the same trust that you do for Conda and conda-build. After all, the source code for devtoolset-4 doesn't live at conda-forge, does it?

it comes at the cost of having a docker image that cannot be easily maintained.

Please back this up with evidence - use cases at the very least. If the GCC works well (and it does, for me and for other Continuum folks), then there is no additional maintenance cost for you, relative to Red Hat packages.

msarahan commented 8 years ago

willing to accept that existing conda environments get broken every time a new GCC release comes out :-/

I don't think it's anywhere near that bad. I have found GCC releases to be remarkably backwards compatible. The only recent enormous mess-up I know of in conda was with fortran, and that was because gfortran and libgcc both packaged libgfortran, but were not created from the same source (and thus ended up out of sync). Dumb stuff happened.

The C++11 ABI will be a major break, but this docker image is not intended to make that change (though it can, if anyone wants).

njsmith commented 8 years ago

No, backwards compatibility I'm not worried about, the (potential) brokenness I'm talking about is the issue with GCC runtimes not being forward compatible. As soon as the system libstdc++ and friends are newer than the version in a conda image, bad things start happening, specifically for any users who have private compiled code that ends up linking against the system libstdc++ and then executing against the conda version. And specifically I'm pointing out that this may cause some headaches for people who download the conda image attached to a several-year-old paper and try to reproduce the results :-/

My gut feeling is that for these runtimes, whose version is so tightly tied to the compiler version, the options that make sense are: for everyone to use the system runtimes + system compiler (or an older compiler); for everyone to use whatever compiler but encapsulate the runtime library so that this choice is only visible to the particular project being built (e.g. by statically linking it, or by renaming the runtime libraries like auditwheel is doing now); or to ship a specific runtime + a specific compiler and tell everyone that they have to use the conda gcc rather than their system gcc, so you can upgrade them in sync.

jakirkham commented 8 years ago

Community members are able to make changes.

I think a key idea here is that not everyone needs to use exactly the same build image. The base of it probably all needs to be the same (compiler, binutils, etc), but what you put on top is totally free. Also, community members are welcome to make PRs on the Continuum docker recipes. You'd be talking to me on those PRs.

...

The source code lives at conda-forge.

I perceive this as some distrust towards Continuum. I'm sorry, I'm not sure what I can do here. Continuum must maintain its own build tools as core infrastructure. I would ask that you show the same trust that you do for Conda and conda-build. After all, the source code for devtoolset-4 doesn't live at conda-forge, does it?

Sorry I won't respond to everything now (need to get some sleep), but I do want to address this before it is badly misinterpreted.

Sorry I made this unclear. My intent was never to indicate mistrust. The first point and the last point are connected in the following way (almost redundantly so). Supposing something horrible happens to the build system and you are on vacation, we need a way to apply a simple hotfix. We may need direct access to the repo to effect this change in a timely manner. I completely agree that nearly all of the time it will be direct interaction with you, which is completely fine with me. It is just that rare instance where there is no one there and we need to do something fast that I am concerned about. Based on our conversation about the Heroku buildpack, it was not clear to me that this would be achievable if it lived at the conda org. If it would be achievable there, then this is completely irrelevant.

I will try to respond to your other points later.