conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License
8.14k stars 970 forks source link

[feature] conan package_id is too aggressive #7664

Closed bog-dan-ro closed 1 year ago

bog-dan-ro commented 4 years ago

Apart from MSVC[0], almost all c/c++ libraries are cross compiler type and version. i.e. zlib can be built with gcc 5, then the bins can be consumed by apps which are using any other gcc version or even another compiler (clang, icc, etc.), because, usually, the library ABI doesn't depend on a compiler type or version (you can even compile zlib with gcc 10 and consume it with gcc 5).

Right now, by default, conan uses not even the compiler type but also the compiler version, to create unique ids, which:

A solution will be to create a custom package_id method, like:

    def package_id(self):
        if self.settings.compiler != "Visual Studio":
            del self.info.settings.compiler

but there is one quite big problem: del self.info.settings.compiler removes too much info, for c++ libs the libcxx and cppstd should still be added to the package_id (e.g. a C++ library compiled with gcc and libstdc++11 can be consumed by clang which uses libstdc++11 but won't work with clang with any other libcxx. Same with cppstd, if the library requires c++14, it will work with 14, 17, 20, 23 but it won't work with 11.

IMHO conan needs a little bit more smarter default package_id, which:

Yes there will be exceptions when a C/C++ library will need to enforce a compiler type or even a compiler version (e.g. the library will use non-compatible compiler/linker flags), but the vast majority doesn't need that. In this cases the recipe should explicitly add these things to the package_id. In case the recipe doesn't add them, it won't be a big problem, as the user can easily rebuild it locally in case the precompiled bins don't work on that system.

[0] Disclaimer: I know very little about MSVC and Windows in general. What I've read is that MSVC it's a quite strange as the libraries depend on:

memsharded commented 4 years ago

Hi @bog-dan-ro

Thanks for your detailed discussion.

We have decided to go with this model as default for several reasons. A important one is not only about the binary compatibility, but also the efficiency, performance and bug fixes that a new compiler version comes with. New compiler versions come with bug fixes, new optimizations...

Lets say a user depends on zlib in their production systems, building everything with gcc 4.9, including transitive dependencies like zlib. Now they are upgrading and building their products with gcc 7. It is expected that they will want to build zlib too with the new compiler version. Or do you think that most users will like to stick with a binary compiled with gcc 4.9? In our experience, when users upgrade compiler version, they also want to build their dependencies with the new compiler version. It makes much easier maintenance and debugging of their systems in production.

There is also the cognitive overhead, this rule would be more difficult to understand by users. Right now it is basically: "every change in the settings produce a different binary". This is easy to track and change. Yes, we could document, "except if it is a C library, in compilers rather than VS", but it is unlikely most users would read this line, and this would lead to violating the principle of least surprise.

generates way too many unnecessary combinations, it even uses minor gcc versions (8.1, ,8.2, 8.3, 8.4) which is insane. i.e. zlib has way too many packages for a single linux arch!

To minimize this, Conan is defining the 8 version, without minor. So it is possible to use that for different minors and having them binary compatible, this is the current default approach. Those minors are only provided in the default settings for users that also want to fully control de minor too (this has been requested by users, they wanted to make sure that they build everything with the same minor)

Also, whenever possible, Conan tries to default to the "safest" approach. There is no formal definition of "this is a C package", so it is impossible at the moment to know that this behavior should be enabled. Yes, it could try to detect file extensions, or do some magic with the binaries, but again, in this experience this is fragile and has always produced lots of issues, bugs and maintenance, and it is not worth. Let's say that Conan introduces a feature that allows users to specify "this is a C package". Next report that will be received will be: "I have a package that contains both a C library and a C++ library", what should I specify? And then the feature will get more and more complicated.

What conan tries to do is to accomodate all possible user modes. If you want to remove just the compiler version, it can also be done:

def package_id(self):
      if self.info.settings.compiler != "Visual Studio":
            del self.info.settings.compiler.version

So overall, I think having one binary per compiler version as default is a good approach, which has worked reasonably well so far. It might not be optimal for some scenarios, but provide a very good and simple to understand default for the majority of users, with the possibility of users that want another behavior to customize it with great level of detail. So even if we are proposing a new default package_id computation in Conan 2.0, that will possibly include things like static/shared libs, or changing the dependencies default_package_id_mode, I don't think we should change this default. In any case, I will ask for feedback about it. Thanks again for your suggestion!

bog-dan-ro commented 4 years ago

[...] New compiler versions come with bug fixes, new optimizations...

True, but new version might come with new bugs as well ;-)

Lets say a user depends on zlib in their production systems, building everything with gcc 4.9, including transitive dependencies like zlib. Now they are upgrading and building their products with gcc 7. It is expected that they will want to build zlib too with the new compiler version. Or do you think that most users will like to stick with a binary compiled with gcc 4.9?

I'll keep using the existing zlib, unless gcc 7 will magically make it run MUCH faster, which, usually, never happens. This is what all linux distros are doing. Debian, ubuntu, redhat, etc. have a couple of compiler available (usually two gcc and a few more clang), but when they add a new compiler, they don't recompile all packages with the new compiler (debian has ~60K, recompiling everything takes days or probably weeks). Usually the packages are recompiled with a newer compiler when there is a new version or it's absolutely necessary.

In our experience, when users upgrade compiler version, they also want to build their dependencies with the new compiler version. It makes much easier maintenance and debugging of their systems in production.

As I said in my first comment, it's so easy for an user to rebuid everything using his own profile `$ conan install my_package/1.0 --build`` it's enough to rebuild it (and it's dependencies). Some libraries are huge (e.g. Qt). It takes hours to build Qt on a mac for mac os and for ios! I don't care if the newer compiler gives me an extra 0.05% speed improvements, what I care is to use my libs as quick as possible.

There is also the cognitive overhead, this rule would be more difficult to understand by users. Right now it is basically: "every change in the settings produce a different binary". This is easy to track and change. Yes, we could document, "except if it is a C library, in compilers rather than VS", but it is unlikely most users would read this line, and this would lead to violating the principle of least surprise.

TBH I was very surprised to lean that conan uses compiler type and version to compute the package_id on linux by default :). We have to weight the benefits of having a small number of packages (will be better for your CI and also for your conan-center server), also will be better for users which are using slightly different profiles than the ones that you used to build those packages (again, I don't want to wait a few hours to build qt on my mac just because I updated my compiler :) ).

generates way too many unnecessary combinations, it even uses minor gcc versions (8.1, ,8.2, 8.3, 8.4) which is insane. i.e. zlib has way too many packages for a single linux arch!

To minimize this, Conan is defining the 8 version, without minor. So it is possible to use that for different minors and having them binary compatible, this is the current default approach. Those minors are only provided in the default settings for users that also want to fully control de minor too (this has been requested by users, they wanted to make sure that they build everything with the same minor)

Right, but most of the users out there are using gcc 8.X, which means that your gcc 8 precompile packages are useless for all these users.

Also, whenever possible, Conan tries to default to the "safest" approach. There is no formal definition of "this is a C package", so it is impossible at the moment to know that this behavior should be enabled. Yes, it could try to detect file extensions, or do some magic with the binaries, but again, in this experience this is fragile and has always produced lots of issues, bugs and maintenance, and it is not worth. Let's say that Conan introduces a feature that allows users to specify "this is a C package". Next report that will be received will be: "I have a package that contains both a C library and a C++ library", what should I specify? And then the feature will get more and more complicated.

Good point! I think the easiest way it to add a new attribute which will clarify this situation. Now if a library uses C and C+++ code, then C++ rules are enough as they inherit the C ones (the cstd and the libc versions applies to any C++ library as well). But to be safer you can make that attribute an array which can accept several rules for package_id and conan can easily merge them.

What conan tries to do is to accomodate all possible user modes. If you want to remove just the compiler version, it can also be done:

def package_id(self):
      if self.info.settings.compiler != "Visual Studio":
            del self.info.settings.compiler.version

As I pointed above, nuking just the version is not enough, as I want to be cross compiler type as well. I want to switch between gcc, clang and icc without recompiling the everything again, which with conan 1.xx seems impossible to do it.

So overall, I think having one binary per compiler version as default is a good approach, which has worked reasonably well so far. It might not be optimal for some scenarios, but provide a very good and simple to understand default for the majority of users, with the possibility of users that want another behavior to customize it with great level of detail. So even if we are proposing a new default package_id computation in Conan 2.0, that will possibly include things like static/shared libs, or changing the dependencies default_package_id_mode, I don't think we should change this default. In any case, I will ask for feedback about it. Thanks again for your suggestion!

AFAIK things like static/shared, release/debug, fpic, fpie, etc. are part of the options section. Also here some don't make sense to enforce a new package_id (i.e. all compilers but MSVC can use release build libs to build a debug application, of course also here the user can easily rebuild everything for debug).

The main reason why I think the default package_id behavior should be changed it that an user with gcc 8.4 can't use any of your prebuild packages. On the other hand if an user want to rebuild everything (with the new default package_id), he can do it so easy (you can even add a new property to profile which forces all remote packages to be built with this profile no matter what). IMHO the possibility to use precompiled packages is the main reason why people are choosing conan over vcpkg.

This proposal is for conan 2.0 as for conan 1.xx might break the things ;-).

markovandooren commented 4 years ago

For MSVC, Conan is not even strict enough. According to the Microsoft documentation, https://docs.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=vs-2019, the full toolchain version and redistributables must match exactly for binary compatibility.

I experienced this when my local full toolchain version was just a bit behind the version on a build server. Both were Visual Studio 2019. My linker gave an error when when linking in a .lib from a package from the build server because it cannot know if a future Visual Studio version is compatible. To solve the problem I introduced an automatically calculated ToolSetVersion configuration parameter that looks at the local tool chain version, and propagates that to all packages such that any consumed packages are guaranteed to be compatible with the package that is being built.

For local development, our conanfile adds compatible toolset versions from a config file such that developer can be ahead of the build server with Visual Studio upgrades, counting on a linker error to notify any problems. The build server itself, however, always demands an exact match to ensure that the production software cannot run into any compatibility problems.

The full toolset chain version is read from the file VC\Auxiliary\Build\Microsoft.VCToolsVersion.default.props file in the Visual Studio installation.

bog-dan-ro commented 4 years ago

@markovandooren as I said from the begging apart from MSVC. For MSVC, conan will continue to be strict or even stricter if necessary. But I don't think it's fair to punish the rest of the users just because Microsoft can't make their compilers/libraries compatible between versions.

markovandooren commented 4 years ago

I just remarked that it should become stricter for MSVC. Nowhere did I mention anything about how GCC or any other compiler/toolchain should be handled.

dheater commented 3 years ago

I want to switch between gcc, clang and icc without recompiling the everything again, which with conan 1.xx seems impossible to do it.

I would advise against going so far as switching compilers since there are lots of compiler specific #pragma and #ifdef __GCC__ in existing codebases.

bog-dan-ro commented 3 years ago

I want to switch between gcc, clang and icc without recompiling the everything again, which with conan 1.xx seems impossible to do it.

I would advise against going so far as switching compilers since there are lots of compiler specific #pragma and #ifdef __GCC__ in existing codebases.

As long as none of these things are not in headers, we are just fine. clang can use with no problems any gcc on any unix system.

memsharded commented 3 years ago

What is clear from this thread is that a perfect binary model that satisfies all users for all compilers and all needs is completely impossible. Conan offers some good starting point that in practice works relatively well. And it provides the tools to define the binary compatibility model that you want to use, with a high level of accuracy and some useful tools (like compatible_packages) to define things.

As long as none of these things are not in headers, we are just fine.

These are the kind of things that will happen, we could define one model, but there will always be custom cases and different users will want to apply different rules.

Conan is pretty simple in this regard, the package_id logic is actually almost agnostic about compilers, versions, architectures, etc. It does not do a lot of magic, it takes the current settings + options + dependencies values and hash them. It will just pick what is there (following the settings.yml) definition and use that. And we honestly don't want to to down that road of trying to be very smart regarding binary compatibility, because that is a ton of work that will never satisfy all users anyway. If something can be done is to add tools that can be used by users to specify their desired binary compatibility in a simpler way, like we already did with the compatible_packages feature.

FnGyula commented 2 years ago

with a high level of accuracy and some useful tools (like compatible_packages) to define things.

I was trying to use these facilities for the same problem but I've found myself in a chicken and egg problem. To "qualify" a new compiler for a package, I need to create a new package revision. The new package revision will not magically pick up packages from an earlier revision, which means that I needed to build the binaries... but then, if I'm already doing the building, what's I can just use the new compiler, right?

memsharded commented 1 year ago

Conan 2.0 proposes a new model for package_id: