
Pivoting on target-cpu/arch #847

Open lilith opened 7 years ago

lilith commented 7 years ago

We have a couple of common CPU generations above the baseline x86_64 instruction set, namely sandybridge and haswell, with AVX and AVX2/BMI/BMI2 respectively.

LLVM-backed languages and GCC 4.9+ all support x86-64, sandybridge, haswell, and native as values for the -march/--target-cpu parameters. GCC 4.8 uses the alternate identifiers corei7-avx and core-avx2 for those platforms.

These map nicely to MSVC /arch:AVX and /arch:AVX2, which is as granular as MSVC goes.

For now I'm using an extra field in .conan/settings.yml: target_cpu: [x86, x86-64, nehalem, sandybridge, haswell, native], but I need to move this down into the packages I consume as well, if I want to pivot on sandybridge/haswell support.
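
To make the mapping concrete, here is a minimal sketch of glue code (a hypothetical helper, not Conan API) that translates a target_cpu value into the right flag per compiler, using the names above:

MARCH_FLAGS = {
    # target_cpu: (gcc >= 4.9 / clang, gcc 4.8, msvc)
    "x86-64":      ("-march=x86-64",      "-march=x86-64",     ""),
    "sandybridge": ("-march=sandybridge", "-march=corei7-avx", "/arch:AVX"),
    "haswell":     ("-march=haswell",     "-march=core-avx2",  "/arch:AVX2"),
}

def march_flag(target_cpu, compiler, version):
    gcc_new, gcc_old, msvc = MARCH_FLAGS[target_cpu]
    if compiler == "Visual Studio":
        return msvc
    # GCC before 4.9 only knows the older identifiers
    if compiler == "gcc" and tuple(int(p) for p in version.split(".")[:2]) < (4, 9):
        return gcc_old
    return gcc_new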

Has this come up before? Any convention to adopt?

lasote commented 7 years ago

I have no previous experience managing those alternative architectures, but why aren't they just additional arch setting values?

I'm not sure about the native setting either. The GCC page says: "This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine." It sounds like it should not be a setting value, because it's variable and not deterministic.

Any community feedback would be great.

lilith commented 7 years ago

If there is no conditional logic based on arch, and we only support the extra instructions in 64-bit mode, then 'arch' would be fine. But do we want to exclude 32-bit (pointer-size) code optimized for these platforms?

lilith commented 7 years ago

These instruction sets are actually available for both x86 and x86_64 architectures.

I think native would need special handling - it should be source-only.

I'm bumping up against this pretty hard with libjpeg-turbo. It's 2-3x faster when compiled for haswell vs baseline x86_64, but recompiling from source pushes Travis over the edge and hits the 45-minute timeout.

lasote commented 6 years ago

I would like to push this for 0.29; another user has requested the same thing and it's time to establish a convention to follow. Initially only the base settings; later we can think about the build helpers to inject the needed flags and some detection of the CPU microarchitecture (https://pypi.python.org/pypi/cpuid @fpelliccioni) to warn if a bad setting is detected. So @memsharded, let's work on it. @nathanaeljones I think we could:

arch:
  x86:
    microarch: [None, "nehalem", "bonnell", "sandy_bridge", "ivy_bridge", "silvermont", "haswell", "broadwell", "skylake", "goldmont", "kaby_lake", "coffee_lake"]
  x86_64:
    microarch: [None, "nehalem", "bonnell", "sandy_bridge", "ivy_bridge", "silvermont", "haswell", "broadwell", "skylake", "goldmont", "kaby_lake", "coffee_lake"]
  ppc64le:
  ppc64:
  armv6:
  armv7:
  armv7hf:
  armv8:
  sparc:
  sparcv9:
  mips:
  mips64:
  avr:

I don't like to repeat the microarchitectures, but I don't see a better approach. The "None" allows the user to not specify the subsetting.
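
If this subsetting were adopted, a recipe could pivot on it roughly like this (a sketch; arch.microarch is the proposal above, not an existing Conan setting):

from conans import ConanFile

class MyLibConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch"

    def build(self):
        flags = []
        # "arch.microarch" is the proposed subsetting, so read it defensively
        microarch = self.settings.get_safe("arch.microarch")
        if microarch == "haswell":
            flags.append("-march=haswell")
        elif microarch == "sandy_bridge":
            flags.append("-march=sandybridge")
        # ... pass flags on to the build helper of choice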

lilith commented 6 years ago

I'd also suggest that we may want to support native, i.e., whichever features are supported on the build machine. This value would need to disable build caching, though: all packages would need to be built from source.

Also, we may want to consider matching the gcc/llvm names as closely as possible. For GCC < 5 we'll have to map a few anyway, though.

lilith commented 6 years ago

Generations are also not very specific, and may not work on mobile or low-end editions.

I started with x86_64, nehalem, sandybridge, haswell, native, but skylake should probably be added for TSX support.

I'm not sure there's much value in including tick releases unless they add new instruction sets.

lilith commented 6 years ago

For LLVM:

llc -march=x86 -mattr=help
llc -march=x86-64 -mattr=help

I forget how to list the values for GCC.

tru commented 6 years ago

For arm we are shipping a couple of different architectures right now.

I think it makes sense for the armv7 platform to contain: float=["hard", "soft"], thumb=[True, False], neon=[True, False]

On some platforms we also need to set the specific FPU like this: -mfpu=vfpv3-d16. Not sure if that is something that should be abstracted in conan though.
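
As a sketch, those subsettings could translate to GCC flags along these lines (a hypothetical helper; the flags themselves are real GCC options):

def armv7_flags(float_abi, thumb, neon, fpu=None):
    # float_abi: "hard" or "soft", matching the proposed float subsetting
    flags = ["-mfloat-abi=%s" % float_abi]
    if thumb:
        flags.append("-mthumb")
    if neon:
        flags.append("-mfpu=neon")
    elif fpu:
        flags.append("-mfpu=%s" % fpu)  # e.g. "vfpv3-d16"
    return flags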

fpelliccioni commented 6 years ago

I was thinking about this; here are some of my conclusions:

A. Relying on micro-architecture is a step forward, but what if I do the following?

  1. g++ xxx.cpp -O3 -march=sandybridge ...
  2. g++ xxx.cpp -O3 -march=haswell ...
  3. g++ xxx.cpp -O3 -march=skylake ...

    It is likely that the resulting binaries of 1, 2, and 3 are exactly the same; in that case it makes no sense to differentiate them, since they are compatible or identical binaries/packages.

B. Some Intel micro-architectures have the same extensions as others. For example, according to the Intel tick-tock model, Sandy Bridge and Ivy Bridge are in theory equivalent with respect to instruction sets and extensions. Therefore, it is not worth differentiating them.

I think, in both cases, what really matters is which instruction sets were used. For this, I am working on a tool that examines an executable or library (.a, .so, .dll, etc.) and reports which instruction sets were used.

For example:

get_extensions("a.out",...) == ['MODE64', 'SSE', 'AVX']

In this way, the micro-architecture no longer matters; what really matters is which instruction sets were used.

I think Conan packages can have a setting to determine which extensions the binaries use. For example:

class HelloConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch", "extensions" [0]

extensions would have to be assigned after the binary is compiled, I imagine by creating a new method (member function) in the ConanFile class, for example:

def set_extensions(self):
    self.extensions = get_extensions(... list of binary files ...)

On the client side, when Conan looks for a package, it can identify which instructions are available for the processor (using the cpuid python package, for example) and in this way find the package that best fits.

I have a demo of the tool that analyzes the executables; if the idea is of interest/utility to the community, I could invest some time in it. The demo for now only works for x86 and the ELF format, but it can be extended to other architectures and formats (PE, Mach-O).
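
A minimal sketch of the client-side compatibility check described above, assuming Linux and reading /proc/cpuinfo directly rather than using the cpuid package, and assuming the tool's extension names line up with the kernel's flag names:

def cpu_flags():
    # Linux-specific: read the feature flags the kernel reports for the CPU
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

def binary_is_compatible(required_extensions):
    # e.g. required_extensions == {"sse", "avx"}, as get_extensions() would report
    return {e.lower() for e in required_extensions} <= cpu_flags()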

lilith commented 6 years ago

cat /proc/cpuinfo shows the following flags for me:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

My questions would be

(a) how do we map these to, say, MSVC that only supports AVX and AVX2, and infers other instruction support based on those?

(b) is set math cost-prohibitive for conan?

(c) does the implementation cost of full instruction support outweigh the benefit? I would see nehalem/sandybridge/haswell/skylake/native as quite a bit simpler to implement and test. Permutations make things harder.

(d) don't we still have to specify either an instruction group or a processor generation when publishing pre-compiled binaries for others' use?

lilith commented 6 years ago

@fpelliccioni The tool you describe is exactly what I've been looking for to validate my binaries. With so many compilers involved in a build it can be very difficult to ensure that an unsupported instruction didn't sneak in somewhere.

lasote commented 6 years ago

Some comments: @nathanaeljones I'm not sure about native. I understand your point, but a setting should determine the binary you are getting. Any ideas @memsharded? I think native should be avoided in favor of detecting (or the user declaring) a default microarchitecture in the default profile.

@fpelliccioni About:

It is likely that the resulting binaries of 1, 2, and 3 are exactly the same; in that case it makes no sense to differentiate them, since they are compatible or identical binaries/packages.

Yes, but if you know that your library builds to exactly the same binary for those different microarchitectures, you can control it in the package_id() method to get only one binary. But conceptually, the code could be different if you build it for different microarchitectures, right?
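
For example, assuming the proposed arch.microarch subsetting, a package_id() along these lines could collapse equivalent microarchitectures into a single binary:

def package_id(self):
    # Hypothetical: Sandy Bridge and Ivy Bridge expose the same instruction
    # sets, so reuse a single binary for both.
    if self.settings.get_safe("arch.microarch") == "ivy_bridge":
        self.info.settings.arch.microarch = "sandy_bridge"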

About the instruction sets: I understand that's what really matters, but since you could support so many of them it becomes unmaintainable and overwhelming for the user. The number of combinations is practically infinite. So it looks like it couldn't be a setting.

Detecting the setting after building the library is a kind of chicken-and-egg problem. Right now it works like this: you declare the settings => Conan builds the library and calculates a package ID. You are proposing the opposite: the settings are determined by the built library. That affects the core model of Conan, and it's not possible to do, in my opinion.

tru commented 6 years ago

Detection is not possible when cross-compiling either.

lasote commented 6 years ago

Hi all,

Given the previous experience modeling the language standard (still WIP), I have some observations. The main question is: should we model this with options?

class MyLib(ConanFile):
   ...    

   def options(self, config):
       config.add_microarchitecture() # We could add a preset list of march like the described here: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/x86-Options.html#x86-Options

lilith commented 6 years ago

I vote to strike 'native' from this feature request. It's orthogonal.

I would suggest that whichever method is selected be inherited down the dependency tree, so that sub-dependencies are built with the same ISA by default.

DavidZemon commented 6 years ago

Been thinking about this topic a lot over the last month as I get deeper into Conan, though this is the first time I've actually read through this issue. And for reference, I'm coming from the "I have to cross-compile for a bunch of different ARM CPUs" world.

I haven't gotten around to implementing this yet, but the best idea I've come up with is to put the burden on the user: create a profile that sets the CFLAGS environment variable with appropriate compiler flags. For instance, I might have a sitara profile that looks like this:

[build_requires]
[settings]
os=Linux
os_build=Linux
arch=armv7hf
arch_build=x86_64
compiler=gcc
compiler.version=4.9
compiler.libcxx=libstdc++
build_type=Release
[options]
[env]
CFLAGS=-mfloat-abi=hard

The tough part is that this assumes your conan recipes and/or build system pay attention to the CFLAGS environment variable and use it appropriately. That isn't always the case, but it frequently is.
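
For instance, a recipe whose build() drives make could forward the profile's CFLAGS explicitly (a sketch; several build helpers already pick up CFLAGS from the environment on their own):

import os
from conans import ConanFile

class SitaraDepConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch"

    def build(self):
        # Forward whatever CFLAGS the active profile's [env] section exported
        cflags = os.environ.get("CFLAGS", "")
        self.run('make CFLAGS="%s"' % cflags)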

lasote commented 6 years ago

Thanks for your feedback. You are right about the profile and the flags, but the goal is to standardize a way to generate different binary packages for the same recipe based on different micro-architectures. The problem with only the flags is that you will be overwriting the same package in the cache if you run it twice. It has to be modeled, as a setting or as an option. We have a similar problem with the C++ standard version, which we are still evaluating.

DavidZemon commented 6 years ago

The problem with only the flags is that you will be overwriting the same package in the cache if you run it twice.

Are you saying that when I set environment variables in a profile, those environment variables do not affect the hash of any packages built with that profile?

lasote commented 6 years ago

Are you saying that when I set environment variables in a profile, those environment variables do not affect the hash of any packages built with that profile?

Yes, exactly that. The only things that affect a package ID are the settings, the options and the requirements of the package.

DavidZemon commented 6 years ago

I've been reading, re-reading, and triple-re-reading this thread. I tried looking at the referenced PR (#2042) but only understood a little of it. As always, so much more complex than I initially imagined it would be.

I think we all agree that the theoretical best solution is to make Conan aware of specific instruction sets. However, @lasote might be right when he said

... it becomes unmaintainable and overwhelming for the user. The number of combinations is practically infinite. So it looks like it couldn't be a setting.

But the other solutions are not ideal either. I would push for removing the 1.1 milestone target and leaving it blank. With no promised deadline, we can all work together to come up with some kind of solution (even if it's a major breaking API change that doesn't come out until Conan v2 or v3) that minimizes the burden on the 90% of users who don't care, while allowing the 10% of us who care very much to specify exactly which instruction sets should be enabled.

The C/C++ world still does not have a top-notch package manager solution, and I don't think any solution which fails to properly tackle this specific problem will ever gain the kind of popularity that Maven, NPM, and Pip have gathered. So, before Conan tries to take over the world, I think it should solve this problem in the absolute best possible way.

Solving this problem won't be easy. I think it will require that Conan is capable of mapping instruction sets to compiler flags. This will take a lot of research to provide by default a useful portion of this mapping for as many different compilers and instruction sets as possible. The mapping will also need to be user-extendable, just like the rest of settings.yml (though the mapping may reside in a different file).

I think the burden on the end-user could be eased by providing optional "families", such as i386, which would encompass a large group of instruction sets. The family i786 would then reference i686 and append SSE2 and SSE3.
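
A sketch of that family expansion, with hypothetical family names and contents:

# Each entry names a base family plus the instruction sets it adds;
# expand() flattens the inheritance chain into one set.
FAMILIES = {
    "i386": (None, {"x87"}),
    "i686": ("i386", {"cmov", "mmx", "sse"}),
    "i786": ("i686", {"sse2", "sse3"}),
}

def expand(family):
    base, added = FAMILIES[family]
    return added if base is None else expand(base) | added

# expand("i786") == {"x87", "cmov", "mmx", "sse", "sse2", "sse3"}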

It may be worth adding a boolean to the profile that says "I will accept binaries that are compatible with my CPU but do not fully utilize all of its features," or "I want to recompile any package that does not fully utilize my CPU." This would also require having a list of conflicting instruction sets or families (to prevent trying to cross-link 32- and 64-bit binaries).

anton-danielsson commented 3 years ago

Having something like fpu or microarch in settings.yml would be really cool. The difference between code compiled with, for example, NEON vs VFP or SSE vs AVX-512 can be quite large.

jwillikers commented 2 years ago

Yeah, I think supporting the various CPU architectures out there is absolutely necessary for Conan to adequately cover binary compatibility. It seems like there should be sub-settings to cover the intricacies of specific architectures, like the FPU for armv7hf, and whether or not to use Thumb, which in that case ends up being a distinct ISA.

vadixidav commented 2 years ago

I also needed to add things to my settings.yml for embedded ARM:

    arm-none-eabi-gcc:
        version: ["8.3"]
        cpu: ["cortex-m7"]
        fpu: ["fpv5-sp-d16"]
        float-abi: ["soft", "softfp", "hard"]

I don't really know if compiler is the right place to put this, but it didn't look like it would have been easy to stick it into arch either, as none of those have sub-options. Some of this may also be redundant with the arch, as the arch is armv7hf. However, even though the arch is armv7hf, it is still possible to build libraries using the soft ABI and even to consume them with softfp, so I don't know what the right answer is here.

I would like it if there were a more comprehensive solution for handling floating point here on ARM, as I need to be diligent to ensure that I set the settings in accordance with the compiler flags. As you can see, I have been careful to name the settings exactly after the compiler flags to avoid problems and simplify my profile file.

In general, it seems like every capability or option on a CPU should probably have its own Conan setting if this is to be managed.

jwillikers commented 2 years ago

@vadixidav That looks a lot like what I've been thinking, though I've been leaning towards putting it under arch instead of the compiler, even though it is a bit awkward that there aren't any sub-options there and it looks like the architectures are just a list. It seems like the plan might be to convert the architecture settings in arch to the necessary compiler flags in the build-system generators. It's interesting that you created a GNU Arm Embedded compiler, too. I've thought about doing so but for simplicity I've just used the gcc compiler setting and set up the compiler name in the CMakeToolchain generator.

A rough example taken from yours might look as follows.

arch:
  thumbv7em:
    fpu: [None, "fpv5-sp-d16"]
    float-abi: ["soft", "softfp", "hard"]

This is going to get tricky pretty quickly when it comes to validating that architecture options don't conflict, like selecting fpu: None and float-abi: "hard". I like how Rust lays out its Platform Support page and the consistent naming there, though I don't think that naming captures all the available options one might want to configure for a specific architecture.
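
One way to catch the conflicting combination above, assuming such arch subsettings existed, would be a validate() check in the recipe; a minimal sketch:

from conans import ConanFile
from conans.errors import ConanInvalidConfiguration

class MyLibConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch"

    def validate(self):
        # "arch.fpu" and "arch.float-abi" are the hypothetical subsettings above
        fpu = self.settings.get_safe("arch.fpu")
        float_abi = self.settings.get_safe("arch.float-abi")
        if float_abi in ("softfp", "hard") and fpu is None:
            raise ConanInvalidConfiguration(
                "float-abi=%s requires an FPU to be set" % float_abi)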