Easier cross-compiling for level 4?

stuarteberg commented 3 months ago

The conda-forge docs for the microarch-optimized builds have an example that uses microarch_level: 4. But the README for this feedstock contains the following caveat:

> When building packages on CI, level=4 will not be guaranteed, so you can only use level<=3 to build.

Indeed, when I tried to use level 4, I saw failures (in my case, it was on osx).

Nonetheless, I'd like to produce optimized builds for machines that support AVX-512 (level 4). This was possible by explicitly adding the necessary build flag in build.sh and then explicitly listing the appropriate run dependency:

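```yaml
# conda_build_config.yaml
microarch_level:
  - 1
  - 3  # [unix and x86_64]
  - 4  # [unix and x86_64]
```

```bash
# build.sh
if [[ "${microarch_level}" == "4" ]]; then
    # Target x86-64-v4 (AVX-512) explicitly instead of relying on x86_64-microarch-level 4
    CXXFLAGS="${CXXFLAGS} -march=x86-64-v4"
fi
```

```yaml
# meta.yaml
requirements:
  run:
    # Restrict the runtime to machines that support level 4
    - _x86_64-microarch-level 4  # [unix and x86_64 and microarch_level == 4]
```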

Using that workaround, we were able to produce optimized binaries (including -march=x86-64-v4) in the graph-tool feedstock (https://github.com/conda-forge/graph-tool-feedstock/pull/140).

Would it be possible to make that easier for feedstock maintainers, perhaps by having the microarch-level-feedstock produce yet another output?

Right now this feedstock produces two packages for each arch, for example:

1. x86_64-microarch-level
   a. Introduces the -march=x86-64-v${level} flag in CFLAGS etc.
   b. Introduces a run_export on _x86_64-microarch-level.
2. _x86_64-microarch-level
   a. Introduces a run dependency on the appropriate __archspec virtual package.

...but it seems like cross-compilation would be easier if we were to split the functionality of 1.a and 1.b into two separate packages, so we could easily obtain the correct CFLAGS without pulling in the __archspec dependency. Perhaps we could offer two variants of the package: one that provides both 1.a and 1.b, and another that provides only 1.a. (I'm just spitballing here...)
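For reference, the two current outputs might look roughly like this as a multi-output conda-build recipe. This is a hypothetical sketch, not the feedstock's actual recipe: the `level` variable is an assumed name, the run_export is assumed to be strong (strong run_exports from a build dependency reach both host and run, which is why `_x86_64-microarch-level` shows up in host), and the exact `__archspec` spec is deliberately left out.

```yaml
# Hypothetical sketch of the current two outputs (not copied from the feedstock).
outputs:
  - name: x86_64-microarch-level
    # 1.a: activation scripts append -march=x86-64-v{{ level }} to CFLAGS, CXXFLAGS, etc.
    build:
      run_exports:
        strong:
          # 1.b: strong, so downstream builds get this in host *and* run
          - _x86_64-microarch-level >={{ level }}
  - name: _x86_64-microarch-level
    requirements:
      run:
        # 2.a: requires the matching __archspec virtual package (exact spec omitted)
        - __archspec
```

Splitting 1.a from 1.b, as proposed above, amounts to moving the `run_exports` entry out of the package that carries the activation scripts.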

Alternatively, we could just drop the run_exports from the {{ family }}-microarch-level recipe. In that case, feedstock maintainers could build level-4 packages without needing to add the compiler flag explicitly, but they would be forced to explicitly list the appropriate runtime dependency in their recipe, which could be annoying:

```yaml
requirements:
  build:
    - x86_64-microarch-level {{ microarch_level }}  # [unix and x86_64]
    - ppc64le-microarch-level {{ microarch_level }}  # [unix and ppc64le]
  run:
    - _x86_64-microarch-level >={{ microarch_level }}  # [unix and x86_64]
    - _ppc64le-microarch-level >={{ microarch_level }}  # [unix and ppc64le]
```

traversaro commented 3 months ago

This is probably related to the discussion in https://github.com/conda-forge/conda-forge.github.io/issues/1261 .

isuruf commented 3 months ago

This is a deficiency of run_exports: strong run_exports from build go to both host and run, and we have no way of specifying build -> run only. I suggest using ignore_run_exports_from and adding them manually in run.
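A minimal sketch of that suggestion in a downstream meta.yaml, reusing the `microarch_level` variant and selectors from the opening comment; the `compiler('cxx')` line is only an illustrative assumption. `ignore_run_exports_from` drops the package's run_exports from both host and run, so the runtime requirement has to be re-added by hand under run. The package itself is still installed in build, so its activation scripts still set the compiler flags.

```yaml
# meta.yaml (sketch): keep the flags from x86_64-microarch-level,
# but stop its run_exports from pulling _x86_64-microarch-level into host.
build:
  ignore_run_exports_from:
    - x86_64-microarch-level  # [unix and x86_64]

requirements:
  build:
    - {{ compiler('cxx') }}
    - x86_64-microarch-level {{ microarch_level }}  # [unix and x86_64]
  run:
    # Re-added manually, so it only constrains the runtime environment
    - _x86_64-microarch-level >={{ microarch_level }}  # [unix and x86_64]
```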

baszalmstra commented 1 month ago

Is this now solved by #6?

baszalmstra commented 1 month ago

> Is this now solved by #6?

To answer my own question: no.

Adding x86_64-microarch-level 4 to the build section still adds a _x86_64-microarch-level >=4 package to the host section.

So continuing with the idea from @stuarteberg, how about we do the following:

1. We introduce a new package in this feedstock whose only purpose is to have a strong run_export on _x86_64-microarch-level. Let's call this package _x86_64-microarch-level-run-export. (I'd love to hear a better name.)
2. The run_export of x86_64-microarch-level is replaced with a weak run_export on this new package, _x86_64-microarch-level-run-export. This ensures that the package is added only to the host section of the build.

@isuruf @traversaro @stuarteberg Thoughts?

On another note, the x86_64-microarch-level package is currently also created for every microarchitecture, but it seems to me that it is completely unrelated. Should we maybe just remove that?

baszalmstra commented 1 month ago

I just went ahead and implemented my note from above: https://github.com/conda-forge/microarch-level-feedstock/pull/9

Let me know what you think!

isuruf commented 1 month ago

> We introduce a new package in this feedstock whose only purpose is to have a strong run-export on _x86_64-microarch-level. Let's call this package _x86_64-microarch-level-run-export. (I'd love to hear a better name).

And what's the difference between that and the current x86_64-microarch-level?

baszalmstra commented 1 month ago

I'm trying to find a way for a package in the build section to add a package only to the run section (not the host section). We can do that with three different packages:

1. x86_64-microarch-level (should be placed in the build section):
   - Adds the activation scripts that set the proper compiler flags.
   - Adds a weak run_export on _x86_64-microarch-level-run-export.
2. _x86_64-microarch-level-run-export (automatically placed in the host section if x86_64-microarch-level is placed in build):
   - Adds a weak run_export on _x86_64-microarch-level.
3. _x86_64-microarch-level (automatically added to the run section by _x86_64-microarch-level-run-export):
   - Adds a requirement on a specific __archspec.

As you can see, all three packages have a different responsibility.

The result is that one can just add x86_64-microarch-level to the build section, which will allow building for an architecture that the build machine does not support, while also automatically adding a run requirement that enforces the proper archspec.
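The proposed three-package chain might be expressed roughly as follows. This is a hypothetical sketch, not necessarily the code in the linked pull request: the `level` variable is an assumed name, the `__archspec` spec is omitted, and it relies on the premise stated in the proposal that a weak run_export from a build dependency lands in host, while a weak run_export from a host dependency lands in run.

```yaml
# Hypothetical sketch of the proposed outputs.
outputs:
  - name: x86_64-microarch-level          # goes in downstream `build`
    # ships the activation scripts that set -march=x86-64-v{{ level }}
    build:
      run_exports:
        weak:
          - _x86_64-microarch-level-run-export >={{ level }}
  - name: _x86_64-microarch-level-run-export   # ends up in downstream `host`
    build:
      run_exports:
        weak:
          - _x86_64-microarch-level >={{ level }}
  - name: _x86_64-microarch-level         # ends up in downstream `run`
    requirements:
      run:
        - __archspec  # specific archspec requirement omitted in this sketch
```

Only the last package carries the `__archspec` requirement, and under the stated premise it never enters the host environment, so the build machine's own archspec is never checked against it.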

isuruf commented 1 month ago

The host environment getting its virtual packages from the build machine is the root issue, and all the solutions so far are hacks IMO. Also, your suggestion doesn't work when a v4 package needs a v4 dependency package in host. Then we have the same issue.

I think the best solution here is a way to specify virtual packages to be added to host environments.

baszalmstra commented 3 weeks ago

> I think the best solution here is a way to specify virtual packages to be added to host environments.

I have previously discussed that with @wolfv. That does indeed seem like the best solution.

For now, though, could we brainstorm a solution that works in the meantime?

> Also your suggestion doesn't work in the case where a v4 package needs a dependency v4 package in host. Then we have the same issue.

Would adding a constraint from _x86_64-microarch-level-run-export on _x86_64-microarch-level fix that?
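One hypothetical reading of that question: the helper package declares a `run_constrained` entry, which does not install `_x86_64-microarch-level` but forbids an older version of it from coexisting with the helper in the same environment. The `level` variable is an assumed name, and whether this resolves the host-dependency case is exactly what is being asked, so this is an illustration of the idea, not a confirmed fix.

```yaml
# Hypothetical sketch: only the helper output is shown.
outputs:
  - name: _x86_64-microarch-level-run-export
    requirements:
      run_constrained:
        # If _x86_64-microarch-level is present alongside this package,
        # it must be at least the target level
        - _x86_64-microarch-level >={{ level }}
    build:
      run_exports:
        weak:
          - _x86_64-microarch-level >={{ level }}
```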

baszalmstra commented 2 weeks ago

@isuruf Any thoughts? I can create a PR if that helps?

isuruf commented 2 weeks ago

For now can you use ignore_run_exports? That seems better than the hacky solution you suggested.

baszalmstra commented 2 weeks ago

> For now can you use ignore_run_exports?

But that will require the user to have a deep understanding of what is going on and why this is a problem. The whole idea is to remove friction for the user. My workaround is not that hacky, is it?

isuruf commented 2 weeks ago

> But that will require the user to have a deep understanding of what is going on and why this is a problem.

What you are suggesting needs a deep understanding of what is going on when compiling downstream packages. The user needs to know that the ABI doesn't change when compiling a lvl4 package with a lvl3 downstream package, but running with a downstream lvl4 package.

conda-forge / microarch-level-feedstock

Easier cross-compiling for level 4? #5
