AcademySoftwareFoundation / rez

An integrated package configuration, build and deployment system for software
https://rez.readthedocs.io
Apache License 2.0

Supporting Multiple Compilers #42

Open mstreatfield opened 10 years ago

mstreatfield commented 10 years ago

We are starting to build packages with both ICC and GCC variants. As ICC requires GCC at run time (and I think build time), my icc package looks something like this:

name: icc
version: 14.0.0
variants:
    - [ CentOS-6.2, gcc-4.1.2 ]
    - [ CentOS-6.2, gcc-4.4.6 ]
commands:
    - CXX=!ROOT!/bin/icpc
    - CC=!ROOT!/bin/icc

To build all ICC and GCC variants of a higher level package, we must do something like this:

name: foo
version: 1.0.0
variants:
    - [ CentOS-6.2, gcc-4.1.2 ]
    - [ CentOS-6.2, gcc-4.4.6 ]
    - [ CentOS-6.2, icc-14.0.0, gcc-4.1.2 ]
    - [ CentOS-6.2, icc-14.0.0, gcc-4.4.6 ]

This creates three problems:

  1. The icc package redefines the CXX and CC environment variables, which are also defined in the gcc package. Rez does not allow this (it's a conflict) and so temporarily I've commented this restriction out. I understand the restriction, but maybe we can loosen it a little?
  2. The foo package now has unbalanced variants. This doesn't appear to be a problem (it works) but is maybe not considered very neat. This would perhaps benefit from naming the variant folders as we discussed in another thread.
  3. In theory we are able to successfully mix and match compilers at runtime - they are not mutually exclusive. Assuming another package bah with the same variants as foo, we can build an environment using the gcc-4.4.6 variant of foo and the icc-14.0.0, gcc-4.4.6 variant of bah.

    Currently (I believe) a call to rez-env foo bah will pick the first variant that creates a resolution (both GCC), while rez-env foo bah icc-14.0.0 gcc-4.4.6 will resolve both packages to the ICC variant. Is there currently some syntax I can use to be more specific, e.g. rez-env foo:gcc-4.4.6 bah:icc-14.0.0?

    Perhaps this is an extension of what was discussed in issue 21.

To extend point 1) a little further with another use case:

We want to define a PYTHON_EXE environment variable which points to the current python interpreter for the resolved environment. For example:

name: maya
requires:
    - python-2.6
commands:
    - export PYTHON_EXE=mayapy

name: python
version: 2.6.6
commands:
    - export PYTHON_EXE=python2.6

As both the python and maya packages define PYTHON_EXE we get a conflict. The way around this I see would be to define the python-2.6 requirement of maya as a separate mayapy package.

name: maya
requires:
    - mayapy-2.6

name: mayapy
version: 2.6.6
commands:
    - export PYTHON_EXE=mayapy

But our python applications have three variants, for python 2.5, 2.6 and 2.7, and those which are pure python work in a normal python interpreter as well as in Maya's, Nuke's and Houdini's. To follow this approach we'd have to build many more variants of essentially the same code, which seems wasteful:

name: foo
variants:
    - [ python-2.5 ]
    - [ python-2.6 ]
    - [ python-2.7 ]
    - [ mayapy-2.6 ]
    - [ hython-2.6 ]
    - [ nukepy-2.6 ]

I don't like this approach. Any suggestions?

nerdvegas commented 10 years ago

Good questions, see below...

On Wed, Nov 27, 2013 at 2:39 PM, Mark Streatfield notifications@github.com wrote:

We are starting to build packages with both ICC and GCC variants. As ICC requires GCC at run time (and I think build time), my icc package looks something like this:

name: icc
version: 14.0.0
variants:
    - [ CentOS-6.2, gcc-4.1.2 ]
    - [ CentOS-6.2, gcc-4.4.6 ]
commands:
    - CXX=!ROOT!/bin/icpc
    - CC=!ROOT!/bin/icc

To build all ICC and GCC variants of a higher level package, we must do something like this:

name: foo
version: 1.0.0
variants:
    - [ CentOS-6.2, gcc-4.1.2 ]
    - [ CentOS-6.2, gcc-4.4.6 ]
    - [ CentOS-6.2, icc-14.0.0, gcc-4.1.2 ]
    - [ CentOS-6.2, icc-14.0.0, gcc-4.4.6 ]

This creates three problems:

1. The icc package redefines the CXX and CC environment variables, which are also defined in the gcc package. Rez does not allow this (it's a conflict) and so temporarily I've commented this restriction out. I understand the restriction, but maybe we can loosen it a little?

I see the issue, and yes this seems to be a legitimate case. I'd suggest 3 approaches, from most immediate fix, to long term:

1) Add an '!OVERWRITE!' special Rez var that can be used to trigger the suppression of var conflict detection just for that command: commands:

2) When we move to the python-based, OS-agnostic command support (codename Rex), provide an 'overwrite' function for the same purpose;

3) Extend this overwrite rex function so that you can also specify the package(s) you expect to be overwriting the value from. For example, in your case you might do this:

commands: overwrite('PYTHON_EXE', 'mayapy', friends=["python"])

So here, the overwrite command will only work if either PYTHON_EXE was not previously set, or was set by the 'python' package. This I think would add some extra protection against accidental overwrites whilst allowing you to do it when you need to.
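The rule described above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not Rez or Rex code: the `Env` class, `EnvConflictError`, and the `setenv`/`overwrite` method signatures are all hypothetical names made up for this sketch, and the per-variable owner tracking is an assumption about how conflicts might be detected.

```python
class EnvConflictError(Exception):
    """Raised when a package sets a variable that another package owns."""


class Env:
    """Toy environment that remembers which package set each variable.

    Hypothetical stand-in for illustration; not part of Rez.
    """

    def __init__(self):
        self.values = {}  # variable name -> value
        self.owners = {}  # variable name -> package that last set it

    def setenv(self, package, name, value):
        # Plain assignment: a value set by any other package is a conflict.
        owner = self.owners.get(name)
        if owner is not None and owner != package:
            raise EnvConflictError(
                f"{package} cannot set {name}: already set by {owner}")
        self._store(package, name, value)

    def overwrite(self, package, name, value, friends=()):
        # Allowed when the variable is unset, or was set by a listed friend.
        owner = self.owners.get(name)
        if owner is not None and owner != package and owner not in friends:
            raise EnvConflictError(
                f"{package} cannot overwrite {name}: set by {owner}, "
                f"which is not in friends={list(friends)}")
        self._store(package, name, value)

    def _store(self, package, name, value):
        self.values[name] = value
        self.owners[name] = package


env = Env()
env.setenv("python", "PYTHON_EXE", "python2.6")
env.overwrite("maya", "PYTHON_EXE", "mayapy", friends=["python"])
print(env.values["PYTHON_EXE"])  # mayapy
```

With this shape, a package that is not expecting to replace anything keeps the existing strict behaviour through `setenv`, and only packages that opt in through `overwrite` can replace a value, and then only one set by a package they name.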

2. The foo package now has unbalanced variants. This doesn't appear to be a problem (it works) but is maybe not considered very neat. This would perhaps benefit from naming the variant folders as we discussed in another thread (https://github.com/nerdvegas/rez/issues/21#issuecomment-19083476).

There's another problem lurking here - your variants are not mutually exclusive. There may be some cases where your resolve will not pick an icc variant, even though icc ends up in the resolve. For example, if CentOS-6.2 and gcc-4.1.2 get resolved early on, then the first foo variant will be chosen, even though icc gets pulled in by some other package later on in the resolve. This could end up causing pretty insidious resolution bugs! I *think* anyway... there might be subtleties in the resolve algorithm that get around this, I'd need to delve back into the code. I might be second guessing myself.

Anyway that aside, you could force mutual exclusion like so:

variants:

But you're right, this isn't very neat and makes for a pretty odd package variant installation path. This does come back to the need to be able to name variant folders, though I think we haven't really hashed out how this should work yet. Again, a separate conversation needs to be had about that.
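A rough way to spot variants that are not mutually exclusive is sketched below. It is a simplification and not how Rez checks anything: the helper names `split` and `can_coexist` are invented here, versions are compared by exact string match rather than by Rez's version ranges, and the `!icc` entry in the last check follows the form mentioned later in this thread rather than any confirmed variant list.

```python
from itertools import combinations


def split(requirement):
    """Split 'icc-14.0.0' into ('icc', '14.0.0', False); '!icc' into ('icc', None, True)."""
    negated = requirement.startswith("!")
    name, _, version = requirement.lstrip("!").partition("-")
    return name, version or None, negated


def can_coexist(variant_a, variant_b):
    """True if a single resolve could satisfy both variants at once.

    Simplified assumption: versions must match exactly, and a negated
    requirement ('!pkg') clashes with any positive requirement on pkg.
    """
    seen = {name: (version, negated)
            for name, version, negated in map(split, variant_a)}
    for name, version, negated in map(split, variant_b):
        if name not in seen:
            continue
        a_version, a_negated = seen[name]
        if a_negated != negated:
            return False  # one variant requires the package, the other forbids it
        if not negated and a_version != version:
            return False  # the two variants ask for different versions
    return True


foo_variants = [
    ["CentOS-6.2", "gcc-4.1.2"],
    ["CentOS-6.2", "gcc-4.4.6"],
    ["CentOS-6.2", "icc-14.0.0", "gcc-4.1.2"],
    ["CentOS-6.2", "icc-14.0.0", "gcc-4.4.6"],
]

# Index pairs of variants that a single resolve could satisfy together.
overlapping = [(i, j)
               for (i, a), (j, b) in combinations(enumerate(foo_variants), 2)
               if can_coexist(a, b)]
print(overlapping)  # [(0, 2), (1, 3)]

# Adding a negated requirement to the gcc-only variant removes the overlap.
print(can_coexist(["CentOS-6.2", "gcc-4.1.2", "!icc"], foo_variants[2]))  # False
```

Each gcc-only variant overlaps with the icc variant built on the same gcc, which is the situation described above where a resolve can settle on the gcc-only variant even though icc ends up in the environment.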

3. In theory we are able to successfully mix and match compilers at runtime - they are not mutually exclusive. Assuming another package bah with the same variants as foo, we can build an environment using the gcc-4.4.6 variant of foo and the icc-14.0.0, gcc-4.4.6 variant of bah.

Currently (I believe) a call to rez-env foo bah will pick the first variant that creates a resolution (both GCC), while rez-env foo bah icc-14.0.0 gcc-4.4.6 will resolve both packages to the ICC variant. Is there currently some syntax I can use to be more specific, e.g. rez-env foo:gcc-4.4.6 bah:icc-14.0.0?

You're right, but iirc it's a little more involved - Rez will choose the first variant that creates a resolution yes, but if more than one variant is fully resolved, it will choose the one with more packages. That is the only reason why the icc variants in your example could ever be chosen.
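As a toy illustration of that selection rule (and only of the rule as recalled here, not of the actual resolver), the function below picks among variants whose packages are all already in the resolve, prefers the one with more packages, and keeps the earlier one on a tie. The name `pick_variant` and the exact-string matching of package requests are assumptions made for this sketch; the real algorithm also has to handle variants that are not yet fully resolved, which this ignores.

```python
def pick_variant(variants, resolved):
    """Return the index of the chosen variant, or None if none is fully resolved.

    variants: list of lists of package requests, in package-definition order.
    resolved: set of package requests already present in the resolve.
    """
    best_index = None
    best_size = -1
    for index, variant in enumerate(variants):
        fully_resolved = set(variant) <= resolved
        # Strict '>' means the earlier variant wins when sizes are equal.
        if fully_resolved and len(variant) > best_size:
            best_index, best_size = index, len(variant)
    return best_index


foo_variants = [
    ["CentOS-6.2", "gcc-4.1.2"],
    ["CentOS-6.2", "gcc-4.4.6"],
    ["CentOS-6.2", "icc-14.0.0", "gcc-4.1.2"],
    ["CentOS-6.2", "icc-14.0.0", "gcc-4.4.6"],
]

# Only gcc is in the resolve: the gcc-only variant is chosen.
print(pick_variant(foo_variants, {"CentOS-6.2", "gcc-4.4.6"}))  # 1

# icc is also in the resolve: both variants 1 and 3 match, the larger one wins.
print(pick_variant(foo_variants, {"CentOS-6.2", "gcc-4.4.6", "icc-14.0.0"}))  # 3
```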

In any case, you are correct - you can't currently get the behaviour you've described above. I agree that a syntax like the one you have described could be the way to go. I say 'could' though, because here is another case where 'features' would come to the rescue. You could have a 'has_icc' feature in foo, and then your resolve might look something like this:

rez-env foo !foo.has_icc bah bah.has_icc

Let us assume in this example that dot notation is how you go about describing features, which I think is probably a good notation anyway. Let us also assume that per-variant features are possible, which again I think we should have (as well as per-variant anything else, as I've stated before).

Perhaps this is an extension of what was discussed in issue 21 (https://github.com/nerdvegas/rez/issues/21).

To extend point 1) a little further with another use case:

We want to define a PYTHON_EXE environment variable which points to the current python interpreter for the resolved environment. For example:

name: maya
requires:
    - python-2.6
commands:
    - export PYTHON_EXE=mayapy

name: python
version: 2.6.6
commands:
    - export PYTHON_EXE=python2.6

As both the python and maya packages define PYTHON_EXE we get a conflict. The way around this I see would be to define the python-2.6 requirement of maya as a separate mayapy package.

name: maya
requires:
    - mayapy-2.6

name: mayapy
version: 2.6.6
commands:
    - export PYTHON_EXE=mayapy

But our python applications have three variants, for python 2.5, 2.6 and 2.7, and those which are pure python work in a normal python interpreter as well as in Maya's, Nuke's and Houdini's. To follow this approach we'd have to build many more variants of essentially the same code, which seems wasteful:

name: foo
variants:
    - [ python-2.5 ]
    - [ python-2.6 ]
    - [ python-2.7 ]
    - [ mayapy-2.6 ]
    - [ hython-2.6 ]
    - [ nukepy-2.6 ]

I don't like this approach. Any suggestions?

Doesn't the overwrite idea talked about earlier get around this problem?

Wrt wasteful extra package installations: wrt disk space, isn't software tiny anyway? So, wasteful in what sense that might really matter? I suppose one thing we could do is add the ability in Rez to signify that one variant is equivalent to another (in terms of what is installed), which would then cause rez-build to install it as a symlink. So, nukepy-2.6 might become a symlink pointing at the python-2.6 subdir, for example.

mstreatfield commented 10 years ago

codename Rex

Ha. Interesting. My fat finger fumbles mean I end up typing Rex more times than Rez normally. I have an alias from Rex to Rez!

I see the issue, and yes this seems to be a legitimate case. I'd suggest 3 approaches, from most immediate fix, to long term:

I like 2 and 3, but will look at implementing 1 in the short term.

For example, if CentOS-6.2 and gcc-4.1.2 get resolved early on, then the first foo variant will be chosen, even though icc gets pulled in by some other package later on in the resolve.

Ah yes, ok. That makes sense I think.

Using the !icc package in the variant works. However rez env foo is resolving the icc variant first still - I guess because it does have more packages.

I say 'could' though, because here is another case where 'features' would come to the rescue.

Do we need to/can we start a thread on this once the current changes are merged?

Doesn't the overwrite idea talked about earlier get around this problem?

Yes, it does. I was just providing it as another use case. I thought it might be a harder sell ;-)

Wrt wasteful extra package installations

In theory yes, the extra installations aren't a problem. In practise we have some poorly modularised software where a single release is 0.5+GB. Restructuring this is a priority, yes, but if we had to release that to cater for more variants it would become an issue.

nerdvegas commented 10 years ago

On Thu, Nov 28, 2013 at 2:54 PM, Mark Streatfield notifications@github.com wrote:

codename Rex

Ha. Interesting. My fat finger fumbles mean I end up typing Rex more times than Rez normally. I have an alias from Rex to Rez!

I see the issue, and yes this seems to be a legitimate case. I'd suggest 3 approaches, from most immediate fix, to long term:

I like 2 and 3, but will look at implementing 1 in the short term.

For example, if CentOS-6.2 and gcc-4.1.2 get resolved early on, then the first foo variant will be chosen, even though icc gets pulled in by some other package later on in the resolve.

Ah yes, ok. That makes sense I think.

Using the !icc package in the variant works. However rez env foo is resolving the icc variant first still - I guess because it does have more packages.

Hmm I wouldn't expect that - given an arbitrary choice between totally unresolved variants of the same package count, the first variant should take precedence. Will have to look at this.

I say 'could' though, because here is another case where 'features' would come to the rescue.

Do we need to/can we start a thread on this once the current changes are merged?

Yes for sure, I'm keen to introduce this feature.

Doesn't the overwrite idea talked about earlier get around this problem?

Yes, it does. I was just providing it as another use case. I thought it might be a harder sell ;-)

Wrt wasteful extra package installations

In theory yes, the extra installations aren't a problem. In practise we have some poorly modularised software where a single release is 0.5+GB. Restructuring this is a priority, yes, but if we had to release that to cater for more variants it would become an issue.

Ah I see. Then I think it should be possible to implement the symlink idea. Those variants that just link to another would not actually be built - instead just their environments would be resolved, to check that the variant is possible to produce at all.
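The install side of the symlink idea might look roughly like the sketch below. Nothing here exists in rez-build: the function name `install_variant_alias` and the flat layout of one subdirectory per variant under the package root are assumptions for illustration, and the sketch assumes a POSIX filesystem where creating symlinks needs no special privileges.

```python
import os
import tempfile


def install_variant_alias(package_root, alias, target):
    """Install variant `alias` as a relative symlink to the built `target` variant.

    Hypothetical helper: nothing is built for `alias`, it only points at the
    directory that the equivalent variant was installed into.
    """
    target_path = os.path.join(package_root, target)
    if not os.path.isdir(target_path):
        raise FileNotFoundError(f"target variant is not installed: {target_path}")
    alias_path = os.path.join(package_root, alias)
    # A relative link keeps working if the whole package directory is moved.
    os.symlink(target, alias_path, target_is_directory=True)
    return alias_path


with tempfile.TemporaryDirectory() as root:
    # Pretend python-2.6 is the variant that was really built and installed.
    os.makedirs(os.path.join(root, "python-2.6"))
    link = install_variant_alias(root, "nukepy-2.6", "python-2.6")
    print(os.path.islink(link))  # True
    print(os.readlink(link))     # python-2.6
```

The check that the aliased variant's environment can be resolved at all would still have to happen before this step; the sketch only covers the filesystem part.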

mstreatfield commented 10 years ago

We have changed this problem slightly by making the compilers gcc and icc mutually exclusive, extracting a stdlib package that is required by both.

We still have unbalanced variants, but felt this was preferable to including !icc in the variant string. But this does remove the current issue of conflicting environment variables (each can now safely define CC and CXX without conflict).

We no longer have the potential to mix and match compilers at runtime, but this is proving to have other issues which mean it's not desirable anyway.

Mark.