Open ifitchet opened 1 year ago
conda-build is supposed to use miniconda3\envs\b1\conda-bld\jq-suite_1695720509023\_h_env\Library\usr\bin\bash.exe. Is it not?
Yes, that is my contention. Interestingly, with a broken patch I could see that it was also running patch from the base environment.
So I think it is "as simple as" a PATH issue.
As a test you could:
conda create -n b1 conda-build
conda activate b1
conda build <recipe>
and it won't find bash (or patch, or eventually perl) when running, because it's looking for them in b1 and not in the target build environment, where the recipe's requirements.build has (in MSYS2-land) m2-bash, m2-patch and m2-perl.
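A quick way to test the PATH theory from inside a build is to print where each tool actually resolves. The sketch below is a hypothetical diagnostic (report_tools is not part of conda-build); dropped temporarily into the recipe's build script, it would show whether bash, patch, perl and autoreconf come from b1 or from the _h_env build prefix. Under MSYS2 the paths are printed in this bash's own view of the filesystem, which is itself part of what this issue is about.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of conda-build): print where each named
# tool resolves on the current PATH, or NOT FOUND if it does not resolve.
report_tools() {
  local tool path
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      printf '%-12s %s\n' "$tool" "$path"
    else
      printf '%-12s %s\n' "$tool" "NOT FOUND"
    fi
  done
}

# The tools this issue is concerned with.
report_tools bash patch perl autoreconf
```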
However, if conda-build has been using the wrong PATH for some time, do we have anything that is now reliant on that wrong PATH?
The base environment contains quite a lot of stuff (including bash and patch), which may well have been masking the problem for some time. In this particular case jq wants to run autoreconf, which requires the full autotools suite; that suite isn't in base but is in the target build environment.
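I'll take this opportunity to remind readers that this is incredibly easy to miss when testing. If you run an executable from the target build environment, i.e. off the PATH, the right thing will happen. You must ask yourself what this shell sees: cat /proc/mounts is not the same as /usr/bin/cat /proc/mounts, because cat is an external executable. Found via the PATH, it is the target build environment's cat and reports that environment's view, whereas /usr/bin/cat is resolved against this shell's root filesystem. You could use: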
while read line ; do
echo "$line"
done < /proc/mounts
to get this bash to read the contents of the file in this root filesystem. The explicit use of /usr/bin/cat mirrors the explicit interpreter name in #! /usr/bin/perl, and has this bash (technically, execve(2)) run the perl from this root filesystem.
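The distinction being drawn is between work the shell does itself (builtins such as read, as in the loop above) and commands it launches as separate executables found via the PATH. POSIX command -v makes the difference visible: it prints a bare name for builtins, functions and keywords, but an absolute path for external commands. The classify helper below is a hypothetical name used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each name would run inside this shell
# (builtin, function or keyword) or as a separate executable found via PATH.
classify() {
  local name resolved
  for name in "$@"; do
    if ! resolved=$(command -v "$name"); then
      echo "$name: not found"
      continue
    fi
    case $resolved in
      /*) echo "$name: external ($resolved)" ;;   # launched via execve(2)
      *)  echo "$name: runs inside this shell" ;;  # builtin/function/keyword
    esac
  done
}

classify read while cat
```

Note that the path printed for an external command is itself expressed relative to this shell's idea of /.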
Checklist
What happened?
Apologies: mermaid has re-imagined the diagram, so start in the bottom right and run counter-clockwise.
The conda build PATH Problem
Background
The MSYS2 runtime is a set of patches to the Cygwin runtime. The "magic" of the Cygwin/MSYS2 runtime is that it dynamically remaps the idea of / based on where the runtime DLL physically sits in the filesystem. The actual mapping is created here, although the underlying path used is documented here.
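As a rough mental model of that remapping (a simplifying assumption; the real logic lives in the Cygwin/MSYS2 runtime sources), the root is derived from the directory holding msys-2.0.dll by stripping a trailing /bin and then a trailing /usr. The runtime_root function below is hypothetical and works on POSIX-style spellings of the Windows paths; it shows why two copies of the DLL yield two different ideas of /.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of how the runtime derives "/" from the
# location of its own DLL: <root>/usr/bin/msys-2.0.dll -> <root>.
runtime_root() {
  local dir
  dir=$(dirname -- "$1")   # directory containing the DLL
  dir=${dir%/bin}          # .../usr/bin -> .../usr
  dir=${dir%/usr}          # .../usr     -> ...
  printf '%s\n' "$dir"
}

# conda-build's own environment (b1):
runtime_root /c/miniconda3/envs/b1/Library/usr/bin/msys-2.0.dll
# -> /c/miniconda3/envs/b1/Library
# the target build environment (_h_env):
runtime_root /c/miniconda3/envs/b1/conda-bld/jq-suite_1695720509023/_h_env/Library/usr/bin/msys-2.0.dll
# -> /c/miniconda3/envs/b1/conda-bld/jq-suite_1695720509023/_h_env/Library
```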
This is extraordinary. Every executable, when it runs, is potentially persuaded of a different / from any other running executable: a bit like a super-dynamic chroot(2). We can see that in action where I'm in a b1 conda environment and can poke a broken build:

Look at that! Two different physical directories pretending to be /. This is critical to understanding what is going to happen next.

As an additional feature, the Windows DLL search order comes into play.
Here, #7 comes into play: Windows will use the msys-2.0.dll that it finds in the same folder as the executable. Indeed, "/usr/bin/msys-2.0.dll" is sat next to "/usr/bin/df.exe". I've air-quoted them because clearly neither is the one true /, but within their own little worlds they are next to each other. So when I run the executable, df, it finds an msys-2.0.dll next to it, which makes it believe that / is two directories up.

The Actual Error
Here's what happens when
Huh.
* Actually, you're more likely to see:
which I've just cut'n'pasted from a vanilla build. I hacked my way round the missing build-aux/config.* and Autom4te/ChannelDefs.pm to get to my /usr/share/aclocal-1.16 error above. It's all the same problem.

What Goes Wrong
Well, it basically goes wrong right from the start, gets away with it for a bit, then crashes and burns.
conda-build has seen from jq-feedstock/recipe/meta.yaml that it needs to use bash as the script interpreter. The only bash it has is in its own environment, the one where the runtime points to C:\miniconda3\envs\b1\Library. That bash now runs the build-jq.sh script. But wait! build-jq.sh wants to run things like autoreconf, which are going to be in the target build's environment.

It turns out that for many commands this isn't an issue. The target build's environment will have been put on the PATH by activate.py and, because of the runtime trick we've been looking at, any executable found on the PATH in the target build's environment will do the right thing: it picks up the adjacent msys-2.0.dll and it all magics out. However, what about scripts? You may now realise I've been explicitly calling out executables above.
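Before turning to scripts, the executable half of the argument can be modelled in a few lines. This is a sketch under two simplifications: that the only DLL location that matters is the executable's own folder (the search-order entry cited earlier), and that the root sits two levels above that folder. runtime_for_exe is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical model: an MSYS2 executable binds to the msys-2.0.dll in its
# own folder, and that DLL places "/" two directories above the folder.
runtime_for_exe() {
  local dir
  dir=$(dirname -- "$1")
  if [ -e "$dir/msys-2.0.dll" ]; then
    # Logical cd in a subshell: .../Library/usr/bin/../.. -> .../Library
    printf 'dll=%s root=%s\n' "$dir/msys-2.0.dll" "$(cd -- "$dir/../.." && pwd)"
  else
    echo "dll=(none adjacent; rest of the search order applies)"
  fi
}
```

Run where that path exists, runtime_for_exe /c/miniconda3/envs/b1/Library/usr/bin/df.exe would report a root of /c/miniconda3/envs/b1/Library, while the df in _h_env would report its own Library directory.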
Here we need to know a little about how Unix launches commands. In particular, it is conda-build's bash that is calling execve(2). Broadly, based on the magic number of the file, execve() decides what to do.
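A rough sketch of that first decision, written in bash purely for illustration (the real logic lives in the kernel on Unix and, on Windows, in the Cygwin/MSYS2 runtime's exec handling): if a file starts with the two bytes #!, the rest of the first line names the interpreter, optionally followed by an argument. shebang_interpreter is a hypothetical helper that extracts only the interpreter path, which is the part that matters here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the interpreter named on a script's "#!" line,
# or fail if the file does not start with the "#!" magic number.
shebang_interpreter() {
  local first
  # Read the first line; tolerate a file with no trailing newline.
  IFS= read -r first < "$1" || [ -n "$first" ] || return 1
  case $first in
    '#!'*)
      first=${first#'#!'}                           # drop the magic number
      first=${first#"${first%%[![:space:]]*}"}      # drop leading blanks
      printf '%s\n' "${first%%[[:space:]]*}"        # keep the interpreter path
      ;;
    *)
      return 1                                      # not a "#!" script
      ;;
  esac
}
```

Pointed at the aclocal-1.16 found on the PATH, e.g. shebang_interpreter "$(command -v aclocal-1.16)", it should print /usr/bin/perl, the interpreter named in this issue.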
Scripts generally start with the magic number #! followed by the name of the interpreter. In our particular case the script has been found on the PATH in C:\miniconda3\envs\b1\conda-bld\jq-suite_nnnn\_h_env\Library\usr\bin\aclocal-1.16 and the interpreter of that script is #! /usr/bin/perl.

OK, execve() is going to run the interpreter, /usr/bin/perl, which means it is going to access that file in its own / filesystem. Wait! execve() is running in bash, which is running in conda-build's environment. At the moment, maybe we don't care whether we're running this /usr/bin/perl or that /usr/bin/perl, but it's about to bite us. /usr/bin/perl (or aclocal-1.16, as we're more familiar with it) is going to try to access the expected /usr/share/aclocal-1.16, because that was installed in the target build's environment, but the actual running instance of perl is bound to conda-build's environment, which doesn't have any of the autotools stuff installed in its /. Doh!
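The mismatch can be reduced to a toy model in which every absolute path is resolved against the root of the process doing the resolving. resolve_under_root is hypothetical, the roots are POSIX-style spellings of this issue's Windows paths (including its jq-suite_nnnn placeholder), and the real runtime does far more than string concatenation; the point is only which root each lookup uses.

```bash
#!/usr/bin/env bash
# Toy model (not runtime code): an absolute path seen by a process is
# interpreted relative to that process's root directory.
resolve_under_root() {
  # $1: the process's root; $2: absolute path as seen inside that root
  printf '%s%s\n' "${1%/}" "$2"
}

caller_root=/c/miniconda3/envs/b1/Library                          # conda-build's bash
target_root=/c/miniconda3/envs/b1/conda-bld/jq-suite_nnnn/_h_env/Library

# execve(2) in conda-build's bash resolves the "#!" interpreter:
resolve_under_root "$caller_root" /usr/bin/perl
# where the autotools files were actually installed:
resolve_under_root "$target_root" /usr/share/aclocal-1.16
# where that perl, bound to conda-build's runtime, goes looking for them:
resolve_under_root "$caller_root" /usr/share/aclocal-1.16
```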
Is It Even Worse Than That?
Here's some more debugging output where I explicitly run, say, df from the target build's environment:
Yikes!
Are we in some tricksome world where once one runtime in a "thread of control" has been established then you don't get to load another? The answer is both yes and no.
We can demonstrate the fun and games by having a script which runs three instances of df by their explicit pathnames, and then running that script with each of the three instances of bash:
Wait, what?
Looking more closely you can see the following:
* bash from the base environment: all three dfs do the right thing
* bash from either non-base environment:
  * df from the base environment does the right thing
  * the dfs from the non-base environments are bound to whichever environment the bash was run from
Oh dear.
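For anyone wanting to repeat the three-by-three experiment, a harness along these lines should do. write_probe and run_under_each are hypothetical helpers; the df and bash paths you pass in are placeholders for the real ones in base, b1 and _h_env (in POSIX form), and printf %q is a bash builtin feature used here to quote them safely.

```bash
#!/usr/bin/env bash
# Hypothetical harness: write a probe script that runs each tool by its
# explicit path (with no arguments), then run that probe under each shell.
write_probe() {
  # $1: probe file to create; remaining args: tool paths to run explicitly
  local out=$1 tool
  shift
  {
    echo '# generated probe: run each tool by explicit path'
    for tool in "$@"; do
      printf 'echo %q\n' "== $tool"   # label each tool's output
      printf '%q\n' "$tool"           # run the tool itself
    done
  } > "$out"
}

run_under_each() {
  # $1: probe script; remaining args: shells (e.g. each environment's bash)
  local probe=$1 sh
  shift
  for sh in "$@"; do
    echo "#### interpreter: $sh"
    "$sh" "$probe"
  done
}

# Example (BASE_DF, B1_DF, H_DF and the *_BASH variables are placeholders):
#   write_probe probe.sh "$BASE_DF" "$B1_DF" "$H_DF"
#   run_under_each probe.sh "$BASE_BASH" "$B1_BASH" "$H_BASH"
```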
Mitigations
I suspect that most people would grumble about Windows then
and find that it magically just works! There'd be more grumbling about Windows and everyone would move on.
Of course it just works. By sourcing build_env_setup.bat you are putting yourself in the target build's environment, after which running bash is no problem: you're now getting the correct bash (the one from the target build environment), and when a script wants to run /usr/bin/perl, execve(2) will access and run the target build's environment's instance of perl and... You get the picture; everything now lines up.

What Should Be Happening?
Clearly, in the face of MSYS2/Cygwin's simulated root filesystem, conda build should be launching bash (in this case) from the target build's environment, broadly %BUILD_PREFIX%\Library\usr\bin\bash, although you'd like to think that, post-activate.py, you could have just invoked bash and picked up the one from the target build. That's an ordering issue which I don't have any insight into.
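conda-build itself is Python, so the following is not a patch; it is a bash sketch of the lookup order argued for above, with pick_bash as a hypothetical name: prefer the bash inside the target build environment's prefix (the Library/usr/bin layout used on Windows), and fall back to whatever bash the PATH offers only if that is missing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose the script interpreter from the build prefix
# first, and only fall back to the PATH (possibly the wrong environment).
pick_bash() {
  local prefix=$1 candidate
  for candidate in "$prefix/Library/usr/bin/bash.exe" "$prefix/Library/usr/bin/bash"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  command -v bash   # fallback: may be conda-build's own environment's bash
}
```

Assuming BUILD_PREFIX is available in POSIX form, something like "$(pick_bash "$BUILD_PREFIX")" build-jq.sh would run the build script under the target build environment's bash, and therefore under that environment's runtime and its idea of /.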
What Else Could Be Happening?
There is a question of whether this affects non-MSYS2 setups.
It's harder to see, but I don't think so. Unix systems are generally more conservative and have a single root filesystem. That leads to a proliferation of ${BUILD_PREFIX}/... uses to ensure that the target build's environment is in use, but Unix won't suffer from the on-the-fly root filesystem problem causing execve(2) to behave unexpectedly.

Conda Info
Conda Config
Conda list
Additional Context
No response