conda / conda-build

Commands and tools for building conda packages
https://docs.conda.io/projects/conda-build/

conda build PATH problem #5028

Open ifitchet opened 1 year ago

ifitchet commented 1 year ago

What happened?

Apologies, mermaid has re-imagined the diagram: start in the bottom right and run counter-clockwise.

flowchart TD
    subgraph "conda-build's env"
      subgraph "conda-build's packages"
        CB1["m2-bash
        m2-perl"] -->|.dll| CBR
        CBR["m2-msys2-runtime"]
      end
      subgraph "conda-build's root filesystem"
        CBR  --> CBroot
        CBroot["C:\miniconda3\envs\b1\Library"]
      end
    end
    subgraph "jq-feedstock"
      subgraph "meta.yaml"
        MY -->|bash,perl is getting| CB1
        MY["script: build-jq.sh
            script_interpreter: bash"]
      end
      subgraph "build-jq.sh"
        BJQ["#! /bin/bash
        autoreconf -vfi"]
      end
    end
    subgraph "target build env"
      subgraph "target build's packages"
        BJQ -->|build-jq.sh is expecting| TB1
        TB1["m2-make
        m2-bash
        m2-autoconf
        m2w64-toolchain"] -->|.dll| TBR
        TBR["m2-msys2-runtime"]
      end
      subgraph "target build's root filesystem"
        TBR  --> TBroot
        TBroot["C:\miniconda3\conda-bld\jq-suite_nnnn\_h_env\Library"]
      end
    end

The conda build PATH Problem

Background

The MSYS2 runtime is a set of patches to the Cygwin runtime. The "magic" of the Cygwin/MSYS2 runtime is that it dynamically remaps the idea of / based on where the runtime DLL is physically located in the filesystem.

The actual mapping is created here although the underlying path used is documented here.

This is extraordinary. Every executable, when it runs, is potentially persuaded of a different / from any other running executable. A bit like a super-dynamic chroot(2).

We can see that in action where I'm in a b1 conda environment and can poke a broken build:

(b1) C:\Users\dev-admin\src\aggregate>df -h
Filesystem                     Size  Used Avail Use% Mounted on
C:/miniconda3/envs/b1/Library  512G   43G  470G   9% /

(b1) C:\Users\dev-admin\src\aggregate>\miniconda3\envs\b1\conda-bld\jq-suite_1695720509023\_h_env\Library\usr\bin\df -h
Filesystem                                                             Size  Used Avail Use% Mounted on
C:/miniconda3/envs/b1/conda-bld/jq-suite_1695720509023/_h_env/Library  512G   43G  470G   9% /

Look at that! Two different physical directories pretending to be /. This is critical to understanding what is going to be happening next.

As an additional feature, the Windows DLL search order comes into play.

Here, #7 comes into play: Windows will use the msys-2.0.dll that it finds in the same folder as the executable. Indeed, "/usr/bin/msys-2.0.dll" is sat next to "/usr/bin/df.exe" -- I've air-quoted them as clearly neither is the one true / but within their own little worlds, they are next to each other.

So when I run the executable, df, it finds an msys-2.0.dll next to it, which makes it believe that / is two directories up.
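
To make the "two directories up" rule concrete, here is a minimal sketch. The helper name msys_root_for is hypothetical, and it assumes the layout described above (msys-2.0.dll sitting beside the executable in <root>/usr/bin). It only manipulates the path string; it does not ask the runtime anything.

```shell
# Hypothetical helper: given the path of an MSYS2 executable, print the
# directory its runtime would treat as "/". Assumes msys-2.0.dll sits beside
# the executable in <root>/usr/bin, so the root is two directories above it.
msys_root_for() {
    local bin_dir
    bin_dir=$(dirname "$1")              # <root>/usr/bin
    dirname "$(dirname "$bin_dir")"      # <root>
}

# The two df executables from the session above.
msys_root_for "C:/miniconda3/envs/b1/Library/usr/bin/df.exe"
msys_root_for "C:/miniconda3/envs/b1/conda-bld/jq-suite_1695720509023/_h_env/Library/usr/bin/df.exe"
```

Under that assumption, the two printed directories should match the "Filesystem" column in the two df runs above.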

The Actual Error

Here's what happens when I run the build:

(b1) C:\Users\dev-admin\src\aggregate>conda build jq-feedstock
...
...installs autotools etc....
...
aclocal-1.16: error: couldn't open directory '/usr/share/aclocal-1.16': No such file or directory
autoreconf-2.71: error: aclocal failed with exit status: 1

Huh.

* Actually, you're more likely to see:

...
Packaging jq-1.6-haa95532_1
ig.*: cannot stat 'C:\miniconda3\envs\b1\conda-bld\jq-suite_1695732009123\_h_env/share/libtool/build-aux/config.*': No such file or directory
/c/miniconda3/envs/b1/conda-bld/jq-suite_1695732009123/work/modules/oniguruma /c/miniconda3/envs/b1/conda-bld/jq-suite_1695732009123/work
Can't locate Autom4te/ChannelDefs.pm in @INC (you may need to install the Autom4te::ChannelDefs module) (@INC contains: /usr/share/autoconf /usr/lib/perl5/site_perl /usr/share/perl5/site_perl /usr/lib/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5/core_perl /usr/share/perl5/core_perl) at /c/miniconda3/envs/b1/conda-bld/jq-suite_1695732009123/_h_env/Library/usr/bin/autoreconf line 39.
...

which I've just cut'n'pasted from a vanilla build. I hacked my way round the missing build-aux/config.* and Autom4te/ChannelDefs.pm to get to my /usr/share/aclocal-1.16 error above. It's all the same problem.

What Goes Wrong

Well, it basically goes wrong right from the start, gets away with it for a bit then crashes and burns.

conda-build has seen from jq-feedstock/recipe/meta.yaml that it needs to use bash as the script interpreter.

The only bash it has is in its own environment, the one where the runtime points to C:\miniconda3\envs\b1\Library.

bash now runs the build-jq.sh script. But wait! build-jq.sh looks like it wants to run things like autoreconf which are going to be in the target build's environment.

It turns out that for many commands this isn't an issue. The target build's environment will have been set on the PATH by activate.py and, because of the runtime trick we've been looking at, any executable that is found on the PATH in the target build's environment will do the right thing because it picks up the adjacent msys-2.0.dll and it all magics out.

However, what about scripts? You may now realise I've explicitly been calling out executables above.

Here we need to know a little about how Unix launches commands. In particular, it is conda-build's bash that is calling execve(2).

Broadly based on the magic number of the file, execve() can decide what to do.

Scripts generally start with the magic number #! and then the name of the interpreter. In our particular case the script has been found on the PATH in C:\miniconda3\envs\b1\conda-bld\jq-suite_nnnn\_h_env\Library\usr\bin\aclocal-1.16 and the interpreter line of that script is #! /usr/bin/perl.
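
As a rough illustration of that first step, the sketch below reads the #! line of a file and prints the interpreter it names. The function name shebang_interpreter is hypothetical and this only mimics the decision in shell; it is not how execve() or the MSYS2 runtime is implemented.

```shell
# Hypothetical helper: print the interpreter named on a script's #! line.
# Returns non-zero if the file does not start with "#!".
shebang_interpreter() {
    local first
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case $first in
        '#!'*) ;;
        *) return 1 ;;
    esac
    first=${first#'#!'}                        # drop the magic number
    first=${first#"${first%%[![:space:]]*}"}   # drop leading whitespace
    printf '%s\n' "${first%%[[:space:]]*}"     # keep the first word only
}

# Demo on a throwaway file with the same first line as aclocal-1.16.
demo=$(mktemp)
printf '%s\n' '#! /usr/bin/perl' 'print "hello\n";' > "$demo"
shebang_interpreter "$demo"    # prints /usr/bin/perl
rm -f "$demo"
```

Note that the result is an absolute POSIX-style path: nothing in it says which root filesystem it should be resolved against.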

OK, execve() is going to run the interpreter which is in /usr/bin/perl which means it is going to access the file in its own / filesystem. Wait! execve() is running in bash which is running in conda-build's environment.

At the moment, maybe we don't care that we're running this /usr/bin/perl or that /usr/bin/perl but it's about to bite us.

/usr/bin/perl, or aclocal-1.16 as we're more familiar with it, is going to try to access /usr/share/aclocal-1.16, which it expects to exist because that was installed in the target build's environment. But the actual running instance of perl is bound to conda-build's environment, which doesn't have any of the autotools stuff installed in its /.
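
The mismatch can be spelled out by mapping the same POSIX-style path onto the two roots. This is a sketch only: path_in_root is a hypothetical helper that just concatenates strings, and jq-suite_nnnn is the same placeholder used above.

```shell
# Hypothetical helper: show the physical location that a POSIX-style path
# stands for under a given runtime root.
path_in_root() {
    printf '%s%s\n' "${1%/}" "$2"
}

conda_build_root="C:/miniconda3/envs/b1/Library"
target_root="C:/miniconda3/envs/b1/conda-bld/jq-suite_nnnn/_h_env/Library"

# Where the perl launched by conda-build's bash ends up looking:
path_in_root "$conda_build_root" /usr/share/aclocal-1.16
# Where the autotools files were actually installed:
path_in_root "$target_root" /usr/share/aclocal-1.16
```

Same path, two different directories, and only the second one is populated.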

Doh!

Is It Even Worse Than That?

Here's some more debugging output where I explicitly run, say, df from the target build's environment:

/c/miniconda3/envs/b1/conda-bld/jq-suite_1695807633659/_h_env/Library/usr/bin/df
Filesystem                     Size  Used Avail Use% Mounted on
C:/miniconda3/envs/b1/Library  512G   44G  469G   9% /

Yikes!

Are we in some tricksome world where once one runtime in a "thread of control" has been established then you don't get to load another? The answer is both yes and no.

We can demonstrate the fun and games by having a script which runs the explicit pathnames of three instances of df and then run the script with the three instances of bash:

(b1) C:\Users\dev-admin>cat df-test
/c/miniconda3/Library/usr/bin/df
/c/miniconda3/envs/b1/Library/usr/bin/df
/c/miniconda3/envs/b1/conda-bld/jq-suite_1695807633659/_h_env/Library/usr/bin/df

(b1) C:\Users\dev-admin>\miniconda3\Library\usr\bin\bash ./df-test
Filesystem            1K-blocks     Used Available Use% Mounted on
C:/miniconda3/Library 536868860 45576520 491292340   9% /
Filesystem                    1K-blocks     Used Available Use% Mounted on
C:/miniconda3/envs/b1/Library 536868860 45576520 491292340   9% /
Filesystem                                                            1K-blocks     Used Available Use% Mounted on
C:/miniconda3/envs/b1/conda-bld/jq-suite_1695807633659/_h_env/Library 536868860 45576520 491292340   9% /

(b1) C:\Users\dev-admin>\miniconda3\envs\b1\Library\usr\bin\bash ./df-test
Filesystem            1K-blocks     Used Available Use% Mounted on
C:/miniconda3/Library 536868860 45576584 491292276   9% /
Filesystem                    1K-blocks     Used Available Use% Mounted on
C:/miniconda3/envs/b1/Library 536868860 45576648 491292212   9% /
Filesystem                    1K-blocks     Used Available Use% Mounted on
C:/miniconda3/envs/b1/Library 536868860 45576648 491292212   9% /

(b1) C:\Users\dev-admin>\miniconda3\envs\b1\conda-bld\jq-suite_1695807633659\_h_env\Library\usr\bin\bash ./df-test
Filesystem            1K-blocks     Used Available Use% Mounted on
C:/miniconda3/Library 536868860 45576712 491292148   9% /
Filesystem                                                            1K-blocks     Used Available Use% Mounted on
C:/miniconda3/envs/b1/conda-bld/jq-suite_1695807633659/_h_env/Library 536868860 45576712 491292148   9% /
Filesystem                                                            1K-blocks     Used Available Use% Mounted on
C:/miniconda3/envs/b1/conda-bld/jq-suite_1695807633659/_h_env/Library 536868860 45576712 491292148   9% /

Wait, what?

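Looking more closely you can see the following: run from the base environment's bash, each df reports its own root. Run from b1's bash, the target build's df reports b1's root. Run from the target build's bash, b1's df reports the target build's root. Only the base environment's df reports its own root every time.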

Oh dear.

Mitigations

I suspect that most people would grumble about Windows, then run:

(b1) C:\Users\dev-admin\src\aggregate>cd \miniconda3\envs\b1\conda-bld\jq-suite_1695720509023\
(b1) C:\miniconda3\envs\b1\conda-bld\jq-suite_1695720509023>call build_env_setup.bat
(b1) C:\miniconda3\envs\b1\conda-bld\jq-suite_1695720509023>bash build-jq.sh

and find that it magically just works! There'd be more grumbling about Windows and everyone would move on.

Of course it just works. By sourcing build_env_setup.bat you put yourself in the target build's environment. Running bash is then no problem because you get the correct bash (the one from the target build environment), and when a script wants to run /usr/bin/perl, execve(2) will access and run the target build environment's instance of perl and... You get the picture, everything now lines up.

What Should Be Happening?

Clearly, in the face of MSYS2/Cygwin's simulated root filesystem, conda build should be launching bash (in this case) from the target build's environment. Broadly, %BUILD_PREFIX%\Library\usr\bin\bash although you'd like to think that post-activate.py you could have just invoked bash and picked up the one from the target build.
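
One way to check which bash (or patch, or perl) a shell would pick up is to compare the first match on the PATH against the prefix you expect. The sketch below is an assumption-laden diagnostic, not anything conda-build does: tool_from_prefix is a hypothetical name, and it expects the prefix in the same POSIX-style form that command -v prints. On MSYS2 the value of BUILD_PREFIX may be a Windows-style path, so it would likely need converting first (for example with cygpath -u); that step is omitted here.

```shell
# Hypothetical diagnostic: succeed if the first TOOL found on the PATH
# lives somewhere under PREFIX (both given as POSIX-style paths).
tool_from_prefix() {
    local tool=$1 prefix=${2%/} found
    found=$(command -v "$tool") || return 1
    case $found in
        "$prefix"/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# BUILD_PREFIX is assumed to be set by the build; fall back to a dummy
# value so the snippet still runs outside one.
if tool_from_prefix bash "${BUILD_PREFIX:-/nonexistent}"; then
    echo "bash comes from the target build environment"
else
    echo "bash comes from somewhere else"
fi
```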

That's an ordering issue which I don't have any insight on.

What Else Could Be Happening?

There is a question of whether this affects non-MSYS2 setups.

It's harder to see but I don't think so. Unix systems are generally more conservative and have a single root filesystem. That leads to a proliferation of ${BUILD_PREFIX}/... uses to ensure that the target build's environment is in use but Unix won't suffer from the on-the-fly root filesystem problem causing execve(2) to behave unexpectedly.

Conda Info

active environment : base
    active env location : C:\miniconda3
            shell level : 1
       user config file : C:\Users\dev-admin\.condarc
 populated config files : C:\miniconda3\.condarc
                          C:\Users\dev-admin\.condarc
          conda version : 23.9.0
    conda-build version : 3.27.0
         python version : 3.9.18.final.0
       virtual packages : __archspec=1=x86_64
                          __win=0=0
       base environment : C:\miniconda3  (read only)
      conda av data dir : C:\miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\miniconda3\pkgs
                          C:\Users\dev-admin\.conda\pkgs
                          C:\Users\dev-admin\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\dev-admin\.conda\envs
                          C:\miniconda3\envs
                          C:\Users\dev-admin\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/23.9.0 requests/2.31.0 CPython/3.9.18 Windows/10 Windows/10.0.17763 aau/0.4.2 c/KgeReVFj5ntufmZlEwjm5w s/urBz6svTszhRLGJ_ZjIL_g e/9no2GHo9GeuI_r09YFyllw
          administrator : False
             netrc file : None
           offline mode : False

Conda Config

==> C:\miniconda3\.condarc <==
add_pip_as_python_dependency: False
show_channel_urls: True

==> C:\Users\dev-admin\.condarc <==
report_errors: False
anaconda_upload: False

Conda list

# packages in environment at C:\miniconda3:
#
# Name                    Version                   Build  Channel
abs-sdk                   0.1.0+768.g1b8d1c81            py_0    file:///C:/prefect/sdk
aiobotocore               2.5.0            py39haa95532_0    defaults
aiohttp                   3.8.5            py39h2bbff1b_0    defaults
aioitertools              0.7.1              pyhd3eb1b0_0    defaults
aiosignal                 1.2.0              pyhd3eb1b0_0    defaults
anaconda-anon-usage       0.4.2            py39hfc23b7f_2    defaults
anaconda-client           1.12.1           py39haa95532_0    defaults
appdirs                   1.4.4              pyhd3eb1b0_0    defaults
asn1crypto                1.5.1            py39haa95532_0    defaults
async-timeout             4.0.2            py39haa95532_0    defaults
attrs                     23.1.0           py39haa95532_0    defaults
beautifulsoup4            4.12.2           py39haa95532_0    defaults
boltons                   23.0.0           py39haa95532_0    defaults
boto3                     1.26.76          py39haa95532_0    defaults
botocore                  1.29.76          py39haa95532_0    defaults
brotlipy                  0.7.0           py39h2bbff1b_1003    defaults
bzip2                     1.0.8                he774522_0    defaults
ca-certificates           2023.08.22           haa95532_0    defaults
certifi                   2023.7.22        py39haa95532_0    defaults
cffi                      1.15.1           py39h2bbff1b_3    defaults
chardet                   4.0.0           py39haa95532_1003    defaults
charset-normalizer        2.0.4              pyhd3eb1b0_0    defaults
click                     8.0.4            py39haa95532_0    defaults
cloudpickle               2.2.1            py39haa95532_0    defaults
clyent                    1.2.2            py39haa95532_1    defaults
colorama                  0.4.6            py39haa95532_0    defaults
conda                     23.9.0           py39haa95532_0    defaults
conda-build               3.27.0           py39haa95532_0    defaults
conda-index               0.3.0            py39haa95532_0    defaults
conda-package-handling    2.2.0            py39haa95532_0    defaults
conda-package-streaming   0.9.0            py39haa95532_0    defaults
console_shortcut          0.1.1                         4    defaults
croniter                  0.3.35                     py_0    defaults
cryptography              41.0.3           py39h89fc84f_0    defaults
dask-core                 2023.6.0         py39haa95532_0    defaults
datadog                   0.42.0             pyhd3eb1b0_0    defaults
defusedxml                0.7.1              pyhd3eb1b0_0    defaults
deprecated                1.2.13           py39haa95532_0    defaults
distributed               2023.6.0         py39haa95532_0    defaults
docker-py                 4.4.1            py39haa95532_5    defaults
docker-pycreds            0.4.0              pyhd3eb1b0_0    defaults
filelock                  3.9.0            py39haa95532_0    defaults
frozenlist                1.3.3            py39h2bbff1b_0    defaults
fsspec                    2023.9.2         py39haa95532_0    defaults
git                       2.40.1               haa95532_1    defaults
gitdb                     4.0.7              pyhd3eb1b0_0    defaults
gitpython                 3.1.30           py39haa95532_0    defaults
heapdict                  1.0.1              pyhd3eb1b0_0    defaults
idna                      3.4              py39haa95532_0    defaults
importlib-metadata        6.0.0            py39haa95532_0    defaults
importlib_resources       5.2.0              pyhd3eb1b0_1    defaults
jinja2                    3.1.2            py39haa95532_0    defaults
jmespath                  0.10.0             pyhd3eb1b0_0    defaults
jsonpatch                 1.32               pyhd3eb1b0_0    defaults
jsonpointer               2.1                pyhd3eb1b0_0    defaults
jsonschema                4.17.3           py39haa95532_0    defaults
jupyter_core              5.3.0            py39haa95532_0    defaults
libarchive                3.6.2                hb62f4d4_2    defaults
libiconv                  1.16                 h2bbff1b_2    defaults
liblief                   0.12.3               hd77b12b_0    defaults
libxml2                   2.10.4               h0ad7f3c_1    defaults
locket                    1.0.0            py39haa95532_0    defaults
lz4-c                     1.9.4                h2bbff1b_0    defaults
m2-msys2-runtime          2.5.0.17080.65c939c               3    defaults
m2-patch                  2.7.5                         2    defaults
markupsafe                2.1.1            py39h2bbff1b_0    defaults
marshmallow               3.19.0           py39haa95532_0    defaults
marshmallow-oneofschema   3.0.1            py39haa95532_0    defaults
menuinst                  1.4.19           py39h59b6b97_0    defaults
more-itertools            8.12.0             pyhd3eb1b0_0    defaults
msgpack-python            1.0.3            py39h59b6b97_0    defaults
msys2-conda-epoch         20160418                      1    defaults
multidict                 6.0.2            py39h2bbff1b_0    defaults
mypy_extensions           1.0.0            py39haa95532_0    defaults
natsort                   7.1.1              pyhd3eb1b0_0    defaults
nbformat                  5.9.2            py39haa95532_0    defaults
openssl                   3.0.11               h2bbff1b_2    defaults
packaging                 23.1             py39haa95532_0    defaults
partd                     1.4.0            py39haa95532_0    defaults
pendulum                  2.1.2              pyhd3eb1b0_1    defaults
pip                       23.2.1           py39haa95532_0    defaults
pkginfo                   1.9.6            py39haa95532_0    defaults
platformdirs              3.10.0           py39haa95532_0    defaults
pluggy                    1.0.0            py39haa95532_1    defaults
powershell_shortcut       0.0.1                         3    defaults
prefect                   1.4.0              pyhd3eb1b0_0    distro-tooling
prefect-cookbook          0+unknown                pypi_0    pypi
psutil                    5.9.0            py39h2bbff1b_0    defaults
py-lief                   0.12.3           py39hd77b12b_0    defaults
pycosat                   0.6.6            py39h2bbff1b_0    defaults
pycparser                 2.21               pyhd3eb1b0_0    defaults
pydantic                  1.10.12          py39h2bbff1b_1    defaults
pygithub                  1.55               pyhd3eb1b0_1    defaults
pyjwt                     2.4.0            py39haa95532_0    defaults
pynacl                    1.5.0            py39h8cc25b3_0    defaults
pyopenssl                 23.2.0           py39haa95532_0    defaults
pyparsing                 3.0.9            py39haa95532_0    defaults
pyrsistent                0.18.0           py39h196d8e1_0    defaults
pysocks                   1.7.1            py39haa95532_0    defaults
python                    3.9.18               h1aa4202_0    defaults
python-box                5.4.1              pyhd3eb1b0_0    distro-tooling
python-dateutil           2.8.2              pyhd3eb1b0_0    defaults
python-fastjsonschema     2.16.2           py39haa95532_0    defaults
python-json-logger        2.0.7            py39haa95532_0    defaults
python-libarchive-c       2.9                pyhd3eb1b0_1    defaults
python-lmdb               1.4.1            py39hd77b12b_0    defaults
python-slugify            5.0.2              pyhd3eb1b0_0    defaults
pytz                      2023.3.post1     py39haa95532_0    defaults
pytzdata                  2020.1             pyhd3eb1b0_0    defaults
pywin32                   305              py39h2bbff1b_0    defaults
pyyaml                    6.0              py39h2bbff1b_1    defaults
requests                  2.31.0           py39haa95532_0    defaults
requests-toolbelt         1.0.0            py39haa95532_0    defaults
ruamel.yaml               0.17.21          py39h2bbff1b_0    defaults
ruamel.yaml.clib          0.2.6            py39h2bbff1b_1    defaults
s3transfer                0.6.0            py39haa95532_0    defaults
setuptools                68.0.0           py39haa95532_0    defaults
six                       1.16.0             pyhd3eb1b0_1    defaults
slack-sdk                 3.19.5             pyhaa95532_0    distro-tooling
smmap                     4.0.0              pyhd3eb1b0_0    defaults
sortedcontainers          2.4.0              pyhd3eb1b0_0    defaults
soupsieve                 2.5              py39haa95532_0    defaults
sqlite                    3.41.2               h2bbff1b_0    defaults
tabulate                  0.8.10           py39haa95532_0    defaults
tblib                     1.7.0              pyhd3eb1b0_0    defaults
text-unidecode            1.3                pyhd3eb1b0_0    defaults
toml                      0.10.2             pyhd3eb1b0_0    defaults
tomli                     2.0.1            py39haa95532_0    defaults
toolz                     0.12.0           py39haa95532_0    defaults
tornado                   6.3.2            py39h2bbff1b_0    defaults
tqdm                      4.65.0           py39hd4e2768_0    defaults
traitlets                 5.7.1            py39haa95532_0    defaults
typing-extensions         4.7.1            py39haa95532_0    defaults
typing_extensions         4.7.1            py39haa95532_0    defaults
tzdata                    2023c                h04d1e81_0    defaults
unidecode                 1.2.0              pyhd3eb1b0_0    defaults
urllib3                   1.26.16          py39haa95532_0    defaults
vc                        14.2                 h21ff451_1    defaults
vs2015_runtime            14.27.29016          h5e58377_2    defaults
websocket-client          0.58.0           py39haa95532_4    defaults
wheel                     0.41.2           py39haa95532_0    defaults
win_inet_pton             1.1.0            py39haa95532_0    defaults
wrapt                     1.14.1           py39h2bbff1b_0    defaults
xz                        5.4.2                h8cc25b3_0    defaults
yaml                      0.2.5                he774522_0    defaults
yarl                      1.8.1            py39h2bbff1b_0    defaults
zict                      3.0.0            py39haa95532_0    defaults
zipp                      3.11.0           py39haa95532_0    defaults
zlib                      1.2.13               h8cc25b3_0    defaults
zstandard                 0.19.0           py39h2bbff1b_0    defaults
zstd                      1.5.5                hd43e919_0    defaults

Additional Context

No response

isuruf commented 1 year ago

conda-build is supposed to use miniconda3\envs\b1\conda-bld\jq-suite_1695720509023\_h_env\Library\usr\bin\bash.exe. Is it not?

ifitchet commented 1 year ago

conda-build is supposed to use miniconda3\envs\b1\conda-bld\jq-suite_1695720509023\_h_env\Library\usr\bin\bash.exe. Is it not?

Yes, that is my contention. Of interest, with a broken patch I could see it was running patch from the base environment as well.

So I think it is "as simple as" a PATH issue.

As a test you could:

conda create -n b1 conda-build
conda activate b1
conda build <recipe>

and it won't find bash (or patch) or eventually perl when running because it's looking for them in b1 and not in the target build environment where the recipe's requirements.build has (in MSYS2-land) m2-bash, m2-patch and m2-perl.

However, if conda-build has been using the wrong PATH for some time, do we have anything that is now reliant on that wrong PATH?

The base environment contains quite a lot of stuff (including bash and patch) which may well have been masking the problem for some time. In this particular case jq wants to run autoreconf which requires the full autotools suite which isn't in base but is in the target build environment.


I'll take the opportunity of reminding readers that this is incredibly easy to miss when testing. If you run an executable from the target build environment, i.e. off the PATH, the right thing will happen. You must ask yourself what this shell sees: cat /proc/mounts is not the same as /usr/bin/cat /proc/mounts, as cat is an executable found on the PATH. You could use:

while read line ; do
   echo "$line"
done < /proc/mounts

to get this bash to read the contents of the file in this root filesystem. The explicit use of /usr/bin/cat mirrors the explicit interpreter name in #! /usr/bin/perl and has this bash (technically, execve(2)) run the perl from this root filesystem.
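
If you are unsure whether a command in a test is answered by this shell or by a separate executable, command -v can tell you. A minimal sketch follows; the function name handled_by_shell is hypothetical.

```shell
# Hypothetical helper: succeed if NAME is handled by the shell itself
# (builtin, keyword, function or alias) rather than by an external file.
handled_by_shell() {
    case $(command -v "$1" 2>/dev/null) in
        '') return 2 ;;    # not found at all
        /*) return 1 ;;    # a path: an external executable
        *)  return 0 ;;    # resolved inside the shell
    esac
}

for name in read echo cat; do
    if handled_by_shell "$name"; then
        echo "$name: runs inside this shell"
    else
        echo "$name: external, or not found"
    fi
done
```

Anything reported as running inside this shell sees this shell's root filesystem. Anything external is subject to the PATH and runtime DLL behaviour described above.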