conda-forge / ucx-split-feedstock

A conda-smithy repository for ucx-split.
BSD 3-Clause "New" or "Revised" License

Commit hash reported by "ucx_info -v" does not exist in UCX repository. #180

Status: Closed (SeyedMir closed this 3 months ago)

SeyedMir commented 5 months ago

Solution to issue cannot be found in the documentation.

### Issue

After installing the UCX conda package 1.16.0, I see the following output from `ucx_info -v`:

```shell
# Library version: 1.16.0
# Library path: /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3/envs/test/bin/../lib/libucs.so.0
# API headers version: 1.16.0
# Git branch '', revision 1ac44f2
```

However, I cannot find commit 1ac44f2 in the UCX repository, so it is not possible to tell which commit the build corresponds to.
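To pull the reported revision out of this output (for example, to look it up in a local UCX clone), a small shell sketch could be used. The helper name `ucx_reported_revision` is hypothetical, and it assumes the `# Git branch '...', revision <hash>` line format shown in the output above.

```shell
# Hypothetical helper: read `ucx_info -v` output on stdin and print the
# abbreviated commit hash from the "# Git branch '...', revision <hash>" line.
ucx_reported_revision() {
    sed -n 's/^# Git branch .*, revision \([0-9a-f]\{1,\}\).*/\1/p'
}

# Example usage (assumes ucx_info is on PATH and ./ucx is a clone of openucx/ucx):
#   rev=$(ucx_info -v | ucx_reported_revision)
#   git -C ucx cat-file -e "${rev}^{commit}" || echo "commit $rev not found"
```

With the output above this yields `1ac44f2`, and `git cat-file -e` exits non-zero for a commit the clone does not contain, which is the situation described here.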

### Installed packages

```shell
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
libgcc-ng                 13.2.0               h77fa898_6    conda-forge
libgomp                   13.2.0               h77fa898_6    conda-forge
libnl                     3.9.0                hd590300_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_6    conda-forge
rdma-core                 51.0                 hd3aeb46_0    conda-forge
ucx                       1.16.0               h209287a_3    conda-forge
```

### Environment info

```shell
     active environment : test
    active env location : /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3/envs/test
            shell level : 2
       user config file : /home/hmirsadeghi/.condarc
 populated config files : /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3/.condarc
          conda version : 23.11.0
    conda-build version : not installed
         python version : 3.10.13.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=cascadelake
                          __conda=23.11.0=0
                          __glibc=2.35=0
                          __linux=5.15.0=0
                          __unix=0=0
       base environment : /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3  (writable)
      conda av data dir : /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3/pkgs
                          /home/hmirsadeghi/.conda/pkgs
       envs directories : /lustre/fsw/coreai_libraries_nccl/hmirsadeghi/miniforge3/envs
                          /home/hmirsadeghi/.conda/envs
               platform : linux-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.10.13 Linux/5.15.0-87-generic ubuntu/22.04.3 glibc/2.35 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
                UID:GID : 2000827952:2000827952
             netrc file : None
            offline mode : False
```
jakirkham commented 5 months ago

Thanks Hessam! 🙏

Currently we download the UCX source artifact from this URL, using the version set in the first line of the recipe:

https://github.com/conda-forge/ucx-split-feedstock/blob/590b6e0f87a75231b6e65f560de910c25b4ed89a/recipe/meta.yaml#L1

https://github.com/conda-forge/ucx-split-feedstock/blob/590b6e0f87a75231b6e65f560de910c25b4ed89a/recipe/meta.yaml#L6-L8


The UCX 1.16.0 release corresponds to GitHub commit e4bb802 ( https://github.com/openucx/ucx/commit/e4bb802 ). I then downloaded the same artifact that CI used and confirmed it contains that hash:

```shell
ucx-1.16.0 % grep -R -l e4bb802 .
./debian/changelog
./configure
./src/uct/api/version.h
```

So up to this point, the commit hash matches the upstream tag.


As part of the build process, we do run `./autogen.sh`. This has sometimes been needed when applying upstream patches.

https://github.com/conda-forge/ucx-split-feedstock/blob/590b6e0f87a75231b6e65f560de910c25b4ed89a/recipe/install_ucx.sh#L13

It appears that running `./autogen.sh`, which regenerates `configure` from `configure.ac`, bakes in some information from git at that point:

```m4
AC_CHECK_PROG(GITBIN, git, yes)
AS_IF([test x"${GITBIN}" = x"yes"],
      [# remove preceding "refs/heads/" (11 characters) for symbolic ref
       AC_SUBST(SCM_BRANCH,  esyscmd([sh -c 'git symbolic-ref --quiet   HEAD | sed "s/^.\{11\}//"']))
       AC_SUBST(SCM_VERSION, esyscmd([sh -c 'git rev-parse    --short=7 HEAD']))],
      [AC_SUBST(SCM_BRANCH,  "<unknown>")
       AC_SUBST(SCM_VERSION, "0000000")])
```
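For readers less familiar with m4, the logic of this excerpt can be approximated in plain shell. This is a sketch, not UCX's code: the function names `scm_branch` and `scm_version` are hypothetical. As in the excerpt, the result depends on whichever repository git finds from the directory it runs in. Also, on a detached HEAD `git symbolic-ref --quiet HEAD` prints nothing, which would be consistent with the empty branch name (`''`) in the `ucx_info -v` output.

```shell
# Hypothetical shell approximation of the SCM_BRANCH / SCM_VERSION logic in the
# configure.ac excerpt. In UCX these commands run via esyscmd when autoconf
# regenerates configure, so the values are frozen into the generated script.

scm_branch() {
    if command -v git >/dev/null 2>&1; then
        # Drop the leading "refs/heads/" (11 characters); prints nothing on a
        # detached HEAD or outside a repository.
        git symbolic-ref --quiet HEAD 2>/dev/null | sed 's/^.\{11\}//'
    else
        echo '<unknown>'
    fi
}

scm_version() {
    if command -v git >/dev/null 2>&1; then
        # Short hash of HEAD in whatever repository git discovers from the
        # current directory; prints nothing if no repository is found.
        git rev-parse --short=7 HEAD 2>/dev/null
    else
        echo 0000000
    fi
}
```

Only when git is absent do the fixed fallbacks (`<unknown>` and `0000000`) appear; otherwise the values come entirely from the surrounding checkout.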

I suspect what happens is that it finds the `git` available in our build images and runs it from within the recipe directory, in turn picking up the commit of the feedstock itself rather than that of UCX.
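The suspected mechanism relies on git's repository discovery: when the current directory has no `.git`, git keeps looking in parent directories. Release tarballs normally do not include a `.git` directory, so a source tree extracted somewhere below the feedstock checkout would resolve to the feedstock's repository. Below is a simplified model of that search as a hypothetical helper `find_git_dir`. Real git handles more cases (for example `GIT_DIR` and `.git` files) that this sketch ignores, and the directory layout in the comment is illustrative only.

```shell
# Simplified, hypothetical model of git's upward repository search: starting
# from the given directory, print the first directory (itself or an ancestor)
# that contains a .git entry. Returns non-zero if none is found.
find_git_dir() {
    local dir
    # Resolve to an absolute path so the loop ends at "/"; clear CDPATH so
    # cd never echoes a path into the captured output.
    dir=$(CDPATH= cd -- "$1" && pwd) || return 1
    while :; do
        if [ -e "$dir/.git" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Illustrative layout (hypothetical paths): an extracted UCX tarball with no
# .git of its own, sitting inside a feedstock checkout.
#   feedstock/.git
#   feedstock/build_artifacts/ucx-1.16.0/   <- autogen.sh runs here
# find_git_dir feedstock/build_artifacts/ucx-1.16.0   # prints .../feedstock
```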

And in fact, the hash that `ucx_info -v` reports, 1ac44f2, is a feedstock commit recently used to build the ucx packages. It is also exactly what `git rev-parse --short=7 HEAD` produces at that commit, which is the command UCX uses to configure the version.

TL;DR: the build process overwrites the commit hash included in the release.


So I think the question now is how we could address this. Some options:

Hope that helps 🙂

SeyedMir commented 5 months ago

Thanks for the detailed response. I think dropping `./autogen.sh`, and running it only when needed, is the simplest option.

jakirkham commented 5 months ago

Would you like to submit a PR?

leofang commented 3 months ago

Closed by #181.