grantcurell opened 5 months ago
can we get a little more context on this?
you cloned the project in workbench and the build failed?
or did you try and build things outside of workbench?
can we get a little more context on this?
I did as this part of the tutorial instructed.
you cloned the project in workbench and the build failed?
I cloned it using the clone project button here after I created my own fork:
The build starts but this is as far as it gets:
STEP 1/32: FROM ghcr.io/huggingface/text-generation-inference:latest
STEP 2/32: WORKDIR /opt/project/build/
Using cache 9c39d76f60b717ac3b593657d316de16beb55bfeb055e39a6106172c4ac8732a
9c39d76f60b7
STEP 3/32: SHELL ["/bin/bash", "-c"]
Using cache 7371ead4fd9fcd7d8f3df811832781a4be5820df4c1fcbe4231d8555d19aa430
7371ead4fd9f
STEP 4/32: USER root
Using cache 9e9979887cef453e1ab1466268d897eba7b1eb0f0f10a582d21c5cecf92a699e
9e9979887cef
STEP 5/32: RUN groupadd -g 1000 workbench || true
Using cache 751e8fd384b5c2630f74600076a3389314d7a13564617966a7eae26e5bd332e1
751e8fd384b5
STEP 6/32: RUN useradd -u 1000 -g 1000 -rm -d /home/workbench -s /bin/bash workbench || usermod -l workbench $(getent passwd 1000 | cut -d: -f1)
Using cache cd051d64f4101e27e5d3ced12073041d0cd4f33f3bdf2b5de42876956ab5a990
cd051d64f410
STEP 7/32: RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y sudo
Using cache 9ceb08d3bd830a76029b1fdf6b608ba0b3b1f38d35af089102814d7a7fd1705d
9ceb08d3bd83
STEP 8/32: RUN echo "workbench ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/workbench
Using cache de3ca639ee9bb62ed6793937ab0bfe2df9a98865a9f12c84eccdeb2bf1d74e28
de3ca639ee9b
STEP 9/32: ENV NVWB_UID=1000
Using cache 3412168d918a56f91825ef9259abaae936b456b45f5bc934761e5b62329641d4
3412168d918a
STEP 10/32: ENV NVWB_GID=1000
Using cache 01accd2e20b0978d676ddd46b71747da82175720d5d99437d35a4e03f2388953
01accd2e20b0
STEP 11/32: ENV NVWB_USERNAME=workbench
Using cache 7017347f865e1ea76b5c253202115200f71581d7e80c9fca286f831285e54972
7017347f865e
STEP 12/32: USER $NVWB_USERNAME
Using cache 9db5b79d8320b930ba3ad9220ccdfc9add4babff2c46093964715eed40b86384
9db5b79d8320
STEP 13/32: COPY --chown=$NVWB_UID:$NVWB_GID ["preBuild.bash", "/opt/project/build/"]
Using cache 24863b6d2189f9f2b7b76ad08873928e27ab534c89e4ca06b18c004153f245aa
24863b6d2189
STEP 14/32: RUN ["/bin/bash", "/opt/project/build/preBuild.bash"]
Using cache 0c0579d653a4a8b9bab1d3e6dd421e9950858e0fc670e90c8a6c962810baa3ef
0c0579d653a4
STEP 15/32: USER root
Using cache 07d489c5c3c1966bcc7b548cfe6e6cbfe20e6a1f3b6cb595cf3c39ba79894a2e
07d489c5c3c1
STEP 16/32: RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y wget
Using cache 6b6f7afaa6e9b8c937a559deba09288ec59b247d04cbc2bf4f2e4105372e3679
6b6f7afaa6e9
STEP 17/32: RUN dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')"; wget -O- https://github.com/tianon/gosu/releases/download/1.17/gosu-${dpkgArch} | install /dev/stdin /usr/local/bin/gosu
Using cache 4a73c61d25c23be76cc67753c00fe3a5845ca7bea0f8dfeb367bc91b3e64b861
4a73c61d25c2
STEP 18/32: COPY --chmod=755 ["entrypoint.sh", "/"]
Using cache 21f0b357011345147b3c52c4f437372989b2d95db8db3de1f22aa82e36ea08c6
21f0b3570113
STEP 19/32: ENV NVWB_BASE_ENV_ENTRYPOINT=
Using cache a69cb83086ac33ec15b3e309f28ccd4dc1851eeff2b53e4b5b8fa90277197641
a69cb83086ac
STEP 20/32: USER $NVWB_USERNAME
Using cache bf604094e85375800ea71b99da88f338ac9d4f832daaa212e33153594dee490b
bf604094e853
STEP 21/32: USER root
Using cache 05f66823ac9214b3956ea40810a7e6b8459c1c6c0d5f796cf5c3ae6e83506b58
05f66823ac92
STEP 22/32: RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1
libglib2.0-0
git
jq
Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2037 kB]
Fetched 2266 kB in 1s (1823 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libgl1
E: Unable to locate package libglib2.0-0
E: Couldn't find any package by glob 'libglib2.0-0
'
E: Couldn't find any package by regex 'libglib2.0-0
'
E: Unable to locate package git
Error: building at STEP "RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1
libglib2.0-0
git
jq": while running runtime: exit status 100
Build Failed
I went as far as firing up that container manually with:
apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1
libglib2.0-0
git
jq
that also fails. You get:
...
Setting up libgl1-mesa-dri:amd64 (23.2.1-1ubuntu3.1~22.04.2) ...
Setting up libglx-mesa0:amd64 (23.2.1-1ubuntu3.1~22.04.2) ...
Setting up libglx0:amd64 (1.4.0-1) ...
Setting up libgl1:amd64 (1.4.0-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.4) ...
bash: libglib2.0-0: command not found
bash: git: command not found
bash: jq: command not found
It looks like these are getting pulled out of the file apt.txt. I just tried ignoring the file and changing its format from this:
# apt packages to install should be listed one per line
libgl1
libglib2.0-0
#
git
jq
to this:
# apt packages to install should be listed one per line
libgl1 libglib2.0-0 git jq
and the build has progressed past step 22. I'm guessing the odd spacing is the source of the bug.
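Exactly how Workbench turns apt.txt into the apt-get command isn't shown in this thread, but the quoted package names in the error (for example 'libglib2.0-0 followed by a line break before the closing quote) suggest each entry kept a trailing carriage return. As a minimal sketch, assuming GNU sed and xargs, a more tolerant parser would strip carriage returns and comments before joining the names onto one line. `read_apt_packages` is a hypothetical helper for illustration, not part of Workbench.

```bash
#!/bin/bash
# Hypothetical helper (not Workbench's own code): print the packages listed in
# an apt.txt-style file as a single space-separated line.
read_apt_packages() {
    # 1. sed drops a trailing \r (CRLF line endings) and anything after '#'.
    # 2. xargs with no command runs echo, which joins the remaining words
    #    onto one line and discards blank lines and extra whitespace.
    sed -e 's/\r$//' -e 's/#.*$//' "$1" | xargs
}

# Demo: the one-per-line apt.txt from this issue, saved with CRLF endings.
apt_file="$(mktemp)"
printf '# apt packages to install should be listed one per line\r\nlibgl1\r\nlibglib2.0-0\r\n#\r\ngit\r\njq\r\n' > "$apt_file"
packages="$(read_apt_packages "$apt_file")"
echo "$packages"
rm -f "$apt_file"

# A build step could then install them, for example:
# apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y $packages
```

Joining the names onto one line is effectively what the manual edit to apt.txt did. Leaving `$packages` unquoted in the commented install line is deliberate, so the shell splits it into separate package arguments.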
or did you try and build things outside of workbench?
No - I did everything inside the NVIDIA AI Workbench UI. No command line.
It now seems to have an unrelated failure, though. Full build log attached. I chose Podman when prompted during install because that's what most of my customers run with K8s. A naive glance at the error makes me think the installer assumes docker will be present:
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
/opt/project/build/postBuild.bash: line 23: $'\r': command not found
groupadd: 'docker-group
' is not a valid group name
usermod: group 'docker-group' does not exist
/opt/project/build/postBuild.bash: line 26: $'\r': command not found
chown: cannot access '/data': No such file or directory
Error: building at STEP "RUN /bin/bash /opt/project/build/postBuild.bash": while running runtime: exit status 1
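The `$'\r': command not found` lines are the usual symptom of a shell script saved with Windows (CRLF) line endings: bash treats the carriage return as part of the command or argument on that line. The reproduction below is a small sketch that assumes only bash and GNU sed; it uses a throwaway script, not the project's actual postBuild.bash.

```bash
#!/bin/bash
# Reproduce the failure mode with a tiny script saved with CRLF line endings.
demo_dir="$(mktemp -d)"
printf 'echo start\r\n\r\n' > "$demo_dir/crlf.sh"

# Line 2 consists only of "\r", so bash reports: $'\r': command not found
# (2>&1 >/dev/null captures stderr only and discards stdout)
errors="$(bash "$demo_dir/crlf.sh" 2>&1 >/dev/null)"
echo "$errors"

# The same script with the carriage returns stripped runs cleanly.
sed -i 's/\r$//' "$demo_dir/crlf.sh"
clean_errors="$(bash "$demo_dir/crlf.sh" 2>&1 >/dev/null)"
echo "errors after fix: ${clean_errors:-none}"

rm -rf "$demo_dir"
```

Presumably the same trailing `\r` also ended up inside the group name passed to groupadd, which would explain why it rejected 'docker-group' and why usermod then could not find the group.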
If you're good with the PR, I'll move this to a separate issue. I can take a look at it on Monday.
I didn't spend time checking whether anything else assumes each entry is on its own line.
When I told Workbench to clone the project for me, it pulled everything with CRLF line endings, which caused the build to choke.
I removed all the carriage returns recursively with:
#!/bin/bash
# Function to remove carriage return characters (the \r in CRLF line endings) from a file
remove_carriage_returns() {
    sed -i 's/\r$//' "$1"
    echo "Removed carriage return characters from $1"
}
# Navigate to the folder containing the files (stop if it doesn't exist)
cd /path/to/your/folder || exit 1
# Find all files recursively, skipping .git so Git's internal object files are not rewritten
find . -type f -not -path './.git/*' | while IFS= read -r file; do
    # Only edit text files: grep -I treats binary files as non-matching
    if grep -Iq . "$file"; then
        remove_carriage_returns "$file"
    fi
done
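Before and after a bulk rewrite like this, it can help to list which files actually still contain carriage returns. A minimal sketch, assuming GNU grep (for `-r`, `-l`, `-I` and `--exclude-dir`); `find_crlf_files` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: list text files under a directory that still contain
# a carriage return. -I skips binary files and --exclude-dir=.git leaves
# Git's internal files out of the report.
find_crlf_files() {
    local root="${1:-.}"
    grep -rlI --exclude-dir=.git $'\r' "$root" | sort
}

# Demo: one CRLF file and one LF file; only the CRLF file is reported.
demo="$(mktemp -d)"
printf 'a\r\nb\r\n' > "$demo/windows.txt"
printf 'a\nb\n' > "$demo/unix.txt"
find_crlf_files "$demo"
rm -rf "$demo"
```

An empty result from `find_crlf_files /path/to/your/folder` after the cleanup would suggest no CRLF endings are left for the build or the chat app to trip over.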
Not sure if this is necessarily a problem with this project, but if you leave any of the CRLF line endings in place, either the build fails or the chat app itself fails.
I have just started looking at this, but if this is meant to be an out-of-the-box demo sort of deal, it looks like the build is broken. I just did a fresh install, let Workbench deploy Podman for me, and forked this repo. Below are the results.
System Info
It doesn't matter much, since the build appears to be failing while looking for packages that don't exist in the container, but the host system is: