UserName-wang opened 1 year ago
@UserName-wang hmm, can you try doing a sudo docker pull nvcr.io/nvidia/l4t-jetpack:r35.3.1 first?

It works now! Thank you a lot for your help and quick reply!
Dear @dusty-nv,
Now I have a new error about bitsandbytes:
-- Testing container my_container:r35.3.1-bitsandbytes (bitsandbytes/test.py)

sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
my_container:r35.3.1-bitsandbytes \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_174248/test/my_container_r35.3.1-bitsandbytes_test.py.txt; exit ${PIPESTATUS[0]}

testing bitsandbytes...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so
False
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.2
CUDA SETUP: Detected CUDA version 114
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda114_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION, for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
@UserName-wang can you post the command you ran to start the build, and the entire build log?
For some reason, it is not detecting CUDA Toolkit...
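A quick sanity check (just a sketch, reusing the image tag from your log above) is to confirm the CUDA Toolkit is actually visible inside the container:

# hypothetical spot-check: the toolkit's nvcc should be present in the image
sudo docker run --rm --runtime=nvidia my_container:r35.3.1-bitsandbytes \
/bin/bash -c 'ls /usr/local/cuda/bin/ && /usr/local/cuda/bin/nvcc --version'

If nvcc isn't there, the bitsandbytes make step has nothing to compile against.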
Dear @dusty-nv, the command I used: ./build.sh --name=my_container pytorch transformers nemo
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 512B done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/my_container:r35.3.1-cmake
#3 DONE 0.0s
#4 [1/3] FROM docker.io/library/my_container:r35.3.1-cmake
#4 DONE 0.0s
#5 [2/3] RUN pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@main
#5 CACHED
#6 [3/3] RUN pip3 show onnx && python3 -c 'import onnx; print(onnx.__version__)'
#6 CACHED
#7 exporting to image
#7 exporting layers done
#7 writing image sha256:e0150313e6a46d8ebcf12e3df6b664c7fd0a3f4b5f2b4b0c55d39e1c37e058c9 done
#7 naming to docker.io/library/my_container:r35.3.1-onnx done
#7 DONE 0.0s
-- Testing container my_container:r35.3.1-onnx (onnx/test.py)
sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/onnx:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
my_container:r35.3.1-onnx \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/test/my_container_r35.3.1-onnx_test.py.txt; exit ${PIPESTATUS[0]}
testing onnx...
onnx version: 1.15.0
onnx OK
-- Building container my_container:r35.3.1-pytorch
sudo docker build --network=host --tag my_container:r35.3.1-pytorch \
--file /home/agx/study/nvidia/jetson-containers/packages/pytorch/Dockerfile \
--build-arg BASE_IMAGE=my_container:r35.3.1-onnx \
--build-arg PYTORCH_WHL="torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl" \
--build-arg PYTORCH_URL="https://nvidia.box.com/shared/static/i8pukc49h3lhak4kkn67tg9j4goqm0m7.whl" \
/home/agx/study/nvidia/jetson-containers/packages/pytorch \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/build/my_container_r35.3.1-pytorch.txt; exit ${PIPESTATUS[0]}
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.87kB 0.0s done
#1 DONE 0.0s
#2 [internal] load .dockerignore
#2 transferring context: 2B 0.0s done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/my_container:r35.3.1-onnx
#3 DONE 0.0s
#4 [1/6] FROM docker.io/library/my_container:r35.3.1-onnx
#4 DONE 0.0s
#5 [2/6] RUN apt-get update && apt-get install -y --no-install-recommends libopenblas-dev libopenmpi-dev openmpi-bin openmpi-common gfortran libomp-dev && rm -rf /var/lib/apt/lists/* && apt-get clean
#5 CACHED
#6 [5/6] RUN PYTHON_ROOT=`pip3 show torch | grep Location: | cut -d' ' -f2` && TORCH_CMAKE_CONFIG=$PYTHON_ROOT/torch/share/cmake/Torch/TorchConfig.cmake && echo "patching _GLIBCXX_USE_CXX11_ABI in ${TORCH_CMAKE_CONFIG}" && sed -i 's/ set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=")/ set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=0")/g' ${TORCH_CMAKE_CONFIG}
#6 CACHED
#7 [4/6] RUN python3 -c 'import torch; print(f"PyTorch version: {torch.__version__}"); print(f"CUDA available: {torch.cuda.is_available()}"); print(f"cuDNN version: {torch.backends.cudnn.version()}"); print(torch.__config__.show());'
#7 CACHED
#8 [3/6] RUN cd /opt && wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate https://nvidia.box.com/shared/static/i8pukc49h3lhak4kkn67tg9j4goqm0m7.whl -O torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl && pip3 install --verbose torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
#8 CACHED
#9 [6/6] RUN pip3 install --no-cache-dir scikit-build && pip3 install --no-cache-dir ninja
#9 CACHED
#10 exporting to image
#10 exporting layers done
#10 writing image sha256:dd39d49070b8d8464ce9af0a2aaf4295b0ed03c396d57fcf6842ada673488db9 done
#10 naming to docker.io/library/my_container:r35.3.1-pytorch done
#10 DONE 0.0s
-- Testing container my_container:r35.3.1-pytorch (pytorch:2.0/test.py)
sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/pytorch:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
my_container:r35.3.1-pytorch \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/test/my_container_r35.3.1-pytorch_test.py.txt; exit ${PIPESTATUS[0]}
testing PyTorch...
PyTorch version: 2.0.0+nv23.05
CUDA available: True
cuDNN version: 8600
PyTorch built with:
- GCC 9.4
- C++ Version: 201703
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: NO AVX
- CUDA Runtime 11.4
- NVCC architecture flags: -gencode;arch=compute_72,code=sm_72;-gencode;arch=compute_87,code=sm_87
- CuDNN 8.6
- Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=11.4, CUDNN_VERSION=8.6.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=0, USE_NNPACK=1, USE_OPENMP=ON, USE_ROCM=OFF,
2.0.0+nv23.5
Tensor a = tensor([0., 0.], device='cuda:0')
Tensor b = tensor([-0.9102, 0.5022], device='cuda:0')
Tensor c = tensor([-0.9102, 0.5022], device='cuda:0')
testing LAPACK (OpenBLAS)...
done testing LAPACK (OpenBLAS)
testing torch.nn (cuDNN)...
done testing torch.nn (cuDNN)
testing CPU tensor vector operations...
test.py:58: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
cpu_y = F.softmax(cpu_x)
Tensor cpu_x = tensor([12.3450])
Tensor softmax = tensor([1.])
Tensor exp (float32) = tensor([[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183]])
Tensor exp (float64) = tensor([[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183],
[2.7183, 2.7183, 2.7183]], dtype=torch.float64)
Tensor exp (diff) = 7.429356050359104e-07
PyTorch OK
-- Building container my_container:r35.3.1-torchvision
sudo docker build --network=host --tag my_container:r35.3.1-torchvision \
--file /home/agx/study/nvidia/jetson-containers/packages/pytorch/torchvision/Dockerfile \
--build-arg BASE_IMAGE=my_container:r35.3.1-pytorch \
--build-arg TORCHVISION_VERSION="v0.15.1" \
--build-arg TORCH_CUDA_ARCH_LIST="7.2;8.7" \
/home/agx/study/nvidia/jetson-containers/packages/pytorch/torchvision \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/build/my_container_r35.3.1-torchvision.txt; exit ${PIPESTATUS[0]}
#0 building with "default" instance using docker driver
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 1.23kB done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/my_container:r35.3.1-pytorch
#3 DONE 0.0s
#4 [1/5] FROM docker.io/library/my_container:r35.3.1-pytorch
#4 DONE 0.0s
#5 [4/5] RUN git clone --branch v0.15.1 --recursive --depth=1 https://github.com/pytorch/vision torchvision && cd torchvision && git checkout v0.15.1 && python3 setup.py bdist_wheel && cp dist/torchvision*.whl /opt && pip3 install --no-cache-dir --verbose /opt/torchvision*.whl && cd ../ && rm -rf torchvision
#5 CACHED
#6 [2/5] RUN printenv && echo "torchvision version = v0.15.1" && echo "TORCH_CUDA_ARCH_LIST = 7.2;8.7"
#6 CACHED
#7 [3/5] RUN apt-get update && apt-get install -y --no-install-recommends libjpeg-dev zlib1g-dev && rm -rf /var/lib/apt/lists/* && apt-get clean
#7 CACHED
#8 [5/5] RUN python3 -c 'import torchvision; print(torchvision.__version__);'
#8 CACHED
#9 exporting to image
#9 exporting layers done
#9 writing image sha256:058294389642a20461b808775840f30cfd19d1a01afab061a464390fb6014732 0.0s done
#9 naming to docker.io/library/my_container:r35.3.1-torchvision done
#9 DONE 0.0s
-- Testing container my_container:r35.3.1-torchvision (torchvision/test.py)
sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/pytorch/torchvision:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
my_container:r35.3.1-torchvision \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/test/my_container_r35.3.1-torchvision_test.py.txt; exit ${PIPESTATUS[0]}
testing torchvision...
torchvision version: 0.15.1a0+42759b1
testing torchvision extensions...
torchvision classification models: alexnet | convnext_base | convnext_large | convnext_small | convnext_tiny | densenet121 | densenet161 | densenet169 | densenet201 | efficientnet_b0 | efficientnet_b1 | efficientnet_b2 | efficientnet_b3 | efficientnet_b4 | efficientnet_b5 | efficientnet_b6 | efficientnet_b7 | efficientnet_v2_l | efficientnet_v2_m | efficientnet_v2_s | get_model | get_model_builder | get_model_weights | get_weight | googlenet | inception_v3 | list_models | maxvit_t | mnasnet0_5 | mnasnet0_75 | mnasnet1_0 | mnasnet1_3 | mobilenet_v2 | mobilenet_v3_large | mobilenet_v3_small | regnet_x_16gf | regnet_x_1_6gf | regnet_x_32gf | regnet_x_3_2gf | regnet_x_400mf | regnet_x_800mf | regnet_x_8gf | regnet_y_128gf | regnet_y_16gf | regnet_y_1_6gf | regnet_y_32gf | regnet_y_3_2gf | regnet_y_400mf | regnet_y_800mf | regnet_y_8gf | resnet101 | resnet152 | resnet18 | resnet34 | resnet50 | resnext101_32x8d | resnext101_64x4d | resnext50_32x4d | shufflenet_v2_x0_5 | shufflenet_v2_x1_0 | shufflenet_v2_x1_5 | shufflenet_v2_x2_0 | squeezenet1_0 | squeezenet1_1 | swin_b | swin_s | swin_t | swin_v2_b | swin_v2_s | swin_v2_t | vgg11 | vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn | vgg19 | vgg19_bn | vit_b_16 | vit_b_32 | vit_h_14 | vit_l_16 | vit_l_32 | wide_resnet101_2 | wide_resnet50_2
Namespace(batch_size=8, data_tar='ILSVRC2012_img_val_subset_5k.tar.gz', data_url='https://nvidia.box.com/shared/static/y1ygiahv8h75yiyh0pt50jqdqt7pohgx.gz', models=['resnet18'], print_freq=25, resolution=224, test_threshold=-10.0, use_cuda=True, workers=2)
using CUDA
dataset classes: 1000
dataset images: 5000
batch size: 8
---------------------------------------------
-- resnet18
---------------------------------------------
loading model 'resnet18'
/usr/local/lib/python3.8/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
loaded model 'resnet18'
resnet18 [ 0/625] Time 3.409 ( 3.409) Acc@1 75.00 ( 75.00) Acc@5 100.00 (100.00)
resnet18 [ 25/625] Time 0.013 ( 0.175) Acc@1 62.50 ( 79.81) Acc@5 87.50 ( 94.71)
resnet18 [ 50/625] Time 0.088 ( 0.115) Acc@1 100.00 ( 73.53) Acc@5 100.00 ( 91.91)
resnet18 [ 75/625] Time 0.044 ( 0.093) Acc@1 62.50 ( 76.48) Acc@5 87.50 ( 92.27)
resnet18 [100/625] Time 0.014 ( 0.082) Acc@1 87.50 ( 78.22) Acc@5 100.00 ( 93.07)
resnet18 [125/625] Time 0.079 ( 0.075) Acc@1 87.50 ( 77.38) Acc@5 100.00 ( 93.35)
resnet18 [150/625] Time 0.020 ( 0.071) Acc@1 12.50 ( 76.49) Acc@5 100.00 ( 93.38)
resnet18 [175/625] Time 0.084 ( 0.068) Acc@1 62.50 ( 76.42) Acc@5 100.00 ( 93.75)
resnet18 [200/625] Time 0.015 ( 0.066) Acc@1 100.00 ( 76.55) Acc@5 100.00 ( 93.72)
resnet18 [225/625] Time 0.095 ( 0.065) Acc@1 87.50 ( 76.77) Acc@5 100.00 ( 93.58)
resnet18 [250/625] Time 0.081 ( 0.064) Acc@1 37.50 ( 76.64) Acc@5 62.50 ( 93.63)
resnet18 [275/625] Time 0.023 ( 0.062) Acc@1 87.50 ( 75.72) Acc@5 100.00 ( 93.25)
resnet18 [300/625] Time 0.082 ( 0.062) Acc@1 50.00 ( 74.38) Acc@5 87.50 ( 92.07)
resnet18 [325/625] Time 0.012 ( 0.060) Acc@1 75.00 ( 73.27) Acc@5 100.00 ( 91.53)
resnet18 [350/625] Time 0.070 ( 0.060) Acc@1 62.50 ( 72.69) Acc@5 87.50 ( 91.10)
resnet18 [375/625] Time 0.021 ( 0.059) Acc@1 25.00 ( 72.64) Acc@5 62.50 ( 90.89)
resnet18 [400/625] Time 0.076 ( 0.059) Acc@1 75.00 ( 71.95) Acc@5 87.50 ( 90.27)
resnet18 [425/625] Time 0.013 ( 0.058) Acc@1 62.50 ( 71.33) Acc@5 87.50 ( 89.91)
resnet18 [450/625] Time 0.111 ( 0.057) Acc@1 75.00 ( 71.26) Acc@5 87.50 ( 89.99)
resnet18 [475/625] Time 0.044 ( 0.057) Acc@1 37.50 ( 70.75) Acc@5 75.00 ( 89.63)
resnet18 [500/625] Time 0.057 ( 0.056) Acc@1 75.00 ( 70.36) Acc@5 87.50 ( 89.25)
resnet18 [525/625] Time 0.018 ( 0.056) Acc@1 62.50 ( 69.96) Acc@5 87.50 ( 89.02)
resnet18 [550/625] Time 0.065 ( 0.056) Acc@1 100.00 ( 69.69) Acc@5 100.00 ( 88.77)
resnet18 [575/625] Time 0.016 ( 0.056) Acc@1 75.00 ( 69.34) Acc@5 87.50 ( 88.54)
resnet18 [600/625] Time 0.016 ( 0.055) Acc@1 37.50 ( 69.70) Acc@5 87.50 ( 88.71)
resnet18
* Acc@1 69.760 Expected 69.760 Delta -0.000
* Acc@5 88.760 Expected 89.080 Delta -0.320
* Images/sec 145.023
* PASS
---------------------------------------------
-- Summary
---------------------------------------------
resnet18
* Acc@1 69.760 Expected 69.760 Delta -0.000
* Acc@5 88.760 Expected 89.080 Delta -0.320
* Images/sec 145.023
* PASS
Model tests passing: 1 / 1
torchvision OK
-- Building container my_container:r35.3.1-huggingface_hub
sudo docker build --network=host --tag my_container:r35.3.1-huggingface_hub \
--file /home/agx/study/nvidia/jetson-containers/packages/llm/huggingface_hub/Dockerfile \
--build-arg BASE_IMAGE=my_container:r35.3.1-torchvision \
/home/agx/study/nvidia/jetson-containers/packages/llm/huggingface_hub \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/build/my_container_r35.3.1-huggingface_hub.txt; exit ${PIPESTATUS[0]}
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 877B 0.0s done
#1 DONE 0.0s
#2 [internal] load .dockerignore
#2 transferring context: 2B 0.0s done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/my_container:r35.3.1-torchvision
#3 DONE 0.0s
#4 [1/6] FROM docker.io/library/my_container:r35.3.1-torchvision
#4 DONE 0.0s
#5 [internal] load build context
#5 transferring context: 89B done
#5 DONE 0.0s
#6 [3/6] RUN pip3 install --no-cache-dir --verbose dataclasses
#6 CACHED
#7 [5/6] COPY huggingface-downloader.py /usr/local/bin/_huggingface-downloader.py
#7 CACHED
#8 [2/6] RUN pip3 install --no-cache-dir --verbose huggingface_hub[cli]
#8 CACHED
#9 [4/6] COPY huggingface-downloader /usr/local/bin/
#9 CACHED
#10 [6/6] RUN huggingface-cli --help && huggingface-downloader --help && pip3 show huggingface_hub && python3 -c 'import huggingface_hub; print(huggingface_hub.__version__)'
#10 CACHED
#11 exporting to image
#11 exporting layers done
#11 writing image sha256:b09017a64d7f6b45dc781a242c45ea01ed6c51e668173a8abe65ecd1872a47a8 done
#11 naming to docker.io/library/my_container:r35.3.1-huggingface_hub done
#11 DONE 0.0s
-- Testing container my_container:r35.3.1-huggingface_hub (huggingface_hub/test.py)
sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/llm/huggingface_hub:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
my_container:r35.3.1-huggingface_hub \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/test/my_container_r35.3.1-huggingface_hub_test.py.txt; exit ${PIPESTATUS[0]}
testing huggingface_hub...
huggingface_hub version: 0.16.4
huggingface_hub OK
-- Building container my_container:r35.3.1-rust
sudo docker build --network=host --tag my_container:r35.3.1-rust \
--file /home/agx/study/nvidia/jetson-containers/packages/rust/Dockerfile \
--build-arg BASE_IMAGE=my_container:r35.3.1-huggingface_hub \
/home/agx/study/nvidia/jetson-containers/packages/rust \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/build/my_container_r35.3.1-rust.txt; exit ${PIPESTATUS[0]}
#0 building with "default" instance using docker driver
#1 [internal] load .dockerignore
#1 transferring context: 0.0s
#1 transferring context: 2B 0.0s done
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 317B 0.0s done
#2 DONE 0.1s
#3 [internal] load metadata for docker.io/library/my_container:r35.3.1-huggingface_hub
#3 DONE 0.0s
#4 [1/3] FROM docker.io/library/my_container:r35.3.1-huggingface_hub
#4 DONE 0.0s
#5 [2/3] RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
#5 CACHED
#6 [3/3] RUN rustc --version && pip3 install --no-cache-dir --verbose setuptools-rust
#6 CACHED
#7 exporting to image
#7 exporting layers done
#7 writing image sha256:5d8a1fe7bca2ac266e769dc10550470eb16cfecc240f5a13fbc86e1caf2922ad 0.0s done
#7 naming to docker.io/library/my_container:r35.3.1-rust 0.0s done
#7 DONE 0.0s
-- Building container my_container:r35.3.1-bitsandbytes
sudo docker build --network=host --tag my_container:r35.3.1-bitsandbytes \
--file /home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes/Dockerfile \
--build-arg BASE_IMAGE=my_container:r35.3.1-rust \
/home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/build/my_container_r35.3.1-bitsandbytes.txt; exit ${PIPESTATUS[0]}
#0 building with "default" instance using docker driver
#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 1.21kB done
#2 DONE 0.0s
#3 [internal] load metadata for docker.io/library/my_container:r35.3.1-rust
#3 DONE 0.0s
#4 [1/5] FROM docker.io/library/my_container:r35.3.1-rust
#4 DONE 0.0s
#5 https://api.github.com/repos/dusty-nv/bitsandbytes/git/refs/heads/main
#5 DONE 0.5s
#6 [2/5] ADD https://api.github.com/repos/dusty-nv/bitsandbytes/git/refs/heads/main /tmp/bitsandbytes_version.json
#6 CACHED
#7 [3/5] RUN pip3 uninstall -y bitsandbytes && cd /opt && git clone --depth=1 https://github.com/dusty-nv/bitsandbytes bitsandbytes && cd bitsandbytes && make CUDA_VERSION=114 -j$(nproc) cuda11x && python3 setup.py --verbose build_ext --inplace -j$(nproc) bdist_wheel && cp dist/bitsandbytes*.whl /opt && pip3 install --no-cache-dir --verbose /opt/bitsandbytes*.whl && cd ../ && rm -rf bitsandbytes
#7 CACHED
#8 [4/5] RUN pip3 install --no-cache-dir --verbose scipy
#8 CACHED
#9 [5/5] RUN pip3 show bitsandbytes && python3 -c 'import bitsandbytes'
#9 CACHED
#10 exporting to image
#10 exporting layers done
#10 writing image sha256:d590ef0eef5cf993ac8bf5d21f58a8e2c54ae40df4f53309e9b6eb69c7986638 done
#10 naming to docker.io/library/my_container:r35.3.1-bitsandbytes done
#10 DONE 0.0s
-- Testing container my_container:r35.3.1-bitsandbytes (bitsandbytes/test.py)
sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
my_container:r35.3.1-bitsandbytes \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/test/my_container_r35.3.1-bitsandbytes_test.py.txt; exit ${PIPESTATUS[0]}
testing bitsandbytes...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so
False
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.2
CUDA SETUP: Detected CUDA version 114
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda114_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=114 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "test.py", line 4, in <module>
import bitsandbytes
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/agx/study/nvidia/jetson-containers/jetson_containers/build.py", line 93, in <module>
build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.push)
File "/home/agx/study/nvidia/jetson-containers/jetson_containers/container.py", line 125, in build_container
test_container(container_name, pkg, simulate)
File "/home/agx/study/nvidia/jetson-containers/jetson_containers/container.py", line 295, in test_container
status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'sudo docker run -t --rm --runtime=nvidia --network=host --volume /home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes:/test --volume /home/agx/study/nvidia/jetson-containers/data:/data --workdir /test my_container:r35.3.1-bitsandbytes /bin/bash -c 'python3 test.py' 2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230825_221528/test/my_container_r35.3.1-bitsandbytes_test.py.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
@UserName-wang unfortunately that log doesn't capture the actual build, because bitsandbytes was already built and it's just showing the cached output. My guess is that for some reason it didn't find CUDA. You could try ./build.sh bitsandbytes to build/test just bitsandbytes (and paste the log from that here), or run build.sh with --build-flags='--no-cache' (both spelled out below).
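Spelled out (a sketch, run from the root of your jetson-containers checkout):

# rebuild and test only the bitsandbytes package
./build.sh bitsandbytes

# or force its layers to rebuild from scratch, bypassing the docker cache
./build.sh --build-flags='--no-cache' bitsandbytes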
Also, your docker output looks different than mine - are you using BuildKit or something? This is what my sudo docker version shows:
sudo docker version
Client:
Version: 20.10.21
API version: 1.41
Go version: go1.18.1
Git commit: 20.10.21-0ubuntu1~20.04.2
Built: Thu Apr 27 05:56:44 2023
OS/Arch: linux/arm64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.21
API version: 1.41 (minimum version 1.12)
Go version: go1.18.1
Git commit: 20.10.21-0ubuntu1~20.04.2
Built: Thu Apr 27 05:37:01 2023
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.12-0ubuntu1~20.04.3
GitCommit:
nvidia:
Version: 1.1.4-0ubuntu1~20.04.3
GitCommit: 629a689
docker-init:
Version: 0.19.0
GitCommit:
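For what it's worth, that formatting difference is what BuildKit produces, and newer Docker releases enable it by default. A minimal way to force the legacy builder for one manual build (a sketch, reusing the bitsandbytes build command from the log above; sudo accepts leading VAR=value assignments):

# force the classic (non-BuildKit) builder for a single docker build
sudo DOCKER_BUILDKIT=0 docker build --network=host \
--tag my_container:r35.3.1-bitsandbytes \
--file /home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes/Dockerfile \
--build-arg BASE_IMAGE=my_container:r35.3.1-rust \
/home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes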
@dusty-nv, below is my docker information; I guess it was automatically upgraded past your version. I tried the commands ./build.sh bitsandbytes and ./build.sh --build-flags='--no-cache', and got almost the same error. But I cannot upload all of the log information, because the log content is too long.
Client: Docker Engine - Community
Version: 24.0.5
API version: 1.43
Go version: go1.20.6
Git commit: ced0996
Built: Fri Jul 21 20:35:47 2023
OS/Arch: linux/arm64
Context: default

Server: Docker Engine - Community
Engine:
Version: 24.0.5
API version: 1.43 (minimum version 1.12)
Go version: go1.20.6
Git commit: a61e2b4
Built: Fri Jul 21 20:35:47 2023
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.22
GitCommit: 8165feabfdfe38c65b599c4993d227328c231fca
nvidia:
Version: 1.1.8
GitCommit: v1.1.8-0-g82f18fe
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Error after running ./build.sh bitsandbytes --build-flags='--no-cache':
-- Testing container bitsandbytes:r35.3.1-bitsandbytes (bitsandbytes/test.py)

sudo docker run -t --rm --runtime=nvidia --network=host \
--volume /home/agx/study/nvidia/jetson-containers/packages/llm/bitsandbytes:/test \
--volume /home/agx/study/nvidia/jetson-containers/data:/data \
--workdir /test \
bitsandbytes:r35.3.1-bitsandbytes \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230826_141319/test/bitsandbytes_r35.3.1-bitsandbytes_test.py.txt; exit ${PIPESTATUS[0]}

testing bitsandbytes...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so
False
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.2
CUDA SETUP: Detected CUDA version 114
/usr/local/lib/python3.8/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda114_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION, for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone git@github.com:TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=114 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "test.py", line 4, in <module>
import bitsandbytes
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/agx/study/nvidia/jetson-containers/jetson_containers/build.py", line 93, in <module>
@UserName-wang run ./build.sh bitsandbytes --build-flags='--no-cache' | tee logs/bitsandbytes.txt and then attach the log file here. I can't tell why it doesn't build with CUDA without the full log.
Have you set your default docker-runtime to nvidia like here? https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#docker-default-runtime
Dear @dusty-nv, thank you for your patience! Here is my /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "insecure-registries": ["nas:5555"]
}
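A quick way to verify that setting actually took effect (a sketch; the daemon must be restarted after editing daemon.json):

sudo systemctl restart docker
sudo docker info | grep -i 'default runtime'   # should report: Default Runtime: nvidia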
and here is the error log file after command: ./build.sh bitsandbytes --build-flags='--no-cache' | tee logs/bitsandbytes.txt
Hi there,
my environment is a Jetson AGX Xavier running JetPack 5.1.1, with L4T version R35.3.1.

command: ./build.sh --name=my_container ros:humble-desktop

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.16) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Namespace(base='', build_flags='', list_packages=False, logs='', multiple=False, name='my_container', package_dirs=[''], packages=['ros:humble-desktop'], push='', show_packages=False, simulate=False, skip_errors=False, skip_packages=[''], skip_tests=[''])
-- L4T_VERSION=35.3.1
-- JETPACK_VERSION=5.1.1
-- CUDA_VERSION=11.4.315
-- LSB_RELEASE=20.04 (focal)
-- Loading /home/agx/study/nvidia/jetson-containers/packages/protobuf/protobuf_cpp/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/tensorflow/config.py
-- Package small-stable-diffusion was disabled by its config
-- Loading /home/agx/study/nvidia/jetson-containers/packages/l4t/l4t-text-generation/config.yml
-- Loading /home/agx/study/nvidia/jetson-containers/packages/l4t/l4t-pytorch/l4t-pytorch.json
-- Loading /home/agx/study/nvidia/jetson-containers/packages/l4t/l4t-tensorflow/l4t-tensorflow.json
-- Loading /home/agx/study/nvidia/jetson-containers/packages/l4t/l4t-diffusion/config.yml
-- Loading /home/agx/study/nvidia/jetson-containers/packages/tritonserver/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/pytorch/torchaudio/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/pytorch/torchvision/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/pytorch/torch2trt/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/pytorch/torch_tensorrt/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/pytorch/config.py
-- Package pytorch:1.10 isn't compatible with L4T r35.3.1 (requires L4T ==32.*)
-- Package pytorch:1.10 was disabled by its config
-- Package pytorch:1.9 isn't compatible with L4T r35.3.1 (requires L4T ==32.*)
-- Package pytorch:1.9 was disabled by its config
-- Loading /home/agx/study/nvidia/jetson-containers/packages/riva-client/config.json
-- Loading /home/agx/study/nvidia/jetson-containers/packages/deepstream/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/rapids/cuml/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/rapids/cudf/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/nemo/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/zed/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/opencv/opencv_builder/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/opencv/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/cuda-python/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/awq/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/text-generation-inference/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/xformers/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/exllama/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/llama_cpp/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/optimum/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/auto_gptq/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/llm/transformers/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/ros/config.py
-- Package ros:melodic-ros-base isn't compatible with L4T r35.3.1 (requires L4T <34)
-- Package ros:melodic-ros-base was disabled by its config
-- Package ros:melodic-ros-core isn't compatible with L4T r35.3.1 (requires L4T <34)
-- Package ros:melodic-ros-core was disabled by its config
-- Package ros:melodic-desktop isn't compatible with L4T r35.3.1 (requires L4T <34)
-- Package ros:melodic-desktop was disabled by its config
-- Loading /home/agx/study/nvidia/jetson-containers/packages/onnxruntime/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/cupy/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/pycuda/config.py
-- Loading /home/agx/study/nvidia/jetson-containers/packages/onnx/config.py
-- Building containers ['build-essential', 'python', 'cmake', 'numpy', 'opencv', 'ros:humble-desktop']
-- Building container my_container:l4t-r35.3.1-build-essential

sudo docker build --network=host --tag my_container:l4t-r35.3.1-build-essential \
--file /home/agx/study/nvidia/jetson-containers/packages/build-essential/Dockerfile \
--build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r35.3.1 \
/home/agx/study/nvidia/jetson-containers/packages/build-essential \
2>&1 | tee /home/agx/study/nvidia/jetson-containers/logs/20230824_201519/build/my_container_l4t-r35.3.1-build-essential.txt; exit ${PIPESTATUS[0]}
#0 building with "default" instance using docker driver
#1 [internal] load .dockerignore
#1 transferring context: 2B 0.0s done
#1 DONE 0.0s
#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 595B done
#2 DONE 0.0s
#3 [auth] nvidia/l4t-jetpack:pull,push token for nvcr.io
#3 DONE 0.0s
#4 [internal] load metadata for nvcr.io/nvidia/l4t-jetpack:r35.3.1
#4 ERROR: failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://nvcr.io/proxy_auth?scope=repository%3Anvidia%2Fl4t-jetpack%3Apull%2Cpush: 401
Can someone please help me? thank you!
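One thing worth trying, since the token request above asks for pull,push scope (an anonymous pull only needs pull): stale or invalid cached credentials for nvcr.io can trigger exactly this kind of 401. A minimal sketch:

# clear any cached nvcr.io login, then retry the pull anonymously
sudo docker logout nvcr.io
sudo docker pull nvcr.io/nvidia/l4t-jetpack:r35.3.1

If you need to stay logged in, generating a fresh NGC API key and running sudo docker login nvcr.io (username $oauthtoken, the API key as the password) should also work.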