Open JoostdeK opened 1 week ago
@JoostdeK yes, numpy 2.0 was released Sunday and immediately started breaking things in the builds lol...
I previously patched a few dockerfiles including NanoLLM to lock it to numpy<2, like in commit https://github.com/dusty-nv/jetson-containers/commit/b19854a9f02ad4e2658679475b5c3c9fec41a375
Can you recall if you were running updated jetson-containers before trying that build? I am curious whether I should just pin it carte blanche in the underlying numpy dockerfile, or only where needed. I was also having an issue where some packages like scipy were upgrading it to numpy2 later in the builds.
I will try it like you did in the numpy dockerfile itself, and see if those builds pass. I think until a lot of the major ML packages catch up there will be a lot of these errors occurring.
Sadly no, I have never built one before, I only got the Orin Nano last week haha. But I figured it should work, so that's why I spent some time on it. I think the commit you mentioned for NanoLLM also led me to try this. So sadly, since I'm not familiar with more of the code yet, I have no more advice haha.
Edit: I did remove all docker images and system prune so that nothing was cached. That was before I found the option for build flags =)
If anyone else is having this issue when building OpenCV, I successfully patched it by adding this line in packages/opencv/install.sh:
diff --git a/packages/opencv/install.sh b/packages/opencv/install.sh
index 480ba8c1..86f68ea7 100755
--- a/packages/opencv/install.sh
+++ b/packages/opencv/install.sh
@@ -19,5 +19,6 @@ else
$ROOT/install_pip.sh
fi
+python3 -m pip install --force-reinstall 'scipy<1.13' 'numpy<2'
python3 -c "import cv2; print('OpenCV version:', str(cv2.__version__)); print(cv2.getBuildInformation())"
EDIT: this was actually unnecessary. Simply specifying the version in packages/numpy/Dockerfile, as the OP did, did the trick. I just didn't know my way around this repo before.
Thanks @eufrizz - sadly the earlier experiment of installing numpy<2 in the numpy dockerfile did not work, because packages in later containers install numpy2 (like scipy). Until all the downstream dependencies catch up, I'm not sure how to fix this in an automated way without the manual patches in the other dockerfiles. Can't imagine we are the only ones feeling the pain 🤣
@dusty-nv Which build command failed? I'll try too.
Post everything that fails here. External libraries like onnxruntime, scipy, etc. will fail for sure. PyTorch, OpenCV, and TensorFlow are already compatible with numpy 2.0.
Having a similar problem. When I try to build this container,
jetson-containers build --name=torchbase pytorch opencv python:3.12 ffmpeg numpy torchvision
I get an error:
Requirement already satisfied: numpy>=1.21.2 in /usr/local/lib/python3.10/dist-packages (from opencv-contrib-python==4.8.1.84) (2.0.0)
Installing collected packages: opencv-contrib-python
Attempting uninstall: opencv-contrib-python
Found existing installation: opencv-contrib-python 4.8.1.80
Uninstalling opencv-contrib-python-4.8.1.80:
Removing file or directory /usr/local/lib/python3.10/dist-packages/cv2/
Removing file or directory /usr/local/lib/python3.10/dist-packages/opencv_contrib_python-4.8.1.80.dist-info/
Successfully uninstalled opencv-contrib-python-4.8.1.80
Successfully installed opencv-contrib-python-4.8.1.84
+ python3 -c 'import cv2; print('\''OpenCV version:'\'', str(cv2.__version__)); print(cv2.getBuildInformation())'
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 181, in <module>
bootstrap()
File "/usr/local/lib/python3.10/dist-packages/cv2/__init__.py", line 153, in bootstrap
native_module = importlib.import_module("cv2")
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
AttributeError: _ARRAY_API not found
Traceback (most recent call last):
Is there a chance that this is the same issue?
For OpenCV to work with numpy 2.0, it needs to be version 4.10.0.84.
That is the same issue. I just tried it, and it builds on my PC with the solution that worked for me. But Dusty mentioned it doesn't work for him. As I mentioned in my post, I did clear Docker of any and all lingering images and caches. Could it be pulling something from cache with numpy >= 2?
@JoostdeK it could be that the container stack you were building did not have other packages installed during the build which upgraded numpy later. For example, right now anytime scipy gets installed, it wants to auto-upgrade numpy to numpy2. And in the nano_llm build I tried, scipy gets installed at some point (which is why I had to pip3 install --force-reinstall 'scipy<1.13' 'numpy<2' instead of just numpy<2, because scipy<1.13 is before it started depending on numpy2)
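A minimal sketch of how one might check, at the end of a build, whether a later install silently moved numpy or scipy past those pins. The helper names (`parse_release`, `below`, `check_pins`) are made up for illustration and are not part of jetson-containers; the upper bounds are the ones from the pip command above, and the version parsing is deliberately simplistic (it ignores pre-release ordering).

```python
from importlib import metadata

# Upper bounds from the pip command above: numpy<2 and scipy<1.13.
# The dict and helper names are hypothetical, for illustration only.
PINS = {"numpy": (2,), "scipy": (1, 13)}


def parse_release(version):
    """Turn '1.26.4' or '2.0.0rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # hit a suffix such as 'rc1', stop parsing here
    return tuple(parts)


def below(version, upper):
    """True if the version string sorts strictly below the upper bound tuple."""
    return parse_release(version) < upper


def check_pins(pins=PINS):
    """Return {package: installed_version} for every pin that is violated."""
    violated = {}
    for name, upper in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # package not installed, so there is nothing to violate
        if not below(installed, upper):
            violated[name] = installed
    return violated


if __name__ == "__main__":
    for name, installed in check_pins().items():
        print(f"{name} {installed} is outside the pinned range")
```

Running something like this as the last step of a container's install script would at least make an unexpected upgrade visible, rather than surfacing later as the `_ARRAY_API not found` import error.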
I'm not going to exhaustively go through each Dockerfile and temporarily try/patch all instances where numpy needs to be pinned - as @johnnynunez pointed out, fortunately some packages have already started catching up. Let's continue posting the ones with issues to this thread and selectively patch them as needed - and for now, I have committed the patch to the main numpy dockerfile (this is in the jetson-containers dev branch in https://github.com/dusty-nv/jetson-containers/commit/4c8c306739af22627525df1b6e714d57567b9c45)
As mentioned, downstream pip installs can still override this (which in some cases seems desirable if the likes of pytorch, opencv, scipy, etc. need numpy2). The challenges come in when packages in the same container are incompatible because one of them needs numpy2 while others need numpy<2.
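To illustrate the kind of clash described here, a rough sketch that intersects the numpy version ranges required by packages in one container and reports when no version can satisfy all of them. The package names and bounds below are hypothetical placeholders, not the real requirements of any package in this thread, and `intersect` is a made-up helper rather than anything pip provides.

```python
def intersect(ranges):
    """Intersect half-open version ranges [low, high).

    Each range is a (low, high) pair of version tuples, where None means
    unbounded on that side. Returns the combined (low, high) pair, or None
    if the ranges cannot all be satisfied at once.
    """
    low, high = None, None
    for lo, hi in ranges:
        if lo is not None and (low is None or lo > low):
            low = lo  # keep the tightest lower bound
        if hi is not None and (high is None or hi < high):
            high = hi  # keep the tightest upper bound
    if low is not None and high is not None and low >= high:
        return None  # empty intersection: the requirements conflict
    return (low, high)


# Hypothetical numpy requirements for two packages in the same container.
needs = {
    "package_a": ((2, 0), None),  # stands in for something needing numpy>=2
    "package_b": (None, (2, 0)),  # stands in for something needing numpy<2
}

result = intersect(needs.values())
if result is None:
    print("no numpy version satisfies every package in this container")
else:
    print("compatible numpy range:", result)
```

With these two placeholder entries the intersection is empty, which is the situation where pinning in the numpy dockerfile alone cannot help and one of the packages has to be upgraded or rebuilt.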
I couldn't for the life of me get the command:
jetson-containers build whisper_trt nano_llm --name xyz
to work. I saw a lot of numpy errors during the build, and later it would fail at the onnxruntime step. These were some of the errors:
I searched for this and found this issue: https://github.com/pytorch/pytorch/issues/128860#issuecomment-2175641041
Here it was mentioned that setting numpy to 1.26.4 fixed that issue. I adjusted the Numpy dockerfile in jetson-containers to:
That solved the build issue for me. No idea yet how it runs tho lol.