Open smbunn opened 7 months ago
Hi @smbunn, can you run sudo journalctl -u docker.service for more detailed logs? If you followed the steps from the System Setup, had you rebooted since changing your /etc/docker/daemon.json? Had you run apt-get upgrade?
I followed all the steps, including cutting and pasting into the daemon.json file. I have accepted the software updates prompted by Ubuntu 22.04 on my Jetson Orin Nano. Attached is the output from the journalctl request: journalctl.txt
Is it the "failed to register bridge" that is the issue?
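A quick way to check a saved log for that message is sketched below. This is only an illustration: the helper name has_bridge_error is made up here, and the search phrase is taken from the wording quoted in this thread rather than from a verified log line.

```shell
#!/bin/sh
# has_bridge_error is a hypothetical helper name, not part of any tool.
# It returns 0 if the given log file mentions the bridge failure phrase
# quoted in this thread (case-insensitive), and non-zero otherwise.
has_bridge_error() {
    grep -qi 'failed to register bridge' "$1"
}

# Example use on the Jetson (commented out so the sketch has no side effects):
# sudo journalctl -u docker.service -b --no-pager > journalctl.txt
# has_bridge_error journalctl.txt && echo "bridge error found"
```

The -b flag limits journalctl to the current boot, which keeps the saved file short.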
Yes, in the jetson-inference and jetson-containers install there is no apt upgrade
So never use apt-get update or apt-get upgrade, nor accept Software Updates?
I tried the sudo apt reinstall docker-ce suggested by the link you sent, but at the end I get: Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 142.
Do I really have to start again from scratch and keep my Jetson Nano completely un-updated forever?
I'm not sure which packages are affected by that bug, but you could then use apt-mark hold on them until it's resolved.
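As a rough sketch of what holding packages could look like: the package list below is an assumption (docker-ce is only a guess based on the reinstall command mentioned in this thread; the affected packages are not known), and hold_packages is a made-up helper name. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# ASSUMPTION: placeholder package list. The thread does not identify
# which packages are actually affected; adjust before using.
PACKAGES="docker-ce"

# hold_packages is a hypothetical helper. With DRY_RUN unset or set to 1
# it prints the apt-mark command for each package; with DRY_RUN=0 it runs it.
hold_packages() {
    for pkg in $PACKAGES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sudo apt-mark hold $pkg"
        else
            sudo apt-mark hold "$pkg"
        fi
    done
}

hold_packages
```

Afterwards, apt-mark showhold lists the held packages and sudo apt-mark unhold followed by the package name releases one again.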
The only thing I did that I will really miss is the long process to have OpenCV with CUDA enabled. That took quite a few steps and quite a bit of effort. Hours of effort. I imaged my drive after I did this but overwrote that image with the working copy with jetson-inference installed. Now that image is not working as of course I hadn't rebooted it :-(
You can find links to my OpenCV+CUDA binaries here - they are tarballs of opencv deb's that get installed in the containers by opencv_install.sh
Sorry, if you can't fix what the upgrade did to docker, then yea I would re-flash one more time and then not upgrade it until issue is resolved or you know what packages to pin.
Can I use SDK Manager on my other Ubuntu machine to just re-install from fresh or do I have to boot with the jumper installed and format the NVMe drive?
I just flash the OS to eMMC or SD card, and put all my projects/data/containers on the NVME. Did you have the OS installed to the NVME?
Regardless, you would need to boot into recovery mode using the jumper, while the USB-C cable is attached to your other Ubuntu machine, where SDK Manager will flash it from.
I have the jumper installed and am using lsusb on my other Ubuntu machine, but I don't see it at all. Looks like I might have to remove the NVMe drive, install it on my Ubuntu server, format it, re-install it on the Jetson and try again. No idea why the Jetson is not responding to a cold re-initialize.
The NVME drive shouldn't impact it going into recovery mode or not. Double check that the jumper location and USB connection are correct, and if you are having issues flashing the device I would recommend you post it to the forums. You could quickly remove your NVME from your Jetson just to eliminate any possibility of that, but I don't believe it would prevent going into recovery mode.
Nowhere have I ever seen a diagram showing which two pins to jumper. I followed the video on YouTube https://www.youtube.com/watch?v=Ucg5Zqm9ZMk and guessed from the camera angle that they were the last but two pins at the right hand end of the header under the fan unit. The YouTube clip calls them 9 and 10 but they appear to be labelled FC Rec and GND. Right?
My USB-C socket needed crimping a bit with needle nose pliers. Now the connection is secure and I have everything back to base. I am using Jetpack 6.0DP. This is the latest and I think it is Ubuntu Pro, can anyone confirm this?
@smbunn JetPack 6.0 comes with Ubuntu 22.04
Does everything have to be in a separate container? What do I do if I just want OpenCV with CUDA support in the main environment? You mention deb files above but I couldn't find them.
Note above where I asked whether the "failed to register bridge" error is the issue. It looks like this is the case, according to this thread on the NVIDIA site: https://forums.developer.nvidia.com/t/docker-gives-error-after-upgrading-ubuntu/283563
I looked at run.sh and it has --network host, which to me suggests it never uses the docker0 bridge. But docker itself cannot start up and fails on the bridge connection. I have tried the repair indicated in the NVIDIA developer forum but, like many others, have to report that it does not work. So at present I maintain a clean "just installed" image of my Jetson Orin Nano with JetPack 6.0 and try to make sure no updates ever occur. This gets tricky, as some installs want you to run update and upgrade. Hopefully someone will figure out how to repair the bridge in Docker so we can resume the normal Ubuntu update process. When I accidentally allow a script to run that updates, I clone my NVMe drive back to the original and start again. I have now done this so often that I keep 2 NVMe drives on the go so I can always swap to the clean one when docker fails.
I will add that this is still worthwhile as the dusty-nv containers are awesome! I am really enjoying using them.
@smbunn We ran into this same issue on our AGX Orin running JetPack 6. I'm not sure what we did differently, but we ran the exact commands mentioned in the post you linked above (where we ran the commands specified in the issue they link to first) and that seemed to do the trick. We are now able to run the image using the run_dev.sh script.
I just tried this and it worked for me. Someone posted it in the Nvidia forum thread mentioned above
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo apt reinstall docker-ce
I also did the set iptables to legacy commands and this worked perfectly. It was discussed on https://forums.developer.nvidia.com/t/docker-gives-error-after-upgrading-ubuntu/283563/11
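To confirm which iptables backend is active before or after switching, something like the following may help. It is a sketch only: iptables_backend is a made-up helper name, and it assumes the version string carries a "(legacy)" or "(nf_tables)" suffix, as recent iptables releases print; older releases may show neither.

```shell
#!/bin/sh
# iptables_backend is a hypothetical helper. It classifies a version
# string such as "iptables v1.8.7 (nf_tables)" by its backend suffix.
iptables_backend() {
    case "$1" in
        *nf_tables*) echo "nf_tables" ;;
        *legacy*)    echo "legacy" ;;
        *)           echo "unknown" ;;
    esac
}

# Example use on the Jetson (commented out so the sketch has no side effects):
# iptables_backend "$(iptables --version)"
# update-alternatives --display iptables
# sudo systemctl status docker
```

If the helper reports legacy and docker still fails to start, the journalctl output for docker.service is the next place to look.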
Figured out how to do this. Ubuntu is still fairly new to me. Found your gz files and the DEB name from the link you gave above, then ran:
sudo ./jetson-containers/packages/opencv/opencv_install.sh https://nvidia.box.com/shared/static/ngp26xb9hb7dqbu6pbs7cs9flztmqwg0.gz OpenCV-4.8.1-aarch64.tar.gz
Worked a treat, and now I have OpenCV 4.8.1 installed in the base environment with CUDA support.
Thanks Dusty!
I did a clean install of JetPack 6.0DP on my Jetson Orin Nano, onto a 500 GB NVMe drive. All good and everything installed. Then I installed jetson-inference and jetson-containers from this site. Everything runs perfectly. Tested almost all examples in inference and all good.
Then I rebooted my Jetson. Now docker fails to run. If I run
sudo systemctl status docker
It just says it failed to load. No real error message beyond
docker.service: Failed with result 'error code'.
Process: ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Any ideas?