dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Question about building the containers #34

Open Noam-M opened 3 years ago

Noam-M commented 3 years ago

I followed the instructions to build the ML containers, using the following command:

./scripts/docker_build_ml.sh all

It took several hours to run. Afterwards, I ran the following command in order to test the containers:

./scripts/docker_test_ml.sh all

I got the output below. How do I know whether it's OK? And if it isn't, what should I do next? Thanks for the help!

```
nvidia@Jetson:~/jetson-containers$ ./scripts/docker_test_ml.sh all
reading L4T version from /etc/nv_tegra_release
L4T BSP Version:  L4T R32.4.4
testing container l4t-pytorch:r32.4.4-pth1.6-py3 => PyTorch
localuser:root being added to access control list
testing PyTorch...
PyTorch version: 1.6.0
CUDA available:  True
cuDNN version:   8000
Tensor a = tensor([0., 0.], device='cuda:0')
Tensor b = tensor([0.5379, 0.0701], device='cuda:0')
Tensor c = tensor([0.5379, 0.0701], device='cuda:0')
testing LAPACK (OpenBLAS)...
done testing LAPACK (OpenBLAS)
testing torch.nn (cuDNN)...
done testing torch.nn (cuDNN)
PyTorch OK

downloading data for testing torchvision...
test/data/ILSVRC2012_img_val_subset_5k.tar.gz: No such file or directory
```

dusty-nv commented 3 years ago

test/data/ILSVRC2012_img_val_subset_5k.tar.gz: No such file or directory

It looks like it didn't succeed in downloading the test dataset used to test torchvision. You could try it again, or download it manually the same way it's done here:

https://github.com/dusty-nv/jetson-containers/blob/53a56dfd59eb49607552fd20edde2f2efa54b996/scripts/docker_test_ml.sh#L74
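For reference, a rough sketch of the manual route (the real download URL is whatever that line of docker_test_ml.sh uses; the URL below is only a placeholder):

```bash
# placeholder URL -- substitute the real one from the line of scripts/docker_test_ml.sh linked above
# (this assumes the test/data directory already exists; wget will not create it for you)
wget -O test/data/ILSVRC2012_img_val_subset_5k.tar.gz https://example.com/ILSVRC2012_img_val_subset_5k.tar.gz
```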

Noam-M commented 3 years ago

Thank you @dusty-nv for your quick reply. With the help of your comment, I found a solution (or at least I think I did). The problem was that I didn't have a "data" folder under "test", i.e. "test/data" didn't exist, and that is where the downloaded tar.gz file was supposed to be saved. Once I created the folder manually, the test continued to run normally.

I do think it would be beneficial for the test code to create the necessary folder if it doesn't already exist (something like the sketch below).
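A minimal version of that addition might look like this (hypothetical; the real fix belongs wherever the script sets up the test/data path before the download):

```bash
# ensure the download target exists before wget tries to write into it
mkdir -p test/data
```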

The test then continued and completed the PyTorch and TensorRT tests successfully. The PyCUDA test had an error, though:

```
testing container l4t-pytorch:r32.4.4-pth1.6-py3 => PyCUDA
localuser:root being added to access control list
testing PyCUDA...
Traceback (most recent call last):
  File "test/test_cuda.py", line 5, in <module>
    import pycuda.driver as cuda
  File "/usr/local/lib/python3.6/dist-packages/pycuda/driver.py", line 6, in <module>
    import six
ModuleNotFoundError: No module named 'six'
```

What do you think I should do in this case? Thanks again!
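One thing that might be worth trying (a sketch, not a verified fix): install six inside the already-built container and re-run the failing import. Note that changes made in a container started with --rm won't persist, so this only confirms the diagnosis:

```bash
# start an interactive shell in the container that failed the PyCUDA test
sudo docker run -it --rm --runtime nvidia l4t-pytorch:r32.4.4-pth1.6-py3 /bin/bash

# then, inside the container:
pip3 install six
python3 -c "import pycuda.driver"   # re-try the import that raised ModuleNotFoundError
```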

Noam-M commented 3 years ago

Also, I forgot to mention that I'm using a brand-new Jetson Xavier NX. The first thing I did after installing JetPack 4.4.1 was to build the Docker containers using this guide. I'm not sure at which point in the process Python is supposed to get installed, but I checked my Python version and it is 2.7.17. I know the l4t-ml containers work in a Python 3.6 environment, but I'm not sure whether that Python installation is part of the container or of JetPack, or whether it has to be installed manually and, if so, whether any particular configuration or PATH settings are expected.
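If it helps, here is a rough way to compare the host and container interpreters (using the container tag from the test output above). On JetPack 4.4.x the host's `python` is 2.7 and `python3` is 3.6, and the containers run their tests with python3:

```bash
# on the Jetson host
python --version    # reports 2.7.x
python3 --version   # reports 3.6.x

# inside the already-built container
sudo docker run -it --rm --runtime nvidia l4t-pytorch:r32.4.4-pth1.6-py3 python3 --version
```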

Thank you @dusty-nv for your help! Happy new year!

dusty-nv commented 3 years ago

For future reference, these containers are already built for JetPack 4.4.1 and can be pulled from NGC, so you needn't build them yourself (unless you want to).

You can find the NGC links at the top of the readme:

https://github.com/dusty-nv/jetson-containers#machine-learning-containers-for-jetson-and-jetpack
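As a sketch of what pulling looks like (assuming the R32.4.4 tag listed in the readme matches your JetPack 4.4.1 install):

```bash
# pull the prebuilt PyTorch container from NGC instead of building locally
sudo docker pull nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.6-py3
```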

Noam-M commented 3 years ago

Thank you @dusty-nv for your reply. Sorry for all my questions; I have little to no experience with Docker. I don't see any need to build the containers myself; I'm just looking for the simplest way to run neural networks on my Jetson. From what's written in the readme, I understood that I DO have to build them myself...

So now before I pull the container from the NGC, do I need to perform any action to undo what I did until now?

Thanks again!

dusty-nv commented 3 years ago

So now before I pull the container from the NGC, do I need to perform any action to undo what I did until now?

You don't need to do anything to undo it, but since you already built them you also don't need to pull from NGC anymore.
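If you do want to see what the build left behind (or reclaim the disk space later), something like this would do it; the image name is taken from the test output above:

```bash
# list the locally built images
sudo docker images | grep l4t

# optional: remove one by repository:tag if you ever want the space back
sudo docker rmi l4t-pytorch:r32.4.4-pth1.6-py3
```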

Noam-M commented 3 years ago

Thank you again @dusty-nv for your quick reply

I don't think the build was successful, since the PyCUDA test had an error regarding the "six" package. So from what I understand, I do need to pull from NGC. Is that correct?

Thanks

dusty-nv commented 3 years ago

Yes, in that case you could pull from NGC. That said, PyCUDA isn't typically a crucial component in these containers; it's there as an add-on. PyTorch/TensorFlow/etc. don't use it.
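A quick way to confirm the core stack is fine even with PyCUDA broken (a sketch using the image name from the test output above):

```bash
# verify PyTorch and CUDA inside the container, independent of PyCUDA
sudo docker run -it --rm --runtime nvidia l4t-pytorch:r32.4.4-pth1.6-py3 \
    python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```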

Noam-M commented 3 years ago

You can find the NGC links at the top of the readme: https://github.com/dusty-nv/jetson-containers#machine-learning-containers-for-jetson-and-jetpack

Is there a difference between the container in NGC and the Hello AI World container for NVIDIA Jetson? https://hub.docker.com/r/dustynv/jetson-inference/tags

Thanks!

dusty-nv commented 3 years ago

Is there a difference between the container in NGC and the Hello AI World container for NVIDIA Jetson? https://hub.docker.com/r/dustynv/jetson-inference/tags

The jetson-inference container is based on the l4t-pytorch container from NGC, but it additionally has the jetson-inference project installed on top.
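For completeness, a sketch of trying the Hello AI World container (the usual path is the project's docker/run.sh helper, which pulls the container matching your L4T version; the explicit tag below is an assumption based on the R32.4.4 output above, so check the Docker Hub tags page linked earlier):

```bash
# option 1: let the project's helper script pull and run the matching container
git clone https://github.com/dusty-nv/jetson-inference
cd jetson-inference
docker/run.sh

# option 2: pull a tag directly from Docker Hub
sudo docker pull dustynv/jetson-inference:r32.4.4
```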