Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Import failing when torch.distributed is not available #1909

Closed Mattzi closed 5 months ago

Mattzi commented 7 months ago

💡 Your Question

Hi,

I am trying to use YOLO-NAS on a Jetson Xavier and need super-gradients to do this. Unfortunately, the Jetson torch wheels (which are needed for CUDA support) are built without USE_DISTRIBUTED, so torch.distributed.is_available() returns False.

When running

import super_gradients

I get the error: ImportError: cannot import name 'get_rank' from 'torch.distributed'

Just to double-check: to use super_gradients AND have CUDA support available, do I need to build PyTorch from source, since torch.distributed is a mandatory requirement for super-gradients?
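
For reference, a hypothetical (untested) workaround sketch: stub the missing names on torch.distributed before importing super_gradients. This is only plausible for single-process, non-distributed use, and super_gradients may import further distributed symbols beyond get_rank, in which case more stubs would be needed.

```python
import torch.distributed as dist

if not dist.is_available():
    # torch.distributed is still importable when torch is built without
    # USE_DISTRIBUTED, but names such as get_rank are missing; provide
    # fake single-process defaults (hypothetical stubs).
    dist.get_rank = lambda *args, **kwargs: 0
    dist.get_world_size = lambda *args, **kwargs: 1
    dist.is_initialized = lambda: False

import super_gradients  # previously raised: cannot import name 'get_rank'
```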

Versions

PyTorch version: 1.13.0a0+936e9305.nv22.11
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (aarch64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.10.104-tegra-aarch64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.315
CUDA_MODULE_LOADING set to:
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.6.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.6.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 3
Vendor ID: Nvidia
Model: 0
Model name: ARMv8 Processor rev 0 (v8l)
Stepping: 0x0
CPU max MHz: 1907,2000
CPU min MHz: 115,2000
BogoMIPS: 62.50
L1d cache: 384 KiB
L1i cache: 768 KiB
L2 cache: 6 MiB
L3 cache: 4 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Branch predictor hardening
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm dcpop

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] onnx==1.15.0
[pip3] onnx-graphsurgeon==0.3.12
[pip3] onnx-simplifier==0.3.5
[pip3] onnxoptimizer==0.3.13
[pip3] onnxruntime==1.15.0
[pip3] onnxsim==0.4.36
[pip3] torch==1.13.0a0+936e9305.nv22.11
[pip3] torch2trt==0.4.0
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.13.0
[conda] Could not collect

BloodAxe commented 7 months ago

Thanks for reporting this. I'm currently not able to give an estimate of when this will be addressed. But I can tell you that running YoloNAS on a Jetson through SG is probably not a great idea, simply because of how eager PyTorch execution works: you would get very, very sub-optimal performance. What you may want to do instead is export the model to ONNX and run it with TensorRT from there. You can check an example here: https://github.com/Deci-AI/super-gradients/blob/master/notebooks/YoloNAS_Inference_using_TensorRT.ipynb
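
As a rough sketch of that export path (assuming a machine where super_gradients imports cleanly, and following the export API from the public super-gradients examples; the output filename is arbitrary):

```python
from super_gradients.common.object_names import Models
from super_gradients.training import models

# Load pretrained YoloNAS weights and export to ONNX; the resulting file
# can then be converted to a TensorRT engine (e.g. with trtexec) on the Jetson.
model = models.get(Models.YOLO_NAS_S, pretrained_weights="coco")
export_result = model.export("yolo_nas_s.onnx")
print(export_result)  # summarizes the exported model's input/output format
```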

Mattzi commented 7 months ago

Thank you, we are planning on using TensorRT later on, but currently it is easier to evaluate different models using the "basic" variant. The problem with the conversion is that we would also need to import super-gradients to do it, but I guess we can do that on a machine with a proper NVIDIA GPU running regular Ubuntu or Windows.

Mattzi commented 7 months ago

Just to have some more information on this: https://forums.developer.nvidia.com/t/torch-distributed-is-not-available-on-jetson-pytorch/256254

As mentioned there, a workaround would be to use an older PyTorch version or to build PyTorch from source: https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
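
For reference, a minimal check of whether a given torch wheel was built with distributed support:

```python
import torch

print(torch.__version__)                 # e.g. 1.13.0a0+936e9305.nv22.11
print(torch.distributed.is_available())  # False on the Jetson wheels discussed above
```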

My guess would be that closed issue #1538 hit the same problem, as an NVIDIA torch build (indicated by the nv23.2 version suffix) was also used there.