KavrakiLab / vamp

SIMD-Accelerated Sampling-based Motion Planning

Installation fails on NVIDIA Jetson Orin #20

Closed Rooholla-KhorramBakht closed 4 months ago

Rooholla-KhorramBakht commented 4 months ago

Describe the bug: Compilation fails on NVIDIA Jetson Orin.

To Reproduce: Either in Docker or locally, running:

pip install -v .

returns the following error:

.
.
.
FAILED: CMakeFiles/_core_ext.dir/src/impl/vamp/bindings/settings.cc.o
25.84   /usr/bin/aarch64-linux-gnu-g++  -pthread -D_core_ext_EXPORTS -I/usr/include/python3.8 -I/tmp/pip-req-build-g07z5nn4/build/cp38-cp38-linux_aarch64/_deps/nanobind-src/include -isystem /tmp/pip-req-build-g07z5nn4/build/cp38-cp38-linux_aarch64/_deps/nigh-src/src -isystem /tmp/pip-req-build-g07z5nn4/build/cp38-cp38-linux_aarch64/_deps/pdqsort-src -isystem /tmp/pip-req-build-g07z5nn4/src/impl -isystem /usr/include/eigen3 -mcpu=native -mtune=native -Wall -Wextra -O3 -DNDEBUG -O3 -fno-math-errno -fno-signed-zeros -fno-trapping-math -fno-rounding-math -ffp-contract=fast -flto=auto -std=c++17 -fPIC -fvisibility=hidden -fno-stack-protector -ffunction-sections -fdata-sections -MD -MT CMakeFiles/_core_ext.dir/src/impl/vamp/bindings/settings.cc.o -MF CMakeFiles/_core_ext.dir/src/impl/vamp/bindings/settings.cc.o.d -o CMakeFiles/_core_ext.dir/src/impl/vamp/bindings/settings.cc.o -c /tmp/pip-req-build-g07z5nn4/src/impl/vamp/bindings/settings.cc
25.84   In file included from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector.hh:9,
25.84                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/planning/roadmap.hh:11,
25.84                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/bindings/settings.cc:1:
25.84   /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector/neon.hh: In static member function ‘static constexpr vamp::SIMDVector<__vector(4) float>::VectorT vamp::SIMDVector<__vector(4) float>::lshift_dispatch(vamp::SIMDVector<__vector(4) float>::VectorT)’:
25.85   /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector/neon.hh:219:53: error: cannot convert ‘uint32x4_t’ {aka ‘__vector(4) unsigned int’} to ‘float32x4_t’ {aka ‘__vector(4) float’}
25.85     219 |             return vreinterpretq_u32_f32(vshlq_n_u32(vreinterpretq_u32_f32(v), i));
25.85         |                                          ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25.85         |                                                     |
25.85         |                                                     uint32x4_t {aka __vector(4) unsigned int}
25.85   In file included from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector/neon.hh:12,
25.85                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector.hh:9,
25.85                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/planning/roadmap.hh:11,
25.85                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/bindings/settings.cc:1:
25.85   /usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h:6004:36: note:   initializing argument 1 of ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
25.85    6004 | vreinterpretq_u32_f32 (float32x4_t __a)
25.85         |                        ~~~~~~~~~~~~^~~
25.85   In file included from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector.hh:9,
25.86                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/planning/roadmap.hh:11,
25.86                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/bindings/settings.cc:1:
25.86   /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector/neon.hh: In static member function ‘static constexpr vamp::SIMDVector<__vector(4) float>::VectorT vamp::SIMDVector<__vector(4) float>::rshift_dispatch(vamp::SIMDVector<__vector(4) float>::VectorT)’:
25.86   /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector/neon.hh:243:53: error: cannot convert ‘uint32x4_t’ {aka ‘__vector(4) unsigned int’} to ‘float32x4_t’ {aka ‘__vector(4) float’}
25.86     243 |             return vreinterpretq_u32_f32(vshrq_n_u32(vreinterpretq_u32_f32(v), i));
25.86         |                                          ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
25.86         |                                                     |
25.86         |                                                     uint32x4_t {aka __vector(4) unsigned int}
25.86   In file included from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector/neon.hh:12,
25.86                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/vector.hh:9,
25.86                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/planning/roadmap.hh:11,
25.86                    from /tmp/pip-req-build-g07z5nn4/src/impl/vamp/bindings/settings.cc:1:
25.86   /usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h:6004:36: note:   initializing argument 1 of ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
25.86    6004 | vreinterpretq_u32_f32 (float32x4_t __a)
25.86         |                        ~~~~~~~~~~~~^~~
.
.
.    
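
Reading the diagnostic: `vshlq_n_u32` and `vshrq_n_u32` return a `uint32x4_t`, while the outer `vreinterpretq_u32_f32` takes a `float32x4_t`, so the outer reinterpret on lines 219 and 243 of `neon.hh` appears to point in the wrong direction. Below is a minimal sketch of what a type-correct version could look like, assuming the intent is to shift the raw bit pattern of each float lane and return floats again. The helper names (`lshift_bits`, `rshift_bits`, the scalar reference functions, and `self_check`) are hypothetical and are not vamp's API; only the NEON intrinsics themselves are real.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON)
#include <arm_neon.h>

// Hypothetical helpers (not vamp's API): shift the raw bits of each float lane.
// Inner call: view the floats as u32. Outer call: view the shifted u32 as
// floats again, which is vreinterpretq_f32_u32 (not vreinterpretq_u32_f32).
template <int N>
inline float32x4_t lshift_bits(float32x4_t v) {
    return vreinterpretq_f32_u32(vshlq_n_u32(vreinterpretq_u32_f32(v), N));
}

template <int N>
inline float32x4_t rshift_bits(float32x4_t v) {
    return vreinterpretq_f32_u32(vshrq_n_u32(vreinterpretq_u32_f32(v), N));
}
#endif

// Portable scalar reference for a single lane, so the behavior can be
// checked on any target, including ones without NEON.
inline std::uint32_t float_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

inline float bits_float(std::uint32_t bits) {
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

template <int N>
inline float lshift_bits_scalar(float x) {
    return bits_float(float_bits(x) << N);
}

template <int N>
inline float rshift_bits_scalar(float x) {
    return bits_float(float_bits(x) >> N);
}

// On NEON targets, asserts that the vector helpers agree lane by lane with
// the scalar reference. On other targets this is a no-op.
inline void self_check() {
#if defined(__ARM_NEON)
    const float in[4] = {1.0f, -2.5f, 0.0f, 3.0e-3f};
    float l[4];
    float r[4];
    vst1q_f32(l, lshift_bits<3>(vld1q_f32(in)));
    vst1q_f32(r, rshift_bits<3>(vld1q_f32(in)));
    for (int k = 0; k < 4; ++k) {
        assert(float_bits(l[k]) == float_bits(lshift_bits_scalar<3>(in[k])));
        assert(float_bits(r[k]) == float_bits(rshift_bits_scalar<3>(in[k])));
    }
#endif
}
```

The shift amount is a template parameter because the `_n_` intrinsics require a compile-time constant. Results are compared by bit pattern rather than by float value, since a shifted pattern is not guaranteed to be an ordinary number.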

Expected behavior: Installation completes with no errors.

Environment:

wbthomason commented 4 months ago

Thanks for your report! We unfortunately don't have a Jetson to test on, so we might need some further help from you to debug. As a first step, can you share the exact specs of your Jetson (i.e. which model, the output of lscpu) so that we know what version of NEON it supports and can figure out why the reinterpret call in question is a problem?

wbthomason commented 4 months ago

Ah, never mind - I think I see the issue.

Rooholla-KhorramBakht commented 4 months ago

Thanks a lot for your quick response. The CPU info is:

Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               ARM
  Model name:            Cortex-A78AE
    Model:               1
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          3
    Stepping:            r0p1
    CPU max MHz:         2201.6001
    CPU min MHz:         115.2000
    BogoMIPS:            62.50
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
                         asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc
                         flagm paca pacg
Caches (sum of all):     
  L1d:                   768 KiB (12 instances)
  L1i:                   768 KiB (12 instances)
  L2:                    3 MiB (12 instances)
  L3:                    6 MiB (3 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Mitigation; CSV2, but not BHB
  Srbds:                 Not affected
  Tsx async abort:       Not affected