RainerMtb / cuvista

Accelerated Optical Video Stabilizer, Cuda, OpenCL, Avx512
https://rainermtb.github.io/cuvista/
GNU General Public License v3.0

running on linux/ubuntu24.04 #3

Open egoodarzi opened 3 days ago

egoodarzi commented 3 days ago

Hi Rainer, your project looks awesome! Unfortunately, I ran into an issue on startup when trying to run it on Ubuntu 24.04 with the default compiler toolchain, an AMD CPU, an Nvidia GPU, and Cuda 12.6. Some details are below; I hope you can suggest a workaround. Cheers.

When the application starts, it crashes on this line:

//main program start
MainData data;

Disassembly shows this instruction:

vpbroadcastq %rcx,%xmm0

Seems like an issue with avx2? dmesg log:

[Mon Oct 14 22:46:56 2024] traps: cuvista[204770] trap invalid opcode ip:55b63e186213 sp:7ffe421fcc20 error:0 in cuvista[55b63e15a000+17c000]

The processor supports AVX and AVX2, but not AVX512:

lscpu 
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   64
  On-line CPU(s) list:    0-63
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen Threadripper PRO 5975WX 32-Cores
    CPU family:           25
    Model:                8
    Thread(s) per core:   2
    Core(s) per socket:   32
    Socket(s):            1
    Stepping:             2
    Frequency boost:      enabled
    CPU(s) scaling MHz:   29%
    CPU max MHz:          7006.6401
    CPU min MHz:          1800.0000
    BogoMIPS:             7186.15
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht
                          syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid
                          aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave
                          avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
                          ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate
                          ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed
                          adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
                          user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock
                          nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload
                          vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap

Any suggestions would be appreciated.

RainerMtb commented 3 days ago

Indeed, there is a problem with AVX code being executed on machines without AVX512; I will have a look at that. Can you test the repo as it was on 16 Aug 2024? I think you should be able to get that state via git checkout a7fdcc1, then compile and run that code. Run cuvista -info to see the available devices.

egoodarzi commented 2 days ago

Thanks for your quick reply. I tried that commit and the result was exactly the same.

RainerMtb commented 2 days ago

Yes, I can see that. I will have to investigate.

RainerMtb commented 5 hours ago

Have a look at the latest commit; it should work now.