curv3d / curv

a language for making art using mathematics
Apache License 2.0

problem with AMD Gallium/Mesa (open source) GPU driver on Linux #30

Closed gsohler closed 6 years ago

gsohler commented 6 years ago

I tried to create a nice wire-model cube like this, but it only renders at 1 FPS. This is way slower than originator. How do I write fast Curv code?

union[
  for (i in 0..1) for (j in 0..1) (
    capsule{ d:0.3, from:[i,j,0], to:[i,j,1] };
    capsule{ d:0.3, from:[i,0,j], to:[i,1,j] };
    capsule{ d:0.3, from:[0,i,j], to:[1,i,j] }
  )
]

doug-moen commented 6 years ago

It shouldn't be this slow.

For performance, the Curv 3D viewing window depends on access to a hardware GPU made by Intel, AMD or Nvidia, using the vendor-supplied GPU driver. Maybe you are running Curv inside a VM, and the GPU driver is simulating a GPU in software?

gsohler commented 6 years ago

Doug, you are right. I was running Curv inside vncviewer (by chance). There it runs at 1 FPS. However, when run directly, it does not display anything at all. (screenshot: selection_100)

I am really interested in getting Curv working with good performance.

/proc/cpuinfo yields the output below.

What is wrong with my setup?

processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-4300 Quad-Core Processor
stepping : 0
microcode : 0x600084f
cpu MHz : 1800.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 16
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 7634.90
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor : 1
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-4300 Quad-Core Processor
stepping : 0
microcode : 0x600084f
cpu MHz : 1400.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 17
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 7633.47
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor : 2
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-4300 Quad-Core Processor
stepping : 0
microcode : 0x600084f
cpu MHz : 1800.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 2
apicid : 18
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 7633.49
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor : 3
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-4300 Quad-Core Processor
stepping : 0
microcode : 0x600084f
cpu MHz : 1800.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 2
apicid : 19
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg
bogomips : 7633.47
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

doug-moen commented 6 years ago

I assume you are running Linux inside a VM on Windows. You need to enable GPU acceleration in the VM. I assume that any GPU intensive 3D software will have the same problem that Curv is having, in this Linux-on-VM environment, and that this problem is not Curv specific.

VncViewer is just a way to view the video display on a remote machine over a network connection. I guess you'd use it for communicating with programs running inside a VM. I don't expect it to affect the frame rate reported by Curv, but I expect it would add latency and make animations look jumpy.

doug-moen commented 6 years ago

In the latest version of Curv, curv --version now prints some debug information about the GPU. It shows the GPU model and driver information, as seen by Curv. It is intended to be useful in debugging GPU issues, and should provide some insight into what the GPU looks like from inside the VM.
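For anyone collecting these reports, here is a minimal Python sketch of reading that debug output into a dictionary. It assumes the output is a series of "Key: value" lines, as in the reports posted in this thread. The helper names `parse_version_report` and `read_curv_version` are made up for illustration and are not part of Curv. The sample text is copied from a report later in this thread.

```python
import subprocess


def parse_version_report(text):
    """Split 'Key: value' lines into a dict; lines without ': ' are skipped."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            report[key.strip()] = value.strip()
    return report


def read_curv_version(executable="curv"):
    """Run `curv --version` and parse whatever it prints.

    stdout and stderr are merged because this sketch does not assume
    which stream the report is written to.
    """
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return parse_version_report(result.stdout + result.stderr)


# Sample report copied from a later comment in this thread.
sample = """Curv: 0.4
Compiler: gcc 7.4.0
Kernel: Linux 5.0.0-23-generic x86_64
GPU: X.Org, AMD VERDE (DRM 2.50.0, 5.0.0-23-generic, LLVM 8.0.0)
OpenGL: 4.5 (Compatibility Profile) Mesa 19.0.2"""

report = parse_version_report(sample)
print(report["GPU"])
print(report["OpenGL"])
```

Calling `read_curv_version()` requires a `curv` binary on the PATH. The parsing step works on any pasted report.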

gsohler commented 6 years ago

Doug, I don't run in a VM. I run native Linux when I test it. Did you notice the errors in the picture above, given when running Curv without VNC? It says:

EE r600_shader.c:182 r600_pipe_shader_create - translation from TGSI failed !
EE r600_state_common.c:798 r600_shader_select - Failed to build shader variant (type=1) -1

The new curv --version now outputs:

Curv: 0.1-226-g12dcafe
GPU: X.Org Gallium 0.4 on AMD RS780 (DRM 2.49.0 / 4.11.12-100.fc24.x86_64, LLVM 3.8.0)
OpenGL: 3.0 Mesa 12.0.3

Cheers Guenther

doug-moen commented 6 years ago

Thanks for the extra information. I'm playing with VNC now, but getting Curv working acceptably over VNC looks hard. I recommend using it in the "normal" way, using a display that is plugged directly into the GPU.

You are using the open source Mesa/Gallium driver for your GPU, at least in the non-VNC case where it is failing. I checked the error message, there's a bug open on Mesa for this, at https://bugs.freedesktop.org/show_bug.cgi?id=99349. As I understand the bug, a workaround in Curv is months of work, since it requires rewriting a significant part of the compiler. It happens if the GLSL code that I generate uses too many registers, so I need to write an optimizing compiler to limit GPU register use.

The VNC code path bypasses this bug, so maybe it's using a slow pure-software renderer instead of using the GPU? That would explain the low frame rate.

My solution is to use the vendor supplied GPU driver, from AMD. Don't use Mesa. https://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx Maybe this driver is already packaged by your distro.
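The software-renderer guess above could be checked by looking at the renderer string that the VNC session reports. Here is a rough Python sketch. It assumes that Mesa's software rasterizers identify themselves with names such as llvmpipe or softpipe in that string. The marker list is a heuristic, not an official API, and the second test string is hypothetical, since no renderer string from the VNC session was posted here.

```python
# Substrings that Mesa's software rasterizers are assumed to put in their
# renderer string. This list is a heuristic and may be incomplete.
SOFTWARE_RENDERER_MARKERS = ("llvmpipe", "softpipe", "swrast", "software rasterizer")


def looks_like_software_renderer(renderer):
    """Return True if the renderer string names a known software rasterizer."""
    lowered = renderer.lower()
    return any(marker in lowered for marker in SOFTWARE_RENDERER_MARKERS)


# Hardware driver string reported earlier in this thread.
hardware = "X.Org Gallium 0.4 on AMD RS780 (DRM 2.49.0 / 4.11.12-100.fc24.x86_64, LLVM 3.8.0)"
# Hypothetical string of the kind a software renderer might report.
software = "llvmpipe (LLVM 6.0.0, 128 bits)"

print(looks_like_software_renderer(hardware))  # False
print(looks_like_software_renderer(software))  # True
```

Note that the hardware string contains "LLVM" without matching, because the check looks for the full name "llvmpipe".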

gsohler commented 6 years ago

Hi Doug, thank you for your explanations. I tried to follow your recommendation and install the native AMDGPU-PRO driver. Looking at the packages available for my Fedora 24, I found an xorg-x11-drv-amdgpu package, which I immediately installed. However, it did not work, because it is not the PRO driver. Your link to the driver looks promising, but Fedora is not among the distributions it is offered for. I also read in some newsgroups (from 2017) that AMD did not yet offer a working Fedora driver; maybe this is why. Unless I can find the source code of such a driver, I don't know how to continue right now. Maybe I have to stay with the VNC solution until I find a better way. Thank you so far ...

PS. Glad I found out by accident that Curv works over VNC.

doug-moen commented 6 years ago

I googled this, and yeah, installing AMDGPU-PRO on Fedora-24 is way too much work. You have to downgrade your kernel and X11, it seems.

doug-moen commented 6 years ago

The Mesa bug I linked to was fixed last year in Mesa 17.3. You reported running Mesa 12.0.3, which must be quite old. After installing xorg-x11-drv-amdgpu, try running curv --version again and tell me what the output is.

Fedora-24 is very old (June 2016) and reached end-of-life last year. If you can't run the latest AMDGPU-PRO, then running the latest Mesa might help. There is a xorg-x11-drv-amdgpu-18.0.1-1 package for Fedora 28, looks like it contains Mesa 18.0.1.
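The version comparison above can be done mechanically. The following Python sketch pulls the Mesa version out of an OpenGL line and compares it with 17.3, the release in which the linked bug is said to be fixed. The function names are invented for illustration. Passing the check only means the Mesa release is new enough for that one fix; it says nothing about other driver bugs.

```python
import re


def mesa_version(opengl_line):
    """Return the Mesa version in an OpenGL line as a tuple of ints, or None."""
    match = re.search(r"Mesa (\d+(?:\.\d+)*)", opengl_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(opengl_line, fixed_in=(17, 3)):
    """True if the line reports a Mesa version at or above `fixed_in`."""
    version = mesa_version(opengl_line)
    return version is not None and version >= fixed_in


# OpenGL line reported earlier in this thread.
old_line = "OpenGL: 3.0 Mesa 12.0.3"
print(mesa_version(old_line))  # (12, 0, 3)
print(has_fix(old_line))       # False
```

Tuples compare element by element, so (12, 0, 3) sorts below (17, 3) without any string handling.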

gsohler commented 6 years ago

Hi Doug,

Thank you for your valuable input. Downgrading my X11 and my kernel is too risky for me, and not worth it just to improve my Curv performance. I think I will upgrade to the latest Fedora 28 instead.

Cheers Günther

gsohler commented 6 years ago

After downloading dozens of gigabytes from the internet, I finally arrived at Fedora release 28 (Twenty Eight); this is what /etc/fedora-release tells me. Then I made sure that I have the xorg-x11-drv-amdgpu.x86_64 package installed. However, there is no PRO package. Having that, I recompiled it from scratch, and now curv --version tells me:

Curv: 0.1-226-g12dcafe
GPU: X.Org AMD RS780 (DRM 2.50.0 / 4.17.9-200.fc28.x86_64, LLVM 6.0.0)
OpenGL: 3.0 Mesa 18.0.5

With that, Curv still only works over VNC, at 1 FPS. Without VNC I still get the error:

EE r600_shader.c:3933 r600_shader_from_tgsi - GPR limit exceeded - shader requires 133 registers
EE r600_shader.c:183 r600_pipe_shader_create - translation from TGSI failed !
EE r600_state_common.c:872 r600_shader_select - Failed to build shader variant (type=1) -12

It appears that the number of registers used depends on the design. When I reduce my design, it works locally and displays "6FPS". However, this cannot be true: judged objectively, it is still less than 1 FPS, and the mouse does not even move smoothly anymore while the program is running.

What could be wrong on my side?
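Since the register count seems to vary with the design, it may help to pull the number out of the driver log for each test model. Here is a small Python sketch, assuming the message wording stays exactly as in the r600 error quoted above. The function name `registers_required` is made up for illustration.

```python
import re

# Matches the r600 message quoted in this thread and captures the count.
GPR_ERROR = re.compile(r"GPR limit exceeded - shader requires (\d+) registers")


def registers_required(log_text):
    """Return every register count reported in 'GPR limit exceeded' messages."""
    return [int(count) for count in GPR_ERROR.findall(log_text)]


# Error line copied from the log above.
log = (
    "EE r600_shader.c:3933 r600_shader_from_tgsi - "
    "GPR limit exceeded - shader requires 133 registers"
)
print(registers_required(log))  # [133]
```

Running this over the stderr output for a few reduced designs would show how close each one comes to the limit, as long as the driver still prints the message.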

doug-moen commented 6 years ago

A few initial comments.

  1. It's a new error message. "GPR limit exceeded - shader requires 133 registers" wasn't being shown before. This has been reported as a Mesa bug: https://bugs.freedesktop.org/show_bug.cgi?id=105371 It's a bug in the register spilling code in the GLSL compiler. A fix for this bug was submitted in February (see link), but I guess you would need to download the source and build your own version of Mesa.
  2. I'm running Ubuntu 16.04 LTS, with an Nvidia GTX-1050 GPU, using the Nvidia proprietary driver. It's an entry level, budget GPU card, but I get 60 FPS for your model. When Curv uses too many registers, rendering slows down, but it hasn't failed to compile with this driver. (Register spilling works correctly in the Nvidia driver.)

I read up on the difference between the Mesa and AMDGPU-PRO drivers.

You are running Fedora, which is a "bleeding edge" distro. The advantage is you get to use newer versions of packages. The disadvantage is a higher risk of things not working, and having to deal with that when things break. For example, you have a bad GPU driver. I personally run Ubuntu LTS, which has a strong emphasis on making everything work, with the disadvantage that packages are old. I either live with the older packages, or "side-load" up to date software and install it in /usr/local.

I think you have four options:

doug-moen commented 6 years ago

I can't guarantee that the patch for Mesa bug #105371 will fix all of the problems that break Curv.

I looked at the bug fix, and it is not AMD specific. In principle, I should be able to reproduce the problem on my machine using the Nvidia Nouveau driver, which also uses Mesa. (In practice, the Nouveau driver doesn't work, I just get a black screen. I need to upgrade my Nouveau driver before I can make progress on reproduction...)

My ideal solution is for the Mesa project to fix all of the bugs that break Curv. I'm going to see if I can figure out how to install the latest GPU driver from mesa3d.org, without relying on Ubuntu repositories. Then maybe I can start filing bug reports against Mesa.

gsohler commented 6 years ago

Hi Doug! First of all, I need to tell you that I'm impressed by your efforts. Apparently I am not as fast at trying things as you are at suggesting them.

Following your previous posts, I decided to patch and recompile the Mesa driver. I could do this with a 'git clone' and a 'git am', with the patches in mbox file format.

I could not yet find out where Curv loads the Mesa driver from. First I installed it in /usr/local; now I am trying with /usr.

... still trying ...

gsohler commented 6 years ago

It appears you have temporarily changed your focus and are looking into the driver issue rather than implementing new features, which is more attractive. If you feel it's useful to test with my hardware, there might be options. In case you are interested, contact me at mail (at ) guenther-sohler.net

I just managed to get Curv to use the compiled Mesa.

curv --version now shows:

Curv: 0.1-226-g12dcafe
GPU: X.Org AMD RS780 (DRM 2.50.0 / 4.17.9-200.fc28.x86_64, LLVM 5.0.1)
OpenGL: 3.0 Mesa 18.2.0-devel (git-a18be3dbc1)

When I try running Curv with my capsule wireframe icosahedron, it outputs:

EE r600_shader.c:183 r600_pipe_shader_create - translation from TGSI failed !
EE r600_state_common.c:875 r600_shader_select - Failed to build shader variant (type=1) -12

Now there is no error about too many registers, but the mouse cursor becomes quite unresponsive and there is nothing to see ...

gsohler commented 6 years ago

This weekend I had time to install my Nvidia GTX 1050 graphics card in my computer, and connecting my display to the new card ultimately displays something useful. The next step is to run the Nvidia installer on my Linux box. It compiles some of their modules against my Linux kernel, which apparently fails. I need to look into how to proceed.

Guenther

doug-moen commented 6 years ago

Thanks for the update. I don't run Fedora, and I had the impression that installing the nvidia drivers was easier than that. Let me know what happens.

gsohler commented 6 years ago

I can't comment on this anymore, as I have a good Nvidia card now. Thus I can close this :)

doug-moen commented 5 years ago

This problem seems to be resolved. Curv 0.4 works with the AMD Mesa 19.0.2 driver, according to a report from @ivocavalcante. Here's the relevant output from curv --version for the version that works:

Curv: 0.4
Compiler: gcc 7.4.0
Kernel: Linux 5.0.0-23-generic x86_64
GPU: X.Org, AMD VERDE (DRM 2.50.0, 5.0.0-23-generic, LLVM 8.0.0)
OpenGL: 4.5 (Compatibility Profile) Mesa 19.0.2

This bug was originally reported for Mesa 18.x.