anholt opened this issue 8 years ago
This will probably be supported in DRM prime mode with Noralf's tinydrm work.
I tried to offload rendering to vc4 when using tinydrm, but I couldn't get it to work. Should it work?
$ sudo lightdm ^Z
# working fine
$ DISPLAY=:0 glxgears -fullscreen
$ DISPLAY=:0 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: Software Rasterizer
$ DISPLAY=:0 xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x65 cap: 0x2, Sink Output crtcs: 1 outputs: 1 associated providers: 0 name:modesetting
Provider 1: id: 0x43 cap: 0x2, Sink Output crtcs: 3 outputs: 1 associated providers: 0 name:modesetting
$ DISPLAY=:0 xrandr --setprovideroffloadsink 1 0
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 140 (RANDR)
Minor opcode of failed request: 34 ()
Value in failed request: 0x43
Serial number of failed request: 16
Current serial number in output stream: 17
$ dmesg
[ 3286.999925] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:25:Virtual-1]
[ 3286.999965] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:25:Virtual-1] probed modes :
[ 3286.999985] [drm:drm_mode_debug_printmodeline] Modeline 29:"320x240" 0 1 320 320 320 320 240 240 240 240 0x48 0x0
[ 3287.000884] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:25:Virtual-1]
[ 3287.000904] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:25:Virtual-1] probed modes :
[ 3287.000921] [drm:drm_mode_debug_printmodeline] Modeline 29:"320x240" 0 1 320 320 320 320 240 240 240 240 0x48 0x0
Thanks for looking into this! Here's my output, without tinydrm:
# xrandr --listproviders
Providers: number : 1
Provider 0: id: 0x43 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 1 associated providers: 0 name:modesetting
So you don't have export or offload listed. glamor acceleration is required for PRIME, which your Xorg.log said didn't get initialized. Do you maybe not have the userspace side of the driver installed?
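For reference, the cap: value printed by xrandr --listproviders is a bitmask, so cap: 0x2 means "Sink Output" only, while 0xf sets all four capability bits. A minimal C sketch that decodes it in the order xrandr prints the names. The four bit values are assumed to mirror the RR_Capability_* constants from the RandR protocol header and are redefined locally so the example builds without X headers; describe_caps is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Provider capability bits, assumed to match RR_Capability_* in randr.h. */
enum {
    CAP_SOURCE_OUTPUT  = 0x1,
    CAP_SINK_OUTPUT    = 0x2,
    CAP_SOURCE_OFFLOAD = 0x4,
    CAP_SINK_OFFLOAD   = 0x8,
};

/* Writes the names of the set bits into buf, comma-separated and in the
 * order xrandr --listproviders prints them. Stops early if buf is full. */
void describe_caps(unsigned cap, char *buf, size_t len)
{
    static const struct {
        unsigned bit;
        const char *name;
    } names[] = {
        { CAP_SOURCE_OUTPUT,  "Source Output" },
        { CAP_SINK_OUTPUT,    "Sink Output" },
        { CAP_SOURCE_OFFLOAD, "Source Offload" },
        { CAP_SINK_OFFLOAD,   "Sink Offload" },
    };
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(cap & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used > 0 ? ", " : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return; /* output truncated */
        used += (size_t)n;
    }
}
```

With the first provider list above, describe_caps(0x2, ...) gives just "Sink Output" for both providers, so neither could act as an offload source; the provider without tinydrm reports 0xf and lists all four.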
I have now tried the latest Raspbian 2017-01-11, same result. Then I followed your guide: https://github.com/anholt/mesa/wiki/VC4-complete-Raspbian-upgrade
$ DISPLAY=:0 xrandr --listproviders
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 12 requests (12 known processed) with 0 events remaining.
This is the same error that I got on 1.16.4. Upgrading xserver-xorg to 1.18.4 gave me the result in my previous post.
If you don't see any obvious solution to this, I'll just pick it up again in a few months when it has matured some more.
Hi Eric, I watched your XDC talk yesterday and was reminded of this issue.
Using 2017-09-07-raspbian-stretch I'm able to get further (I've backported tinydrm to rpi-4.9.y):
$ DISPLAY=:0 xrandr --setprovideroutputsource 1 0
$ DISPLAY=:0 xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x45 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 3 associated providers: 1 name:modesetting
Provider 1: id: 0x28b cap: 0x2, Sink Output crtcs: 1 outputs: 1 associated providers: 1 name:modesetting
$ DISPLAY=:0 xrandr
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 2048 x 2048
HDMI-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 521mm x 293mm
1920x1080 60.00*+ 50.00 59.94
1680x1050 59.88
1600x900 60.00
1280x1024 75.02 60.02
1440x900 59.90
1280x800 59.91
1152x864 75.00
1280x720 60.00 50.00 59.94
1024x768 75.03 70.07 60.00
832x624 74.55
800x600 72.19 75.00 60.32 56.25
720x576 50.00
720x480 60.00 59.94
640x480 75.00 72.81 66.67 60.00 59.94
720x400 70.08
Composite-1 unknown connection (normal left inverted right x axis y axis)
720x480 62.69
DSI-1 disconnected (normal left inverted right x axis y axis)
Virtual-1-1 connected (normal left inverted right x axis y axis)
320x240 0.01 +
$ sudo sh -c "echo 0x1f > /sys/module/drm/parameters/debug"
$ sudo dmesg -C
$ DISPLAY=:0 xrandr --output Virtual-1-1 --auto
I get the desktop on the display, but it doesn't refresh.
tinydrm flushes the framebuffer when the pipe is enabled, on PAGE_FLIP and on DIRTYFB.
Here's the kernel log from the above commands: https://gist.github.com/notro/fb8ec4418244ad2af8bd5fbba67434b6 It starts with a few DIRTYFB ioctls and then it stops. I've added line breaks around the flushing and its triggers.
Any idea why the flushing stops?
Edit: The display was off; I moved the mouse and it turned on with updated content. This wasn't surprising in itself, since it flushes on enable, but once again there were a few DIRTYFB ioctls: https://gist.github.com/notro/8a1bdd09b6cc15217bb4a86f6282a2c9
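The 0x1f written to /sys/module/drm/parameters/debug above is a bitmask of DRM debug categories. A small C sketch that decodes such a mask, assuming the category bits documented for the drm.debug parameter in 4.9/4.14-era kernels (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08, ATOMIC 0x10). Newer kernels define more bits, which this sketch ignores, and drm_debug_describe is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* drm.debug category bits as documented for 4.9/4.14-era kernels,
 * named after the kernel's DRM_UT_* macros. */
enum {
    DRM_UT_CORE   = 0x01, /* DRM core code */
    DRM_UT_DRIVER = 0x02, /* driver-specific messages */
    DRM_UT_KMS    = 0x04, /* modesetting */
    DRM_UT_PRIME  = 0x08, /* PRIME buffer sharing */
    DRM_UT_ATOMIC = 0x10, /* atomic modesetting */
};

/* Formats the enabled categories of a drm.debug mask as "A|B|C".
 * Bits above 0x10 are ignored. buf must hold the longest result. */
void drm_debug_describe(unsigned mask, char *buf, size_t len)
{
    static const struct {
        unsigned bit;
        const char *name;
    } cats[] = {
        { DRM_UT_CORE, "CORE" },   { DRM_UT_DRIVER, "DRIVER" },
        { DRM_UT_KMS, "KMS" },     { DRM_UT_PRIME, "PRIME" },
        { DRM_UT_ATOMIC, "ATOMIC" },
    };

    assert(len >= sizeof("CORE|DRIVER|KMS|PRIME|ATOMIC"));
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(cats) / sizeof(cats[0]); i++) {
        if (mask & cats[i].bit) {
            if (buf[0] != '\0')
                strcat(buf, "|");
            strcat(buf, cats[i].name);
        }
    }
}
```

drm_debug_describe(0x1f, ...) yields "CORE|DRIVER|KMS|PRIME|ATOMIC", so the logs above include every category relevant to the PRIME and DIRTYFB traffic.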
The Xorg.log would probably be useful here, to see what it says about dirty tracking.
I finally got it to work without knowing why. xrandr kept giving me errors and it resulted in the desktop switching from HDMI to TFT. When I tried today it suddenly worked. Here's the log: https://gist.github.com/notro/cf8006ce379d134808672e2bb61ea9cf
I blacklist the mi0283qt driver so the desktop shows up on HDMI and not on the TFT. X configuration is a nightmare, even more so now that it's automatic and there's no config to adjust, so I load the driver after boot. Here's the whole ssh session: https://gist.github.com/notro/4d22e75a694a33496c3ff01e95e2e572
Thank you for all your work on this, anholt and notro; I really appreciate what you're doing. I'm wrestling with this issue as well. I'm starting an open-source audio/visual synthesizer project with HDMI out for visual synthesis and a control interface on a Riverdi FT801-based SPI touch TFT. It runs on a CM3 compute module and pretty much maxes out all the peripherals: 2 additional SPIs for the ADC and DAC, an I2S audio codec, Bluetooth over UART and Wi-Fi over SDIO. I used the rest of the GPIO for modular synth gate inputs/outputs, so pretty much only SPI is left for the touchscreen. DSI would have been nice, but nobody seems to make a DSI touch panel in my size (3.5"). So I wrote a tinydrm driver based on notro's work (attached: rvtft801.txt dts.txt)
I started with downstream 32-bit ARM and Jessie and had it mostly working, except for this issue of offloading the GPU via PRIME buffers. But with swrast I was getting 40 fps in glxgears, which is about all I can expect out of the SPI (until I add compression, which the FT801 supports but which will use more RPi CPU).
Then I updated to 32-bit Stretch, and all of a sudden frame rates dropped by half (and I still get this bug). It turns out Stretch ships the gallium llvmpipe software rasterizer, which I'm guessing is in no way intended to perform well on 32-bit ARM (especially without ASIMD, etc.). I compiled mesa manually using anholt's instructions and got back dri swrast with the right configure options, but still had this bug. I haven't tried in the past few days; maybe it's fixed now, given what notro is saying.
Anyway, I tried going upstream to arm64 with Stretch (and then also to arm64 Arch Linux, to get the latest mesa/xserver nicely packaged) to see if this works better with gallium vc4/llvmpipe. I ported the rest of my peripherals over, only to realize that my tinydrm driver no longer works because of the arm64 DMA zone situation. dma_coerce_mask_and_coherent returns -ENOMEM, and if I continue anyway, the drm gem fbdev helper tells me it "failed to allocate buffer with size 155648", "Failed to set initial hw configuration.", and tinydrm register tells me "Failed to initialize fbdev: -12"
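The size in that error message is consistent with a single 320x240 framebuffer at 16 bits per pixel (RGB565) rounded up to whole 4 KiB pages: 320 x 240 x 2 = 153600 bytes is 37.5 pages, and 38 pages are 155648 bytes. A small C sketch of that arithmetic, assuming RGB565 and 4 KiB pages (neither is stated in the log); page_align and fb_alloc_size are hypothetical helpers modelled on the kernel's PAGE_ALIGN() rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define EXAMPLE_PAGE_SIZE ((size_t)4096)

/* Rounds a byte count up to whole pages, like the kernel's PAGE_ALIGN(). */
size_t page_align(size_t bytes)
{
    return (bytes + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE * EXAMPLE_PAGE_SIZE;
}

/* Bytes needed for one linear framebuffer of width x height pixels. */
size_t fb_alloc_size(unsigned width, unsigned height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return page_align((size_t)width * height * (bits_per_pixel / 8));
}
```

Since fb_alloc_size(320, 240, 16) comes out at 155648, the failing allocation was a modest 152 KiB, which suggests the -12 came from the device's DMA constraints rather than from the system actually running out of memory.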
Happy to code and contribute whatever needs to be fixed myself, but where to go from here? Do I add the dma zone allocation to the spi driver (which is already using 2 dmas for tx/rx), or to a framebuffer driver (but looks like framebuffer is set to be deprecated upstream? https://elinux.org/RPi_Upstreaming). DMA can't be disabled, right? I would think communication between the GPU and ARM is necessarily DMA.
It seems like all this migration upstream, both to the newer DRM platform and to arm64, while clearly better architecturally, leaves a major regression hole for SPI displays. Is there a plan for them, or are they just going to be accidentally marked deprecated? I'd like to help make sure this doesn't happen.
Looks like a bunch of people were complaining on this thread about slower llvmpipe in Stretch (https://www.raspberrypi.org/forums/viewtopic.php?t=191791). I'm not sure they realized it was because the gallium llvmpipe software rasterizer is meant for x86 and llvm on 32-bit ARM is not going to be good at all, but it turned childish and someone closed the thread.
SPI devices are not DMA devices; it's the SPI controller device that does DMA. When I made tinydrm I used the CMA helper because it was easy to use and I had little understanding of DRM. This meant I had to make it look like the SPI device could do DMA by using dma_coerce_mask_and_coherent(). Then dma_alloc_wc() would succeed. The DMA address returned was useless, but that didn't matter because the SPI core used the streaming API anyway to get a new address (spi_map_buf()). Why it fails on arm64, I don't know; I never tried it.
I'm currently in the process of moving away from the cma helper: https://lists.freedesktop.org/archives/dri-devel/2017-October/154048.html
Amazing, thanks notro... when I apply your patches and skip the dma_coerce_mask_and_coherent call, tinydrm now works on arm64, patched against the latest rpi-4.14.y / bcmrpi3_defconfig. I tried it on a PiTFT+ 2.8" (using your mi0283qt driver) with and without your patches to confirm that's what fixed it. dma_coerce_mask_and_coherent still fails with -5 (as I think is expected), but apparently we don't need it anymore.
sadly no framerate improvement with arm64 using gallium llvmpipe, it uses less CPU but I think it may be doing full screen updates and maxing out the SPI, unlike the good old dri swrast days.
but gpu offloading seems close! I don't get any more segfaults, "xrandr --setprovideroutputsource 1 0" returns successfully now. I just get "xrandr: Configure crtc 3 failed" from "xrandr --output Virtual-1-1 --auto". Xorg log shows "failed to add fb -22". might be down to xorg configuration at this point (god help us).
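The negative numbers in these messages (-12, -5 and likely the -22 from Xorg as well) follow the kernel convention of returning a negated errno value on failure. A minimal C sketch mapping them back to symbolic names, assuming Linux errno numbering (EIO = 5, ENOMEM = 12, EINVAL = 22); kernel_err_name is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Kernel calls report failure as a negated errno value.
 * Maps the values seen in this thread back to their symbolic names. */
const char *kernel_err_name(int ret)
{
    switch (-ret) {
    case EIO:    return "EIO";    /* I/O error */
    case ENOMEM: return "ENOMEM"; /* cannot allocate memory */
    case EINVAL: return "EINVAL"; /* invalid argument */
    default:     return "unknown";
    }
}
```

Read this way, "failed to add fb -22" means the kernel rejected the framebuffer parameters (EINVAL) rather than failing to allocate anything.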
one question- will these prime buffers allow me to (in theory at least) render a GPU-accelerated 1920x1080 opengl canvas on HDMI, plus a GPU-accelerated 320x240 gui interface on tinydrm? Or is this strictly just going to work for mirroring?
Took another crack at this, got a little closer this time. FYI for anyone trying to get drm prime working with vc4, your life will get a lot easier if you write a small mesa gallium userspace driver instead of trying to convince the mesa loader to do what you want. Just copy/paste this one for the pl111 and swap in the name of your driver. Then have fun inserting it into the mesa source tree, build, and you should end up with an xxx_dri.so in your mesa dri drivers folder. With that, X seems to do the right thing without much pulling your hair out. We might have to use renderonly_create_kms_dumb_buffer_for_resource (whew) instead of renderonly_create_gpu_import_for_resource for tinydrm drivers. Was thinking I'd try to submit a generic tinydrm version upstream to mesa with a fallback to llvmpipe if vc4 isn't available, but since tinydrm drivers all have different names, I'm not sure how mesa would bind the right one because the matching seems solely based off the name of the spi driver. And it's a shame you can't build out-of-tree gallium drivers or at least I haven't found the headers required to.
I have a feeling this will work for 32-bit kernels. For me on 64-bit, this works up until the mesa code calls drmPrimeFDToHandle and ioctls the tinydrm (kernel) driver with DRM_IOCTL_PRIME_FD_TO_HANDLE. From there:
The comment in the source code at that point is: /*
The (32-bit) arm code, on the other hand, passes back the proper IOMMU API ops. So maybe the bcm2835-dma driver is supposed to be doing something different on arm64 than it does on arm? Or am I just compiling the kernel with the wrong options? It's weird because other devices seem to DMA fine, and I get boot messages from the DMA API saying everything is fine. I'm guessing this is the same reason I couldn't attach my tinydrm driver to the CMA GEM helper but was able to with notro's new vmalloc BO patches on arm64.
Did you try the mainline kernel and its defconfig?
There is no CM3 dts for arm64 in mainline. A modified RPi 3 dts plus some love gets you booting (if I remember right, I needed to make a new dt-blob.bin and jump through some U-Boot hoops). Even though it's the same processor as the RPi 3, a handful of things are different in the peripheral wiring, different GPIO expanders, etc.
I wound up basing off raspberrypi/4.14.y with a custom dts. I merge from there first and then from mainline to get better arm64 drivers, and now I'm also merging in drm-tip and hand-patching in notro's work from the dri-devel mailing list so I can get this kernel driver working on arm64. I'm adding 3 other kernel modules on top of that, one of which is an alsa-soc codec, and I think I need the legacy 2709-dma driver for that because of its circular abilities or whatever, but it seems to work in cooperation with the bcm2835 DMA driver fine, and I'm doing SPI DMA just fine with this same display driver. I'm too deep into this thing to look back, so I'm forging ahead, but hoping to meet up with you guys somewhere in the mainline arm64 bcm2837 4.14 LTS promised land. That probably also means I'm on my own unless this is broken for anyone else.
It's possible I have something wrong around the IOMMU / CMA / DMA coherency bits of the kernel config, or my dts needs some kind of IOMMU entry on arm64, but I don't see any references to this stuff in mainline or RPi 3 dts files. I haven't fully wrapped my head around it, and maybe there's a bad combo from mixing mainline and downstream drivers. But I've scoured, and I don't see any evidence of anyone getting an SPI display working on arm64 mainline. I've tried probably 10 different reasonable-looking config permutations; I was rate-limited by the fact that DMA config changes trigger a full kernel rebuild. But I just got a Jetson TX2 and an NVMe SSD, and now I'm smoking through native arm64 kernel builds in about 10 minutes. I'm coming back to this soon. For now I think vintage mesa swrast is better for me because it sends partial screen updates, and it seems like even if we get PRIME working it's gonna be a full scanout (correct me if I'm wrong), and our bottleneck is the SPI clock. That will probably change for me when I add compression, though.
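The SPI clock bottleneck is easy to put numbers on: every update has to clock out width x height x bits-per-pixel bits, so a full scanout pays for the whole screen while a partial update only pays for its dirty rectangle. A C sketch of that upper bound, assuming a hypothetical 32 MHz SPI clock and 16-bit pixels, and ignoring command bytes, gaps between transfers and CPU/DMA setup, so real rates will be lower; max_updates_per_sec is a hypothetical helper.

```c
#include <assert.h>

/* Ceiling on updates per second when each update sends a w x h region at
 * bpp bits per pixel over an SPI bus clocked at spi_hz. Ignores command
 * bytes, inter-transfer gaps and CPU/DMA setup, so real rates are lower. */
double max_updates_per_sec(double spi_hz, unsigned w, unsigned h, unsigned bpp)
{
    assert(w > 0 && h > 0 && bpp > 0);
    return spi_hz / ((double)w * h * bpp);
}
```

At 32 MHz, a full 320x240 RGB565 frame (1,228,800 bits) caps out at about 26 updates per second, whereas a 100x100 dirty region allows up to 200.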
seems like even if we get prime working it's gonna be a full scanout
This depends on userspace, which I know hardly anything about. tinydrm does a full scanout on page flips, so if userspace does page flips and dirtyfb ioctls, we're in bad shape. My brief testing with X and PRIME indicated that this might be the case. Anyway, I have to make sure that tinydrm doesn't scan out page flips if dirtyfb has been used. I haven't looked into this because there was some talk about doing dirtyfb through atomic modesetting (which handles page flips), so I'm waiting to see if something comes out of that.
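To make the page flip vs. dirtyfb distinction concrete: a page flip carries no damage information, so the driver has to send the whole frame, while a DIRTYFB ioctl carries clip rectangles that can be merged into a smaller region to send. A simplified C sketch that merges clips into one bounding box clamped to the framebuffer; struct clip and merge_clips are hypothetical names, and this illustrates the idea rather than reproducing tinydrm's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clip rectangle; like struct drm_clip_rect, x2/y2 are exclusive. */
struct clip {
    unsigned x1, y1, x2, y2;
};

/* Merges the dirty rectangles from a DIRTYFB call into one bounding box
 * clamped to the framebuffer. With no clips (as with a page flip) the whole
 * framebuffer is treated as dirty. Returns true if the result is a full
 * scanout. */
bool merge_clips(const struct clip *clips, unsigned n,
                 unsigned fb_w, unsigned fb_h, struct clip *out)
{
    assert(fb_w > 0 && fb_h > 0);
    if (n == 0) {
        *out = (struct clip){ 0, 0, fb_w, fb_h };
        return true;
    }

    *out = clips[0];
    for (unsigned i = 1; i < n; i++) {
        if (clips[i].x1 < out->x1) out->x1 = clips[i].x1;
        if (clips[i].y1 < out->y1) out->y1 = clips[i].y1;
        if (clips[i].x2 > out->x2) out->x2 = clips[i].x2;
        if (clips[i].y2 > out->y2) out->y2 = clips[i].y2;
    }

    /* Clamp in case userspace sent clips larger than the framebuffer. */
    if (out->x2 > fb_w) out->x2 = fb_w;
    if (out->y2 > fb_h) out->y2 = fb_h;
    if (out->x1 > out->x2) out->x1 = out->x2;
    if (out->y1 > out->y2) out->y1 = out->y2;

    return out->x1 == 0 && out->y1 == 0 && out->x2 == fb_w && out->y2 == fb_h;
}
```

A single bounding box keeps the SPI transfer to one window write, at the cost of resending untouched pixels between distant rectangles: two 10x10 clips in opposite corners of a 320x240 framebuffer degenerate into a full-frame update.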
I tried this out on rpi-4.14.y 32-bit without any patches and indeed it worked, quite well. You just have to make the tinydrm gallium driver as I mentioned above. Every app I tried works much better on the spi display with gpu rendering. To answer my own questions, yes you can render on HDMI and tinydrm display at the same time. And the rendering is done with partial updates both in X and wayland (though not tested in raspbian, this is using fedora rawhide), and there is way less spi traffic and frame rates can get really high. All the xrandr prime commands seem to work. Thanks again for everyone's work! Shortcut method for building your own gallium driver:
unpack or git clone mesa source, then run these commands, substituting ft8xx with the name of your tinydrm driver (the name you use with modprobe):
find . -type f -not -path './.git/*' -print0 | xargs -0 sed -i 's/pl111/ft8xx/g'
find . -type f -not -path './.git/*' -print0 | xargs -0 sed -i 's/PL111/FT8XX/g'
mv src/gallium/drivers/pl111 src/gallium/drivers/ft8xx
mv src/gallium/winsys/pl111 src/gallium/winsys/ft8xx
mv src/gallium/winsys/ft8xx/drm/pl111_drm_public.h src/gallium/winsys/ft8xx/drm/ft8xx_drm_public.h
mv src/gallium/winsys/ft8xx/drm/pl111_drm_winsys.c src/gallium/winsys/ft8xx/drm/ft8xx_drm_winsys.c
Run configure, making sure you add --with-gallium-drivers=vc4,ft8xx
make
Progress on this: hx8357d tinydrm driver merged, kmsro patch series submitted.
These are going to be trickier than the other panels: we need to use the transposer to write back into memory, and then, when the transposer is done, use the DMA engine to stream the new frame over SPI to the device. I hope.
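For context on the per-frame work involved: SPI panels like these typically take RGB565, while the desktop renders XRGB8888, and when the SPI controller can only send 8-bit words the 16-bit pixels also have to be byte-swapped. A C sketch of doing that conversion on the CPU, which is presumably the kind of work the transposer write-back would take over. The function names are hypothetical, and the byte swap assumes a little-endian CPU and a panel that expects the high byte first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Converts one XRGB8888 pixel to RGB565 by keeping the top 5/6/5 bits. */
uint16_t xrgb8888_to_rgb565(uint32_t px)
{
    uint16_t r = (px >> 19) & 0x1f; /* bits 23..19 */
    uint16_t g = (px >> 10) & 0x3f; /* bits 15..10 */
    uint16_t b = (px >> 3) & 0x1f;  /* bits 7..3 */
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Converts n pixels. With swap_bytes set, each 16-bit value is byte-swapped
 * so that an 8-bit SPI transfer from a little-endian buffer sends the high
 * byte first. */
void convert_line(uint16_t *dst, const uint32_t *src, size_t n, bool swap_bytes)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t v = xrgb8888_to_rgb565(src[i]);
        dst[i] = swap_bytes ? (uint16_t)((v << 8) | (v >> 8)) : v;
    }
}
```

For example, pure red (0x00FF0000) becomes 0xF800, or 0x00F8 once swapped for an 8-bit-only controller.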