starks opened this issue 12 years ago
Yes, it should be possible to take the general design of primus and implement VDPAU offloading. I haven't looked at the VDPAU API yet, but I expect it's simpler than GLX, so the implementation might be correspondingly simpler (though there's a catch: VDPAU is not available on the Intel side, so you'd need to translate it to another API such as Xv or VA-API). I'm not particularly interested in doing that work, because powering on the NVIDIA card, reading back the decoded frames, and displaying them on the Intel GPU is very likely to burn more power than decoding on Intel in the first place (plus, the Intel chips that come in Optimus laptops have decent support for hardware video decoding via VA-API).
For reference, a VDPAU-to-VA-API bridging library that provides hardware-accelerated video decoding on Intel chips for VDPAU-aware applications is available at https://github.com/i-rinat/libvdpau-va-gl
@amonakov I know this is an old issue (and marked as wontfix), but it would be really nice if at least applications that I run through primusrun were able to make use of NVIDIA hardware video decoding. I was able to get certain applications like vdpauinfo and nvidia-settings to see the correct VDPAU libraries by appending the following to the LibraryPath variable in the [driver-nvidia] section of my bumblebee.conf:
/usr/lib/nvidia-346/vdpau:/usr/lib32/nvidia-346/vdpau
Of course, applications still need to be told which display to use (:8 in my case). I think this feature is not that far off from being supported, and it would be nice not to have my NVIDIA card's video decoder sitting there useless.
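For anyone else trying this, here's roughly what the resulting section looks like on my machine. The driver version (346) and the first two LibraryPath entries are whatever your distribution ships, so treat the exact paths as illustrative:

```
[driver-nvidia]
# ...other settings unchanged...
LibraryPath=/usr/lib/nvidia-346:/usr/lib32/nvidia-346:/usr/lib/nvidia-346/vdpau:/usr/lib32/nvidia-346/vdpau
```

With that in place, `DISPLAY=:8 vdpauinfo` reports the NVIDIA decoder capabilities for me.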
I think that to make your case properly, you need to explain why Intel hardware video decoding is not sufficient.
My issue isn't that Intel video decoding is problematic, though I suspect there are things the NVIDIA decoder can do that the Intel decoder can't (and, for the record, my Intel hardware decoding isn't working properly). I believe Steam In-Home Streaming, for example, claimed that using NVIDIA hardware encoding would cause decoding failures unless NVIDIA hardware decoding was used on the client, but I'm not an expert in these technologies, so I can't really comment on that aspect.
Mainly, I paid for the hardware and I want to take full advantage of it in the way that I see fit. This stands out as the last major feature the Linux Optimus implementation is missing, and if Windows can do it, then we should be able to do it on Linux as well. Whether or not Intel can do it is not really relevant to me; I would like to see Linux able to do everything the other OSes can do and more (I think that's a general sentiment, but I might be wrong).
I also want to point out that, should you decide to implement this or even just investigate the effort, I'm happy to help in any way. As I said, I'm not exactly an expert in these technologies, so it may be hard for me to do any coding, but I'm happy to take direction (e.g., if you say "add this function here and test it" with some specificity, I can probably work that out).
Should I take your silence to mean you have no interest in this feature?
Hope the last few months have gone well for you. Could you share any further thoughts on this issue? I would love for you to address the points I made in my comment back in May. Primus seems extremely stable at this point, and this would be a great feature to add.
Thanks for your patience and the reminder, and sorry for dropping this. It seems we are thinking about the issue from completely different angles. As I understand it, you want software extended to make use of your hardware in new and interesting ways. I'm not motivated to do that kind of work.
I have a very rough idea of the development effort it would require. I'll try to explain a bit.
If an application uses VDPAU not only for decoding but also for presentation, things get messy: since there's no native VDPAU on Intel, does the interposer re-interpret it in terms of VA-API (or GL?), or rely on a second interposer implementing VDPAU on top of VA-API? Both ways are inconvenient.
If an application uses VDPAU only for decoding, and relies on GLX-VDPAU interop for presentation, things get... maybe a bit easier, but probably not by much. I haven't looked in detail.
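To make the decode-only scenario concrete, here is a minimal sketch of where a primus-style interposer would have to start: an LD_PRELOAD shim replacing libvdpau's entry point so the device is created against the secondary X server. The `:8` display, the backend path, and the whole approach are assumptions for illustration, not anything that exists in primus:

```c
/* Hypothetical LD_PRELOAD shim: create the VDPAU device on the
 * NVIDIA X server instead of the application's Intel display. */
#include <dlfcn.h>
#include <X11/Xlib.h>
#include <vdpau/vdpau.h>
#include <vdpau/vdpau_x11.h>

/* Driver-side entry point that every VDPAU backend exports. */
typedef VdpStatus VdpImpDeviceCreateX11(Display *display, int screen,
                                        VdpDevice *device,
                                        VdpGetProcAddress **get_proc_address);

VdpStatus vdp_device_create_x11(Display *display, int screen,
                                VdpDevice *device,
                                VdpGetProcAddress **get_proc_address)
{
    (void)display;  /* ignore the application's (Intel) display */
    (void)screen;

    /* Connect to the discrete GPU's X server (":8" under Bumblebee;
     * an assumption, would need to be configurable). */
    Display *nv_dpy = XOpenDisplay(":8");
    if (!nv_dpy)
        return VDP_STATUS_ERROR;

    /* Load the NVIDIA backend directly; the path is illustrative and
     * matches the LibraryPath shown earlier in this thread. */
    void *drv = dlopen("/usr/lib/nvidia-346/vdpau/libvdpau_nvidia.so.1",
                       RTLD_NOW | RTLD_LOCAL);
    if (!drv)
        return VDP_STATUS_NO_IMPLEMENTATION;

    VdpImpDeviceCreateX11 *imp_create =
        (VdpImpDeviceCreateX11 *)dlsym(drv, "vdp_imp_device_create_x11");
    if (!imp_create)
        return VDP_STATUS_NO_IMPLEMENTATION;

    return imp_create(nv_dpy, DefaultScreen(nv_dpy),
                      device, get_proc_address);
}
```

Note that this would only relocate device creation and decoding: every surface the backend produces lives in NVIDIA memory, which is exactly where both scenarios above get messy.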
In the past I've seen application developers unhappy when an API they use (VDPAU) is provided not by a native library but by an interposer implementing it on top of a different API (like the libvdpau-va-gl project linked above). That is understandable, and I wouldn't like to contribute to that kind of unhappiness.
Also, if something is missing on the Intel hardware-decoding side of things, I'd prefer that the user community ask Intel for improvements. You explained that this isn't what motivates your request, but if a VDPAU interposer did its job well, it would still detract from that: users in general would have fewer reasons to ask for improvements if some of them were satisfied with translated VDPAU. I wouldn't like that.
So yes, for various reasons I'm not motivated to provide what you asked for. When I made primus it felt like fun, but I don't have a similar expectation for VDPAU.
Yes, you have definitely understood my perspective.
I think it's very likely that I don't understand all the intricacies of VDPAU. What I'd like to see is some way to decode with VDPAU and pass the decoded frame off to the Intel card to be displayed in the application window. If I understand what you wrote, that may not be possible because of how VDPAU works? My basic thought was a direct GPU-to-GPU memory transfer of the frame, but again, I'm speculating a lot here, as this isn't really my element.
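From my reading of the VDPAU headers, the fallback without any GPU-to-GPU path seems to be the CPU round trip you described in your first comment: read the decoded surface back into system memory and re-upload it on the Intel side. A rough sketch of the readback half, assuming an NV12 surface layout, known dimensions, and a `get_bits` pointer obtained via VdpGetProcAddress with VDP_FUNC_ID_VIDEO_SURFACE_GET_BITS_Y_CB_CR:

```c
#include <stdint.h>
#include <stdlib.h>
#include <vdpau/vdpau.h>

VdpStatus read_back_nv12(VdpVideoSurfaceGetBitsYCbCr *get_bits,
                         VdpVideoSurface surface,
                         uint32_t width, uint32_t height)
{
    /* NV12: full-resolution Y plane + half-height interleaved CbCr plane. */
    uint8_t *y  = malloc((size_t)width * height);
    uint8_t *uv = malloc((size_t)width * height / 2);
    if (!y || !uv) { free(y); free(uv); return VDP_STATUS_RESOURCES; }

    void *planes[2]     = { y, uv };
    uint32_t pitches[2] = { width, width };

    /* Copy the decoded frame off the NVIDIA GPU into system memory. */
    VdpStatus st = get_bits(surface, VDP_YCBCR_FORMAT_NV12, planes, pitches);

    /* ...here the planes would be handed to the Intel side
     * (e.g. an Xv or GL texture upload) for display... */

    free(y);
    free(uv);
    return st;
}
```

That per-frame copy is presumably where the power and bandwidth cost you mentioned comes from.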
As far as the ideological concern goes, I get that too, although I probably don't understand whether there are any technical limitations to providing VDPAU on an Intel card (though I would still like to use the NVIDIA hardware for it anyway, so that my Intel chip doesn't need to work as hard). Maybe a companion library to primus could be built specifically to handle video, so that primus itself, and by extension yourself, wouldn't be seen as impeding progress on Intel VDPAU. From my searching around, it looks like there are a number of people interested in this feature, so I think it would be useful.
And finally, as far as doing the work itself, I'd love to do it, but again this isn't really my field. That said, if you could point me toward any resources on the subject, I could take a crack at it, and I'd love to learn (I'm a researcher in machine learning, so I do have math and programming experience, not that it counts for much here, as there may not be much overlap).
Thanks a lot for your response
@Queuecumber Well, I think this is out of scope for primus; what you're asking for is a really different way of doing things, more PRIME-like than primus-like.
So this is probably not the place to discuss it; it would be better to start a discussion on a libva or Mesa mailing list to see if some developers are interested or can guide you on how to achieve this.
I'm finding that primus is too similar to VirtualGL. That is, they share the same flaw: the transport method only recognizes GLX calls.
VDPAU will not work unless the app window that requires it is rendered on the NVIDIA X server before being displayed on the Intel X server.
The solution is buffer sharing such as dma_buf, but this is currently not possible with the proprietary driver.
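For reference, the receiving half of that idea already exists on the Mesa/Intel side via the EGL_EXT_image_dma_buf_import extension; it's the exporting half that the proprietary driver doesn't offer. A sketch of what the Intel-side import could look like, assuming a dma_buf fd, dimensions, format, and stride handed over from a hypothetical exporter:

```c
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>  /* DRM_FORMAT_* codes, from libdrm */

/* Wrap an exported dma_buf fd as an EGLImage on the Intel GPU.
 * A single-plane XRGB8888 buffer is assumed for simplicity. */
EGLImageKHR import_dmabuf(EGLDisplay dpy, int fd,
                          EGLint width, EGLint height, EGLint stride)
{
    EGLint attribs[] = {
        EGL_WIDTH,  width,
        EGL_HEIGHT, height,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_XRGB8888,
        EGL_DMA_BUF_PLANE0_FD_EXT,     fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
        EGL_NONE
    };
    PFNEGLCREATEIMAGEKHRPROC create_image =
        (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
    return create_image(dpy, EGL_NO_CONTEXT,
                        EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
}
```

Without a way to get that fd out of the NVIDIA driver, though, this half is moot.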
What does exist are CPU-bound solutions: hybrid-windump, for example, provides a crude way to get VDPAU output across, but the windowing is really awful.
Would it be possible to process calls for libvdpau using a similar method?