gfx-rs / portability

Vulkan Portability Implementation
Mozilla Public License 2.0

[WIP] Makes vulkan backend work on linux. #111

Closed goddessfreya closed 6 years ago

goddessfreya commented 6 years ago

Depends on: https://github.com/gfx-rs/gfx/pull/2221

Someone should check it doesn't break non-linux builds. Will add documentation soon.

Signed-off-by: Hal Gentz zegentzy@protonmail.com

goddessfreya commented 6 years ago

@kvark I've added some stuff to README.md and changed the makefile before I go on my two day hiatus.

Additionally, I think updating https://github.com/gfx-rs/portability/wiki/Vulkan-CTS-status might be necessary when I get back.

goddessfreya commented 6 years ago

While this method does work with most applications, as they use dlopen + dlsym, those that link directly against the system's shared library won't work.

The solution is "simple": we have to compile an empty stub library, link portability to that library, then remove the empty stub library and replace it with a symlink to the system's vulkan library. The rationale behind this method is explained here: https://github.com/amonakov/on-wrapping/blob/master/interposers-discussion.asciidoc#exported-symbols

Updated https://github.com/ZeGentzy/Portapotty to show what I mean.

goddessfreya commented 6 years ago

Problem solved by using RTLD_NEXT.

EDIT: I want to do one more pass over the readme changes, just to make sure my understanding is correct.

kvark commented 6 years ago

@ZeGentzy are you still working on it? Would be great to land this (with green CI)

goddessfreya commented 6 years ago

So, when writing the readme, something felt wrong. So I examined the results from ldd -v and I noticed something didn't add up. Eventually, after some logging, I discovered my new method was causing the programs to just call the system library and skip over portability.

I reverted to my old method but then I realized the same thing had actually been happening there too. (I guess this is why you should always log stuff :laughing: ) After some minor tweaking I finally got it so that:
1- The apps call portability.
2- Portability calls the vulkan backend.
3- The vulkan backend calls ash.
4- Ash queries the system's vkGetInstanceProcAddr (which it gets using RTLD_NEXT) for the function it wants to call, and then calls the function.

Issue is, now the system's vkGetInstanceProcAddr is returning our functions!

Here is the log: (from running LD_PRELOAD=/home/gentz/Documents/gfx/portability/target/debug/libvulkan.so.1:/usr/lib/libvulkan.so.1 vulkaninfo)

VKCR Some(0x7f5c424ad7d0)
VKADDR Some(0x7f5c424ae5f0)
VKEIEP Some(0x7f5c424b18f0)
VKEILP Some(0x7f5c424af0a0)
GFXCR Some(0x7f5c4252da00)
GFXADDR Some(0x7f5c42536220)
GFXEIEP Some(0x7f5c4253dd70)
GFXEILP Some(0x7f5c4253cdd0)
MY ADDR GENTZ ===============================================
GENTZ SAYS PROC ADDR GFX
MY ADDR POST (Handle(0x0), "vkEnumerateInstanceVersion") -> None ==========================
GENTZ SAYS =====================================
GENTZ SAYS CI GFX
SPECIAL GENTZ NEXT "vkGetInstanceProcAddr"
SPECIAL GOT 0x7f5c421f6490
LOAD GENTZ F "vkGetInstanceProcAddr" -> 0x7f5c421f6490
GENTZ CALLING FN Some(0x7f5c421f6490)
GENTZ SAY ENTRY FN V1 "vkCreateInstance" -> 0x7f5c424ad7d0
LOAD GENTZ F "vkCreateInstance" -> 0x7f5c424ad7d0
GENTZ CALLING FN Some(0x7f5c421f6490)
GENTZ SAY ENTRY FN V1 "vkEnumerateInstanceExtensionProperties" -> 0x7f5c424b18f0
LOAD GENTZ F "vkEnumerateInstanceExtensionProperties" -> 0x7f5c424b18f0
GENTZ CALLING FN Some(0x7f5c421f6490)
GENTZ SAY ENTRY FN V1 "vkEnumerateInstanceLayerProperties" -> 0x7f5c424af0a0
LOAD GENTZ F "vkEnumerateInstanceLayerProperties" -> 0x7f5c424af0a0
GENTZ CALLING FN Some(0x7f5c424b18f0)
GENTZ CALLING FN Some(0x7f5c424b18f0)
GENTZ CALLING FN Some(0x7f5c424af0a0)
GENTZ CALLING FN Some(0x7f5c424af0a0)
GENTZ CALLING FN Some(0x7f5c424ad7d0)
GENTZ SAYS =====================================
GENTZ SAYS CI GFX
GENTZ CALLING FN Some(0x7f5c424b18f0)
GENTZ CALLING FN Some(0x7f5c424b18f0)
GENTZ CALLING FN Some(0x7f5c424af0a0)
GENTZ CALLING FN Some(0x7f5c424af0a0)
GENTZ CALLING FN Some(0x7f5c424ad7d0)
GENTZ SAYS =====================================
GENTZ SAYS CI GFX

Let's break it down:

VKCR Some(0x7f5c424ad7d0)
VKADDR Some(0x7f5c424ae5f0)
VKEIEP Some(0x7f5c424b18f0)
VKEILP Some(0x7f5c424af0a0)
GFXCR Some(0x7f5c4252da00)
GFXADDR Some(0x7f5c42536220)
GFXEIEP Some(0x7f5c4253dd70)
GFXEILP Some(0x7f5c4253cdd0)

These are the addresses for our functions in libportability and libportability-gfx.
VKCR -> vkCreateInstance
VKADDR -> vkGetInstanceProcAddr
VKEIEP -> vkEnumerateInstanceExtensionProperties
VKEILP -> vkEnumerateInstanceLayerProperties
GFXCR -> gfxCreateInstance
GFXADDR -> gfxGetInstanceProcAddr
GFXEIEP -> gfxEnumerateInstanceExtensionProperties
GFXEILP -> gfxEnumerateInstanceLayerProperties

MY ADDR GENTZ ===============================================
GENTZ SAYS PROC ADDR GFX
MY ADDR POST (Handle(0x0), "vkEnumerateInstanceVersion") -> None ==========================

They called our vkGetInstanceProcAddr, which calls gfxGetInstanceProcAddr and returns none.

GENTZ SAYS =====================================
GENTZ SAYS CI GFX

They called our vkCreateInstance, which calls gfxCreateInstance.

SPECIAL GENTZ NEXT "vkGetInstanceProcAddr"
SPECIAL GOT 0x7f5c421f6490

This is coming from this code:

#[cfg(feature = "use-rtld-next")]
lazy_static! {
    // Entry function pointers
    pub static ref VK_ENTRY: Result<EntryCustom<V1_0, ()>, LoadingError>
        = EntryCustom::new_custom(
            || Ok(()),
            |_, name| unsafe {
                println!("SPECIAL GENTZ NEXT {:?}", name);
                let ret = DynamicLibrary::symbol_special(SpecialHandles::Next, &*name.to_string_lossy())
                    .unwrap_or(ptr::null_mut());
                println!("SPECIAL GOT {:?}", ret);
                ret
            }
        );
}

dlsym(RTLD_NEXT, "vkGetInstanceProcAddr") gave us 0x7f5c421f6490. 0x7f5c421f6490 isn't one of our addresses, so it has to be the system's libvulkan.so.1's vkGetInstanceProcAddr function.
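For reference, here is a minimal sketch of what an RTLD_NEXT lookup boils down to. It is not the code from this PR (that goes through DynamicLibrary::symbol_special); it declares dlsym by hand so it needs no external crate. The helper name next_symbol is hypothetical, and the value of RTLD_NEXT is an assumption that matches glibc's definition.

```rust
use std::ffi::{c_void, CString};
use std::os::raw::c_char;

extern "C" {
    // POSIX dlsym(3), declared by hand to avoid depending on a crate.
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

// Assumption: glibc defines RTLD_NEXT as ((void *) -1).
const RTLD_NEXT: *mut c_void = -1isize as *mut c_void;

/// Hypothetical helper: look `name` up in the objects that come *after*
/// the calling object in the dynamic linker's search order.
pub fn next_symbol(name: &str) -> Option<*mut c_void> {
    // A name with an interior NUL cannot be a C symbol name.
    let cname = CString::new(name).ok()?;
    let ptr = unsafe { dlsym(RTLD_NEXT, cname.as_ptr()) };
    if ptr.is_null() {
        None
    } else {
        Some(ptr)
    }
}
```

Called from inside portability, next_symbol("vkGetInstanceProcAddr") would be expected to skip portability's own export and return whatever the next library in the search order provides.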

LOAD GENTZ F "vkCreateInstance" -> 0x7f5c424ad7d0

This is coming from this code:

macro_rules! vk_functions {
    ($struct_name: ident, $($raw_name: expr, $name: ident ($($param_name: ident: $param: ty),*,) -> $ret: ty;)+) => {
        #[allow(non_camel_case_types)]
        pub struct $struct_name{
            $(
                $name: extern "system" fn ($($param_name: $param),*) -> $ret,
            )+
        }

        impl Clone for $struct_name {
            fn clone(&self) -> Self{
                $struct_name {
                    $(
                        $name: self.$name,
                    )+
                }
            }
        }

        unsafe impl Send for $struct_name {}
        unsafe impl Sync for $struct_name {}

        impl $struct_name {
            pub fn load<F>(mut f: F) -> ::std::result::Result<$struct_name, Vec<&'static str>>
                where F: FnMut(&::std::ffi::CStr) -> *const c_void
            {
                use std::ffi::{CString};
                use std::mem;
                let mut err_str = Vec::new();
                let s = $struct_name {
                    $(
                        $name: unsafe {
                            let cname = CString::new($raw_name).unwrap();
                            let val = f(&cname);
                            println!("LOAD GENTZ F {:?} -> {:?}", cname, val);
                            if val.is_null(){
                                err_str.push(stringify!($raw_name));
                            }
                            mem::transmute(val)
                        },
                    )+
                };

                if err_str.is_empty() {
                    Ok(s)
                }
                else{
                    Err(err_str)
                }
            }
            $(
                #[inline]
                pub unsafe fn $name(&self $(, $param_name: $param)*) -> $ret {
                    let fp = self.$name;
                    pub type PFN_vkVoidFunction = ::std::option::Option<unsafe extern "C" fn()>;
                    println!("GENTZ CALLING FN {:?}", ::std::mem::transmute::<_, PFN_vkVoidFunction>(Some(*&fp)));
                    fp($($param_name),*)
                }
            )+
        }
    }
}

The address we got from dlsym got propagated upwards.
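To make the propagation easier to follow, here is a simplified, non-macro sketch of what load does: every name is passed to the caller-supplied closure, and whatever pointer comes back is stored as-is. The function name load_table and the Vec-based table are hypothetical stand-ins for the generated struct.

```rust
use std::ffi::{c_void, CStr, CString};

/// Hypothetical, non-macro version of `load`: resolve each name through `f`
/// and report every name that came back as a null pointer.
pub fn load_table<F>(
    names: &[&'static str],
    mut f: F,
) -> Result<Vec<(&'static str, *const c_void)>, Vec<&'static str>>
where
    F: FnMut(&CStr) -> *const c_void,
{
    let mut table = Vec::new();
    let mut missing = Vec::new();
    for &name in names {
        let cname = CString::new(name).expect("name contains an interior NUL");
        // The loader does not check who owns the pointer; it only checks
        // for null and stores the value.
        let ptr = f(&cname);
        if ptr.is_null() {
            missing.push(name);
        } else {
            table.push((name, ptr));
        }
    }
    if missing.is_empty() {
        Ok(table)
    } else {
        Err(missing)
    }
}
```

Because nothing in this path checks where a pointer came from, an address that belongs to portability is stored and later called just like any other.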

GENTZ CALLING FN Some(0x7f5c421f6490)
GENTZ SAY ENTRY FN V1 "vkEnumerateInstanceExtensionProperties" -> 0x7f5c424b18f0
LOAD GENTZ F "vkEnumerateInstanceExtensionProperties" -> 0x7f5c424b18f0

And

GENTZ CALLING FN Some(0x7f5c421f6490)
GENTZ SAY ENTRY FN V1 "vkEnumerateInstanceLayerProperties" -> 0x7f5c424af0a0
LOAD GENTZ F "vkEnumerateInstanceLayerProperties" -> 0x7f5c424af0a0

We called 0x7f5c421f6490 (the system's libvulkan.so.1's vkGetInstanceProcAddr function) with vkEnumerateInstance{Extension,Layer}Properties and got 0x7f5c424b18f0 and 0x7f5c424af0a0.

The former is equal to VKEIEP and the latter to VKEILP! Why my system is returning portability's addresses is beyond me. I'm going to have to recompile my system's mesa and stuff with debugging symbols tomorrow.
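The comparison I did by eye can be written down as a small check. This is only a sketch: the addresses are copied from the log above (they will differ on every run), and owner_of is a hypothetical helper name.

```rust
/// Addresses of portability's own functions, copied from the log above.
const PORTABILITY_SYMBOLS: [(&str, u64); 8] = [
    ("vkCreateInstance", 0x7f5c424ad7d0),
    ("vkGetInstanceProcAddr", 0x7f5c424ae5f0),
    ("vkEnumerateInstanceExtensionProperties", 0x7f5c424b18f0),
    ("vkEnumerateInstanceLayerProperties", 0x7f5c424af0a0),
    ("gfxCreateInstance", 0x7f5c4252da00),
    ("gfxGetInstanceProcAddr", 0x7f5c42536220),
    ("gfxEnumerateInstanceExtensionProperties", 0x7f5c4253dd70),
    ("gfxEnumerateInstanceLayerProperties", 0x7f5c4253cdd0),
];

/// Hypothetical helper: returns the name of the portability function that
/// lives at `addr`, or None if the address is not one of ours.
pub fn owner_of(addr: u64) -> Option<&'static str> {
    PORTABILITY_SYMBOLS
        .iter()
        .find(|(_, a)| *a == addr)
        .map(|(name, _)| *name)
}
```

With the values from the log, 0x7f5c421f6490 is not in the table (so it is not ours), while 0x7f5c424b18f0 and 0x7f5c424af0a0 both are.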

tl;dr: It doesn't work because of something my system is doing.

goddessfreya commented 6 years ago

We can blame my problem on this function:

static inline void *globalGetProcAddr(const char *name) {
    if (!name || name[0] != 'v' || name[1] != 'k') return NULL;

    name += 2;
    if (!strcmp(name, "CreateInstance")) return vkCreateInstance;
    if (!strcmp(name, "EnumerateInstanceExtensionProperties")) return vkEnumerateInstanceExtensionProperties;
    if (!strcmp(name, "EnumerateInstanceLayerProperties")) return vkEnumerateInstanceLayerProperties;
    if (!strcmp(name, "EnumerateInstanceVersion")) return vkEnumerateInstanceVersion;

    return NULL;
}

from loader/gpa_helper.h, part of the vulkan loader.

It's returning our vkCreateInstance and others, instead of the ones defined + implemented in loader/trampoline.c. Some sort of name clash I guess.

Edit:

In addition to the problems explained in the previous section, LD_PRELOAD has a flaw that makes it unusable with some executables: its "blind" interposition of symbols by name only (without regard to the originating module) breaks executables that have a globally visible symbol with the same name as one of the symbols in the interposed library. This was observed with Unigine Heaven benchmark with apitrace.
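A toy model of that by-name lookup, to show why the loader ends up with our functions. This is a deliberately simplified sketch: the real dynamic linker has more rules (symbol versioning, visibility, -Bsymbolic and so on), and the Module type, the resolve function and the export lists are all hypothetical.

```rust
/// A loaded object and the global symbols it exports (toy model).
pub struct Module {
    pub name: &'static str,
    pub exports: &'static [&'static str],
}

/// Returns the name of the first module in `search_order` that exports
/// `symbol`. Lookup is by name only: the module asking does not matter.
pub fn resolve(search_order: &[Module], symbol: &str) -> Option<&'static str> {
    search_order
        .iter()
        .find(|m| m.exports.iter().any(|e| *e == symbol))
        .map(|m| m.name)
}

/// Search order in this model: preloaded objects come first.
pub fn preload_order() -> [Module; 2] {
    [
        Module {
            name: "portability (LD_PRELOAD)",
            exports: &["vkCreateInstance", "vkGetInstanceProcAddr"],
        },
        Module {
            name: "system loader",
            exports: &[
                "vkCreateInstance",
                "vkGetInstanceProcAddr",
                "vkEnumerateInstanceVersion",
            ],
        },
    ]
}
```

In this model, resolve(&preload_order(), "vkCreateInstance") picks portability even when the reference comes from inside the system loader, which is the same shape as globalGetProcAddr handing back our vkCreateInstance.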

Edit: We can just do the same thing as apitrace: https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown#linux

goddessfreya commented 6 years ago

apitrace also overrides dlsym and dlopen, something which I'd rather not do.

We could write a layer, but honestly this sounds like a lot of effort which would be only applicable to the vulkan backend. I think there are better time sinks than figuring out a way to make the vulkan backend work, which would basically do nothing.