KhronosGroup / Vulkan-Loader

Vulkan Loader
https://vulkan.lunarg.com/doc/sdk/latest/linux/LoaderInterfaceArchitecture.html

Extend layer/driver JSON manifest to list library dependencies as a hint #1316

Open smcv opened 11 months ago

smcv commented 11 months ago

What enhancement are you suggesting for the Vulkan Loader? Please describe in detail.

Some frameworks need to use Linux namespaces to run Vulkan programs in a container or sandbox:

It's not always straightforward to know what is considered to be part of the driver. When enumerating Vulkan drivers and layers, we know that we need the library_path. However, the library_path can have dependencies, either by ordinary dynamic linking (ELF DT_NEEDED on Linux, which we can discover programmatically by parsing ELF headers) or dynamically at runtime (dlopen() on Linux, which we cannot discover programmatically - currently the only way to know what is needed is to load the driver and let it run its arbitrary code).

For Mesa, it's enough to load the Vulkan driver via its library_path and then follow the DT_NEEDED tree; but the Nvidia proprietary driver uses dlopen() to load parts of itself, so following the DT_NEEDED tree is not necessarily sufficient. As a result, the Nvidia team have been in contact with Chrome and pressure-vessel developers about providing and parsing a manifest that would tell those tools what other libraries are needed.

It occurs to me that for Vulkan and other driver-loaders that mimic its structure (like GLVND EGL) we already have a perfectly good manifest that describes the driver, so it might make sense to put library information into the Vulkan driver's JSON manifest instead of inventing a separate file?

A straw-man example:

{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.3.242",
        "wants_libraries": [
            "libnvidia-cfg.so.1",
            "libnvidia-glcore.so.535.113.01",
            "..."
        ]
    }
}

The spec for wants_libraries could perhaps be something like this:

"wants_libraries"

An array of library names that the library_path might load, either via normal dependency loading such as Windows DLL dependencies or ELF DT_NEEDED, or via runtime dynamic loading such as Windows LoadLibrary or Linux dlopen. Sandboxing and container frameworks can use this as a hint to make those libraries available in the sandbox or container whenever this driver is in use. The syntax of each library name in the array is the same as library_path: a plain filename with no directory separators is to be looked up in the system's shared object search path, a relative path is relative to the JSON manifest, and an absolute path is loaded directly. Libraries in this array are not necessarily mandatory dependencies, so if not all of them can be found, loaders and container frameworks should attempt to proceed with the subset of libraries that can be found. This field is optional.
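The lookup rules in this proposed wording could be sketched roughly as follows. This is only an illustration of the hint's semantics, not loader code: `classify`, `resolve_wanted_libraries`, and the `search_dirs` parameter are hypothetical names, and the simple directory scan stands in for the platform's real shared-object search.

```python
import json
import os


def classify(name):
    """Classify an entry the same way library_path is interpreted."""
    if os.path.isabs(name):
        return "absolute"
    if "/" in name or os.sep in name:
        return "relative"   # relative to the JSON manifest
    return "basename"       # looked up in the shared-object search path


def resolve_wanted_libraries(manifest_path, search_dirs):
    """Return (found, missing) for the manifest's wants_libraries hint.

    search_dirs is an assumption of this sketch: a stand-in for the
    system's real shared-object search path.
    """
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)

    # The field is optional, so a missing key simply means "no hints".
    wanted = manifest.get("ICD", {}).get("wants_libraries", [])
    manifest_dir = os.path.dirname(os.path.abspath(manifest_path))

    found, missing = {}, []
    for name in wanted:
        kind = classify(name)
        if kind == "absolute":
            candidates = [name]
        elif kind == "relative":
            candidates = [os.path.join(manifest_dir, name)]
        else:
            candidates = [os.path.join(d, name) for d in search_dirs]

        hit = next((c for c in candidates if os.path.exists(c)), None)
        if hit is None:
            # Hints are not mandatory: record the miss and carry on.
            missing.append(name)
        else:
            found[name] = os.path.normpath(hit)
    return found, missing
```

A container framework could map everything in `found` into the sandbox and treat `missing` as a warning at most, which matches the "proceed with the subset that can be found" wording.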

Is this specific to a single platform? I'm personally only interested in this for Linux, but it seems equally applicable to other Unix platforms like *BSD and Hurd, and it doesn't seem as though there's any reason this couldn't be generalized to macOS and Windows too.

Additional context

cc @cubanismo - does this seem like a reasonable solution?

charles-lunarg commented 11 months ago

A couple of questions:

A clarification about driver manifests is that 'relative paths' are relative to the manifest file, so any use of relative paths that isn't relative to the manifest makes for a more confusing manifest. Not super important, but it's the only 'quirk' I can think of happening.

juan-lunarg commented 11 months ago

How exactly would this new field be tested? wants_libraries could become out of date, wrong, etc. In which case you are back to square one again.

cubanismo commented 11 months ago

Duplicating my comment on the GLVND instance of this issue, since I think this is the better venue to discuss the overall proposal:

Not strongly opposed to this, but given it's a spin-off of the Chrome/pressure-vessel discussion, I want to point out that the combination of this + the Vulkan version won't be sufficient to address what that proposal does. Besides GLX, it doesn't cover CUDA, DLSS, optix, etc. Perhaps these hints combined with a "the other stuff" json file would address the whole problem space, but I'm a little worried about decentralizing the data. E.g., if someone invents a new API dispatcher and adds hints to it too, do we then say "Look at the 'other stuff' json file + GLVND json + Vulkan json unless you find API dispatcher json files, then look at those + 'other stuff' json + GLVND json"? Alternatively, the "other stuff" json file could be a superset with tags of some sort to note which API(s) each file relates to, but then why duplicate that data into the GLVND/Vulkan ICD json files? Just extra work at that point.

cubanismo commented 11 months ago

How exactly would this new field be tested? wants_libraries could become out of date, wrong, etc. In which case you are back to square one again.

I think you could have tests that run CTS or some other set of test applications, create a very bare container with the files specified in the json mapped into it, run the same tests in the container, and assert that the results are the same (or pass) in both.

How do the various frameworks "know" where the manifest files are located? This may be a redundant question, as the framework may be the one giving the Vulkan-Loader the manifest & library. I ask because the loader looks for drivers in certain system paths based on environment variables, predefined system locations, and how it was compiled. In other words, where the loader looks for things is not an easy question to answer, so it leaks into the frameworks.

Yes, I think this is a good question, and why a top-level json file with its own ordained locations in the filesystem, as proposed in the references, may be an easier lift for container maintainers.

kbrenneman commented 10 months ago

A standard JSON structure in a single standard location might be the easiest, since that way container managers wouldn't need to separately scan manifests from API-specific directories.

As for distinguishing different sets of files, how much granularity do we need? Is something as broad as "graphics" and "compute" sufficient? Would we want to be able to select files based on specific APIs or features (e.g., egl, glx, Vulkan, DLSS, etc.)?

If we only need a couple broad categories like "graphics", then just having separate file lists in the JSON file, or even separate JSON files might be good enough. Any files that are required for both would just be listed in both, and whatever parses the JSON files would be responsible for filtering out duplicates. Something like:

{
    "graphics": {
        "libraries": [
            "libEGL_nvidia.so.0",
            "libGLX_nvidia.so.0"
        ],
        "data": [
            "/usr/share/glvnd/egl_vendor.d/10_nvidia.json",
            "/usr/share/vulkan/icd.d/nvidia_icd.json"
        ]
    },
    "compute": {
        "libraries": [
            "libnvidia-opencl.so.1"
        ],
        "data": [
            "/etc/OpenCL/vendors/nvidia.icd"
        ]
    }
}
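For illustration, a parser for this per-category layout might merge the selected categories and drop duplicates along these lines. This is a sketch that assumes each category is a JSON object with "libraries" and "data" arrays; `files_for` and the `libexample-common.so.1` entry are hypothetical, the latter added only to show the duplicate filtering.

```python
import json

# Abbreviated example data. "libexample-common.so.1" is a made-up name
# listed in both categories to demonstrate de-duplication.
EXAMPLE = """
{
    "graphics": {
        "libraries": ["libEGL_nvidia.so.0", "libexample-common.so.1"],
        "data": ["/usr/share/vulkan/icd.d/nvidia_icd.json"]
    },
    "compute": {
        "libraries": ["libnvidia-opencl.so.1", "libexample-common.so.1"],
        "data": ["/etc/OpenCL/vendors/nvidia.icd"]
    }
}
"""


def files_for(manifest, categories):
    """Return (libraries, data) for the chosen categories, without duplicates."""
    libraries, data = {}, {}
    for category in categories:
        entry = manifest.get(category, {})
        # dict keys keep first-seen order and silently absorb repeats.
        libraries.update(dict.fromkeys(entry.get("libraries", [])))
        data.update(dict.fromkeys(entry.get("data", [])))
    return list(libraries), list(data)


libs, data_files = files_for(json.loads(EXAMPLE), ["graphics", "compute"])
```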

If we expect to have a lot of categories, though, then having duplicate filenames like that could get pretty unwieldy. In that case, it might be easier to do it the other way around, with a single list of files and then a set of feature tags for each file:

{
    "libraries": [
        {
            "name": "libEGL_nvidia.so.0",
            "tags": ["egl"]
        },
        {
            "name": "libGLX_nvidia.so.0",
            "tags": ["egl", "vulkan"]
        },
        {
            "name": "libnvidia-opencl.so.1",
            "tags": ["compute"]
        }
    ],
    "data": [
        {
            "name": "/usr/share/glvnd/egl_vendor.d/10_nvidia.json",
            "tags": ["egl"]
        },
        {
            "name": "/usr/share/vulkan/icd.d/nvidia_icd.json",
            "tags": ["vulkan"]
        },
        {
            "name": "/etc/OpenCL/vendors/nvidia.icd",
            "tags": ["compute"]
        }
    ]
}
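
With the tagged layout, selecting files for a given set of features reduces to a filter over each list. The following is a rough sketch under that assumption; `select_by_tags` is a hypothetical helper, and the example data is an abbreviated copy of the JSON above.

```python
import json

TAGGED = """
{
    "libraries": [
        {"name": "libEGL_nvidia.so.0", "tags": ["egl"]},
        {"name": "libGLX_nvidia.so.0", "tags": ["egl", "vulkan"]},
        {"name": "libnvidia-opencl.so.1", "tags": ["compute"]}
    ],
    "data": [
        {"name": "/usr/share/vulkan/icd.d/nvidia_icd.json", "tags": ["vulkan"]},
        {"name": "/etc/OpenCL/vendors/nvidia.icd", "tags": ["compute"]}
    ]
}
"""


def select_by_tags(manifest, wanted_tags):
    """Return the names of files carrying at least one of the wanted tags."""
    wanted = set(wanted_tags)
    selected = {}
    for kind in ("libraries", "data"):
        selected[kind] = [
            entry["name"]
            for entry in manifest.get(kind, [])
            if wanted & set(entry.get("tags", []))
        ]
    return selected


vulkan_files = select_by_tags(json.loads(TAGGED), ["vulkan"])
```

Because each file appears once, no duplicate filtering is needed, which is the advantage of this layout when there are many categories.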

smcv commented 7 months ago

I can certainly see the argument for making this discovery be something that happens "above" Vulkan/EGL/OpenXR/etc., so that new dispatchers that are "the same shape" as Vulkan can take part in this mechanism even if they have nothing to do with Vulkan specifically.

Would the Vulkan-Loader make use of this list at all? I.e., is there any action the loader could/should take in response to the presence of this list? I assume no, but it's worth asking.

My intention was: no.

How do the various frameworks "know" where the manifest files are located? This may be a redundant question, as the framework may be the one giving the Vulkan-Loader the manifest & library. I ask because the loader looks for drivers in certain system paths based on environment variables, predefined system locations, and how it was compiled. In other words, where the loader looks for things is not an easy question to answer, so it leaks into the frameworks.

In at least pressure-vessel, we already need to know (and duplicate the knowledge of) how and where Vulkan, EGL, etc. loaders look for manifest files, because we already need to be able to:

So this would not be any additional burden for us. Similarly, I would expect that Chromium needs to find and parse the manifests, so that it can find the actual shared libraries, so that it can ensure that they get mirrored into its sandboxed namespace.
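As a rough sketch of that existing step, a framework might scan manifest directories and collect each driver's library_path like this. The directory list is deliberately left to the caller, because the loader's real search order depends on environment variables, predefined locations, and build configuration; `driver_libraries` is a hypothetical helper, not part of any existing tool.

```python
import glob
import json
import os


def driver_libraries(manifest_dirs):
    """Return (manifest_path, library_path) pairs for driver manifests found.

    manifest_dirs is supplied by the caller; this sketch does not try to
    reproduce the loader's own search logic.
    """
    results = []
    for directory in manifest_dirs:
        for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
            try:
                with open(path, encoding="utf-8") as f:
                    manifest = json.load(f)
            except (OSError, ValueError):
                continue  # unreadable or malformed manifest: skip it
            if not isinstance(manifest, dict):
                continue
            icd = manifest.get("ICD")
            if isinstance(icd, dict) and "library_path" in icd:
                results.append((path, icd["library_path"]))
    return results
```

A hint such as wants_libraries would then be read from the same parsed manifest, so it adds no new discovery step.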

A clarification about driver manifests is that 'relative paths' are relative to the manifest file

Yes, that's why I suggested each item in wants_libraries should be interpreted in a way that is consistent with the library_path.

There is a difference between plain basenames that don't contain / (libGLX_nvidia.so.0) and relative paths that do contain / (./libGLX_nvidia.so.0). Plain basenames (or SONAMEs) are looked up in a system-specific search path (on Linux, it involves /etc/ld.so.cache, $LD_LIBRARY_PATH, ELF headers and some system-specific quirks, which are another thing that I have to "just know"). Relative paths are interpreted as being relative to the manifest (JSON file). This is quite similar to how Unix shells search PATH for commands if the command does not contain /, but interpret commands that do contain / as being relative to the current working directory.

any use of relative paths that isn't relative to the manifest makes for a more confusing manifest

Oh, I agree completely - that's why I suggested reusing the same interpretation as library_path.

charles-lunarg commented 6 months ago

My stance is that if the changes do not affect how the loader interprets the JSON file, then I have very little to say about the changes. Adding additional fields to the manifest is not against the file description.

I'm happy to allow any/all discussion about additional fields in the manifests to occur here; I just wanted to clarify that I don't have a strong stake in these discussions beyond not wanting to break backward compatibility.

If there is a strong desire to use a new format, that wouldn't be a decision I get to make unilaterally, as it would have to go through the Vulkan Working Group (specifically the SI subgroup) before any decisions are made. (Not that anyone in this discussion isn't aware of that fact, again I'm just clarifying my position).