SveSop / wine-nvoptix

Relay for nvoptix to use with Wine
MIT License
5 stars 3 forks source link

Deduplicate `OptixDeviceContextOptions` #5

Closed Saancreed closed 2 years ago

Saancreed commented 2 years ago

My initial implementation had it prefixed with _55 because I wasn't sure if this structure had the same layout in previous versions too, however, now I can see that it should be compatible with previous versions at least down to ABI version 36. Let's remove unnecessary copies by moving this struct and related callback type to shared nvoptix_types.h and remove version prefixes from all of them. We could reintroduce a prefix if we ever encounter a version that is incompatible with the current one.

SveSop commented 2 years ago

Yeah. It is not that much that changes between them. There is a change between 41 -> 47 on a couple of functions.. other than that it is just added functions, so it should be doable to squash this a bit :)

I downloaded some OptiX demo's from nVidia github, and was able to compile with 4 different ABI's to verify (hence, i found my misstake in 47 which led me to force-push hehe).

SveSop commented 2 years ago

I have been fiddling with this a bit today, but my very limited programming skillz really comes up short.

I see it would have been fairly easy if it had only been adding new functions to the tables like from eg. ABI_36 -> ABI_41, but the case is that in ABI_47 there are 2 existing functions in 36 and 41 that got updated. This kinda makes it hard (for me) to figure out how i can build the tables...

It is probably not that hard, but am i right in that i cannot really RENAME the const struct functioname to something it should not be, cos when whatever app calls this, it would fail? As i understand it now, the table is being built with the correct function names, and #define ASSIGN_FUNCPTR(f) *(void**)(&table->f) = (void*)&f ## _47 is just some sort of macro to link this to the different ABI functions... Like ASSIGN_FUNCPTR(optixGetErrorName); will point to optixGetErrorName_47 (for the ABI_47 table) internally only, but if i changed it to ASSIGN_FUNCPTR(optixGetErrorName_47); the app would fail the call because optixGetErrorName would not exist if i did that? Right? :smile:

Making the struct's in the header seemed doable, and i found it could perhaps work neatly enough when using -fms-extensions compile option, cos then i could create the struct like this (small example):

typedef struct OptixFunctionTable
{
    const char *(*optixGetErrorName)(OptixResult result);
} OptixFunctionTable;

typedef struct Optix_36
{
    OptixResult (*optixDenoiserCreate)(OptixDeviceContext context, const void *options, OptixDenoiser *returnHandle);
} Optix_36;

typedef struct OptixFunctionTable_36
{
    struct OptixFunctionTable;
    struct Optix_36;
} OptixFunctionTable_36;

then the OptixFunctionTable_36 would contain both the OptixFunctionTable AND the Optix_36 table (atleast it seems to work like that from a small helloworld sample i played around with).

Dunno if it is worth doing that? Would maybe trunkate it a bit instead of spreading it in 5 headers... Just not sure how to deduplicate the rest.

Saancreed commented 2 years ago

In case of OptixDeviceContextOptions: I tried to make this work but apparently ABI 22 has context options struct without validation level member so I kind of gave up. Not that we currently support that one but it does prove we need different structs for different versions.

I see it would have been fairly easy if it had only been adding new functions to the tables like from eg. ABI_36 -> ABI_41, but the case is that in ABI_47 there are 2 existing functions in 36 and 41 that got updated. This kinda makes it hard (for me) to figure out how i can build the tables...

It also doesn't help that sometimes functions are removed (see optixDenoiserSetModel which was in ABI 41 but seems to be deprecated and replaced with optixDenoiserCreateWithUserModel in ABI 47) and we have no guarantees that behavior of function A with ABI version X matches the behavior of the same function A with ABI version Y even if their signatures/prototypes match.

It is probably not that hard, but am i right in that i cannot really RENAME the const struct functioname to something it should not be, cos when whatever app calls this, it would fail?

We can rename them in our code (and by that I mean inside OptixFunctionTable_36 for example) as long as the offsets dictated by their order remain the same. So, it doesn't matter that we named the first function in that struct optixGetErrorName, what matters is that it's the first function in that struct. We could have renamed it to GetErrorName_36 if it pleased us and it shouldn't matter to anyone.

As i understand it now, the table is being built with the correct function names, and #define ASSIGN_FUNCPTR(f) *(void**)(&table->f) = (void*)&f ## _47 is just some sort of macro to link this to the different ABI functions... Like ASSIGN_FUNCPTR(optixGetErrorName); will point to optixGetErrorName_47 (for the ABI_47 table) internally only, but if i changed it to ASSIGN_FUNCPTR(optixGetErrorName_47); the app would fail the call because optixGetErrorName would not exist if i did that? Right?

It would if you either renamed our own functions or changed the macro a bit so it finds the correct function in our own library. But without any other changes, yeah, it won't. But it's just a macro I wrote for my own convenience, there is nothing magical about it, it just saved me from some typing.

For each ABI version we have two function tables:

What matters is that:

  1. To then n-th struct member in Windows table we assign pointer to our own function of any internal name we choose that has the ms_abi calling convention and has prototype that matches the function at that offset.
  2. That internal function calls the n-th function of our Linux table and returns its result (modulo path conversions, removal of callbacks, etc).

Notice that the names don't matter; they are what they are because it's easier to preprocess NV header a bit and borrow their own names for them than invent our own.

Your example is fine, although you probably wouldn't need the -fms-extensions if you just gave names to your struct members:

typedef struct OptixFunctionTable_36
{
    struct OptixFunctionTable common;
    struct Optix_36 abi;
} OptixFunctionTable_36;

Then, macros assigning them that currently expand to something like

*(void**)(&table->optixGetErrorName) = (void*)&optixGetErrorName_36;
*(void**)(&table->optixGetErrorString) = (void*)&optixGetErrorString_36;
*(void**)(&table->optixDeviceContextCreate) = (void*)&optixDeviceContextCreate_36;
*(void**)(&table->optixDeviceContextDestroy) = (void*)&optixDeviceContextDestroy_36;
…

would instead expand into something like

*(void**)(&table->common.optixGetErrorName) = (void*)&optixGetErrorName_36;
*(void**)(&table->common.optixGetErrorString) = (void*)&optixGetErrorString_36;
*(void**)(&table->abi.optixDeviceContextCreate) = (void*)&optixDeviceContextCreate_36;
*(void**)(&table->abi.optixDeviceContextDestroy) = (void*)&optixDeviceContextDestroy_36;
…

And our own functions could do something like

static const char *__cdecl optixGetErrorName_36(OptixResult result)
{
    TRACE("(%d)\n", result);
    return optixFunctionTable_36.common.optixGetErrorName(result);
}

static OptixResult __cdecl optixDeviceContextDestroy_36(OptixDeviceContext context)
{
    TRACE("(%p)\n", context);
    return optixFunctionTable_36.abi.optixDeviceContextDestroy(context);
}

and it should still work, I think.

But to be honest…

Dunno if it is worth doing that? Would maybe trunkate it a bit instead of spreading it in 5 headers... Just not sure how to deduplicate the rest.

I don't think so :disappointed: I'm starting to believe it would be more of a headache because we can't create any "generic" static const char *__cdecl optixGetErrorName(OptixResult result) that would be shared by all ABIs (because it needs to call the first function of specific ABI's function table, because optixGetErrorName_55 can behave differently from optixGetErrorName_36). What we currently have is a lot of copypasted boilerplate, yes, but it also makes adding new ABIs somewhat straightforward, it's just that OptixDeviceContextOptions that irked me.

But maybe we should instead split nvoptix.c into per ABI version implementations, each in its own file, so that _VERSION suffixes aren't necessary because they exist in separate translation units.

SveSop commented 2 years ago

But maybe we should instead split nvoptix.c into per ABI version implementations, each in its own file, so that _VERSION suffixes aren't necessary because they exist in separate translation units.

I think perhaps that would be nice, given the way the ABI's are built, and after fiddling with this a bit (as a non-programmer) the way they (NVIDIA) adds new functions, and also changes some or removes some (as we see from 41 -> 47) makes this more troublesome to trunkate than i initially thought.

Ill see if i can figure out how that would work... but i do like that idea > trunkating 😄

SveSop commented 2 years ago

Started working on this: https://github.com/SveSop/wine-nvoptix/commit/3f6ab4acde566c38d1b89f11e915d58552871c0e

Compiles, but ofc does not work. But its a start.. When calling optixQueryFunctionTable_36 from main, the values are all garbled.. I guess i have messed up someplace with the "static" stuff and whatnot.

Not sure if i am at the right track with this @Saancreed ? :smile:

Saancreed commented 2 years ago

My guess: does the nvoptix.c compilation unit know about the exact signature of optixQueryFunctionTable_X? We should probably forward–declare them either at the top of nvoptix.c with the exact same parameters or (better) inside each of nvoptix_X.h. Lack of this should generate compiler warnings though, not sure why it doesn't unless winegcc / ninja silences them by default.

SveSop commented 2 years ago

I have a couple of simple optix tests i compiled, and if you want to you can check it out perhaps.. If you do this patch

diff --git a/src/nvoptix_36.c b/src/nvoptix_36.c
index af10af9..7304bff 100644
--- a/src/nvoptix_36.c
+++ b/src/nvoptix_36.c
@@ -304,6 +304,8 @@ OptixResult __cdecl optixQueryFunctionTable_36(
         void* functionTable,
         size_t sizeOfTable)
 {
+    TRACE("(%u, %p, %p, %p, %zu)\n", numOptions, optionKeys, optionValues, functionTable, sizeOfTable);
+
     if (sizeOfTable != sizeof(OptixFunctionTable_36)) return OPTIX_ERROR_FUNCTION_TABLE_SIZE_MISMATCH;

     OptixResult result = poptixQueryFunctionTable(36, numOptions, optionKeys, optionValues, &optixFunctionTable_36, sizeOfTable);

And then run the intro_driver_36.exe from the tests, it seem a bit weird...

011c:trace:nvoptix:optixQueryFunctionTable (36, 0, (nil), (nil), 0x470218, 296)
011c:trace:nvoptix:optixQueryFunctionTable_36 (4653592, (nil), 0x128, 0xfffffffffffffff5, 1065353216)
ERROR: initOptiX() initOptiXFunctionTable() failed: 7802
ERROR: Application failed to initialize successfully.

The values supposed to be passed from main -> nvoptix_36.c source is bonkers as far as i can understand? Test.zip

PS. If you get it working, there is something strange with the 510.54 driver and OptiX ABI 36 and 41 it seems.. i am pretty sure this used to work with an older driver, but it gets lots of errors now. 47/55 works without errors tho. (The demo files i mean).

Saancreed commented 2 years ago

I think I fixed it, please take a look at linked PR.

SveSop commented 2 years ago

Nice! Ill check it out when i get home from work.

Not sure what would be the reason ABI 36/41 shows this weird errors. I am fairly sure it worked fine with the 495 driver series which was what was used when we started. DAZ 4.16 used ABI 36 too, and it was not a spew of errors when we started dabbling with this. DAZ 4.20 now uses ABI 47, so that works, but it could explain why some have experienced issues with nvoptix and DAZ if they still use 4.16 and have upgraded to the 510 driver series.

Ill test the fixes from your PR, and once the "split" branch is merged i might downgrade my driver to 495 just for testing.. no point in debugging wine-nvoptix if it is a driver bug with the 510 branch.

SveSop commented 2 years ago

That seemed to work fine @Saancreed Thanks :+1: Added another minor commit with some wine header updates and some nit's and stuff. Gonna PR this for review, and see if i can figure out that 36/41 ABI business. Just tested the 510.60.02 driver, and it was the same, so if it is a bug it is not fixed. It could be some crud in my demo sources too, so ill look into that too.

Saancreed commented 2 years ago

Might be worth sending an email to NV at their linux-bugs address mentioning issues with older ABI versions and current drivers if the same samples work with older ones.

SveSop commented 2 years ago

It seems to happen for my windows box too, so ill look into the sample compiles. It is some CMake stuff that generate a Visual Studio project.. and some tweaks to use different ABI's. It could very very well be something there aswell, cos i am starting to doubt myself now :fearful:

SveSop commented 2 years ago

Rightfully so.. It seems some of the generated files (ptx) was not as common as i thought. When i initially did that build i guess i just built 1 by 1 SDK, and tested, but i rebuilt now keeping the subfolders inside their own subfolders, and the demo's are working... so no driver bug only human bug :stuck_out_tongue_closed_eyes: