doitsujin / dxvk

Vulkan-based implementation of D3D8, 9, 10 and 11 for Linux / Wine
zlib License

Explore advanced toolchain optimizations (e.g. PGO, LTO, whole program optimization, etcetera) #646

Closed ryao closed 6 years ago

ryao commented 6 years ago

Someone with time to explore tweaks to the build system should look into doing PGO builds. There are descriptions of how this works here:

https://dom.as/2009/07/27/profile-guided-optimization-with-gcc/ https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Instrumentation-Options.html

There are glowing reviews of PGO here:

https://cboard.cprogramming.com/tech-board/111902-pgo-amazing.html https://www.activestate.com/blog/2014/06/python-performance-boost-using-profile-guided-optimization https://clearlinux.org/blogs/profile-guided-optimization-mariadb-benchmarks

I suspect that PGO might help to reduce "stutter".

There are a couple of questions that need to be answered before PGO builds can be done:

  1. How will the filesystem path for profile data work when using a wine prefix?
  2. What benchmark can be run to generate profile data? Presumably, the benchmark should be something that does not require human interaction.

lieff commented 6 years ago

Maybe start with just -flto?

ryao commented 6 years ago

As an additional note, it might be a good idea to explore including Link Time Optimization (LTO) alongside PGO. There will be a need to tell the compiler what is externally visible. Supposedly, the gold linker can be used to help with this, but that would need investigation.

Another idea is to try concatenating all of the .cpp files and building them with -fwhole-program. This will require marking public functions with externally_visible, although it should generate a very well optimized binary.

ryao commented 6 years ago

@lieff You beat me to posting it. I edited the title to reflect the nature of this issue as encompassing more than just PGO.

Quite frankly, I suspect that concatenating all of the files into a single compilation unit and then using -fwhole-program would be better than LTO, but it is up to the person who volunteers to explore this to decide what to try.

Edit: Concatenating all of the files together and building them together is similar to Chromium's jumbo builds, although doing it to enable -fwhole-program would mean that it is for inter-procedural optimizations rather than reducing compile time:

https://chromium.googlesource.com/chromium/src/+/lkcr/docs/jumbo.md

Also, here is another thought. It would be interesting to try using LLVM/Clang with Google's Souper optimizer, especially with the other optimizations mentioned in place (i.e. PGO and WPO/LTO):

https://github.com/google/souper

There are other "superoptimizers" available that probably could be evaluated. They would make compile times skyrocket (taking days to months depending on how they are configured), but I hear that they can provide additional performance. I'd stick to the relatively low-hanging fruit of PGO and LTO or a jumbo build with whole program optimization first though.

pchome commented 6 years ago

@ryao

it is up to the person who volunteers to explore this

why not you?

http://mesonbuild.com/Builtin-options.html#base-options See b_lto, b_pgo and --unity

  1. What benchmark can be run to generate profile data?

A bunch of unit tests covering all aspects for general optimization, or a concrete game you want to optimize DXVK for.

lieff commented 6 years ago

Meson already has b_lto and b_pgo parameters, so it's a build/packaging question, not really project related.

ryao commented 6 years ago

@pchome I am not sure if I have time. If I thought I had time to do it, I would have done it rather than posting about it. We'll see if I do, but I find it doubtful.

A bunch of unit tests covering all aspects for general optimization, or a concrete game you want to optimize DXVK for.

The problem with games is that they rely on user input. I suspect that a game would be better than unit tests (although both could be run). We would need some way to start one from the command line, have it run through a benchmark and then quit.

@lieff How builds work is project related for any project.

doitsujin commented 6 years ago

I suspect that PGO might help to reduce "stutter".

I can already tell you it won't. The shader compiler-related stutter happens inside the driver, and it is inherently slow due to differences in the D3D11 and Vulkan designs.

Might still be worth looking at, but PGO only really helps optimize for one specific workload, LTO is notoriously broken, and any performance gain would be in the single-digit percentages.

We already had Unity builds at some point, but for some strange reason they ended up being significantly slower than regular builds.

ryao commented 6 years ago

@doitsujin My replies are inline:

I can already tell you it won't.

That is unfortunate. Would you mind sharing how you profile? If I recall correctly, my usual profiling tricks don't give me much visibility into binaries running in Wine.

The shader compiler-related stutter happens inside the driver, and it is inherently slow due to differences in the D3D11 and Vulkan designs.

Would you name a few of the differences? I would like to know more. Are you referring to things like D3D binding slots vs vulkan descriptor sets?

Might still be worth looking at

I suggest leaving it to a volunteer and putting the help-wanted label on this. This sort of experiment is something a volunteer could do.

PGO only really helps optimize for one specific workload

That is what I thought until I saw that Firefox improved its Javascript performance in general with PGO.

LTO is notoriously broken

If it were up to me, I'd probably just dump all *.cpp files into a single file and then build with -fwhole-program. It is less fragile than LTO and should work just as well. The caveat about needing to mark public functions/variables with externally_visible does apply. Otherwise, breakage will occur when the symbols are optimized away. Another issue would be that it would reduce the information available in backtraces.
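
The symbol-visibility caveat can be illustrated with a minimal sketch. It assumes GCC and a single translation unit; the names internalScale and ExampleEntryPoint are hypothetical and do not come from DXVK. GCC documents that, under -fwhole-program, public functions other than main are treated as if they were static unless they carry the externally_visible attribute, so an entry point that other modules must call is annotated explicitly:

```cpp
// Sketch only, not DXVK code. Intended build (an assumption):
//   g++ -O2 -fwhole-program -c whole.cpp
#include <cstdint>

// externally_visible is a GCC attribute; other compilers get an empty macro.
#if defined(__GNUC__) && !defined(__clang__)
#  define EXAMPLE_PUBLIC __attribute__((externally_visible))
#else
#  define EXAMPLE_PUBLIC
#endif

// Internal helper (hypothetical). Under -fwhole-program GCC may treat it as
// static, so it can be inlined and its out-of-line symbol dropped.
std::uint32_t internalScale(std::uint32_t x) {
  return x * 3u + 1u;
}

// Hypothetical entry point that must stay callable from outside this
// translation unit, hence the attribute.
extern "C" EXAMPLE_PUBLIC std::uint32_t ExampleEntryPoint(std::uint32_t x) {
  return internalScale(x) + internalScale(x + 1u);
}
```

Whether the annotation is actually required depends on how symbols are exported on the target platform, so this needs testing rather than being taken as given.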

doitsujin commented 6 years ago

That is unfortunate. Would you mind sharing how you profile?

winelib builds of DXVK work with the usual Linux profiling tools and debuggers.

Are you referring to things like D3D binding slots vs vulkan descriptor sets?

That's causing some pain elsewhere, but the main issue with shader compilation is that you can compile shaders individually in D3D, and the D3D11 driver will do a lot of magic during the respective Create*Shader call, whereas Vulkan pipelines expect all shaders to be present (in SPIR-V, which then has to be optimized and translated to hardware instructions by the driver), as well as the full state vector, so we have to do all the work on the first draw that a specific shader is used with.
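
A rough sketch of why the work lands on the first draw: the pipeline can only be looked up once the full combination of shaders and state is known, so a miss at draw time triggers the expensive compile. This is not DXVK's actual code; PipelineKey, LazyPipelineTable and stateBits are hypothetical stand-ins, and the real key would cover the whole state vector.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical key: everything a pipeline depends on.
struct PipelineKey {
  std::uint64_t vertexShaderHash;
  std::uint64_t fragmentShaderHash;
  std::uint32_t stateBits;  // stand-in for the full state vector

  bool operator<(const PipelineKey& other) const {
    return std::tie(vertexShaderHash, fragmentShaderHash, stateBits)
         < std::tie(other.vertexShaderHash, other.fragmentShaderHash,
                    other.stateBits);
  }
};

class LazyPipelineTable {
public:
  // Called at draw time. Returns a handle, compiling only on a miss.
  int lookupOrCompile(const PipelineKey& key) {
    auto it = m_pipelines.find(key);
    if (it != m_pipelines.end())
      return it->second;

    // A real translation layer would make the expensive driver call
    // (vkCreateGraphicsPipelines) here; this sketch only counts it.
    ++m_compileCount;
    int handle = m_nextHandle++;
    m_pipelines.emplace(key, handle);
    return handle;
  }

  int compileCount() const { return m_compileCount; }

private:
  std::map<PipelineKey, int> m_pipelines;
  int m_compileCount = 0;
  int m_nextHandle = 1;
};
```

The same two shaders with a different state value produce a second compile, which is why creating the shaders early does not remove the cost.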

pchome commented 6 years ago

I'd probably just dump all *.cpp files into a single file

that's how meson unity builds (--unity on) work, but per module http://mesonbuild.com/Unity-builds.html#unity-builds

ryao commented 6 years ago

@doitsujin I take it that your profiling shows that most of the time there is spent in the graphics driver. This might be asking the obvious question, but is there no way to parallelize that process?

For example, n shaders A[i] for i from 0 to n - 1 must be built, so m worker threads j from 0 to m - 1 are created and each one handles every A[i] where i % m == j. After they have all finished, the main thread just gathers the work from the worker threads. My feeling is that it is not that straightforward, but you piqued my curiosity.
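
The striding scheme described above could look roughly like this. It is a sketch, not DXVK code: buildAllStrided and the build callback are hypothetical stand-ins for the per-shader work, and it assumes the items are independent of each other.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Worker j handles every item i with i % m == j; the caller gets the
// gathered results once all workers have been joined.
std::vector<int> buildAllStrided(const std::vector<int>& shaders,
                                 std::size_t m,
                                 const std::function<int(int)>& build) {
  std::vector<int> results(shaders.size());
  if (m == 0)
    m = 1;  // always use at least one worker

  std::vector<std::thread> workers;
  workers.reserve(m);
  for (std::size_t j = 0; j < m; ++j) {
    workers.emplace_back([&, j] {
      // Each worker writes only its own slots, so no locking is needed.
      for (std::size_t i = j; i < shaders.size(); i += m)
        results[i] = build(shaders[i]);
    });
  }

  for (std::thread& worker : workers)
    worker.join();
  return results;
}
```

Some toolchains need -pthread for this to link and run. The build callback must be safe to call from several threads at once.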

pchome commented 6 years ago

Also, LTO won't work for winelib builds, because you need LTOed WINE, or particularly libwinecrt0.a.

I'm using LTO and PGO for my whole system wherever possible, and WINE is one of the unreached goals.

ryao commented 6 years ago

@pchome I was leaning toward thinking that a so called unity build with -fwhole-program would be better than LTO. As I said above, LTO is fragile. If you build everything as one compilation unit with -fwhole-program, you don't need LTO.

ryao commented 6 years ago

@doitsujin Never mind about the parallelism. I need to do my own profiling. I have spent more time looking at this code than I really have at the moment, but I think I understand a few bits of it. In particular, the draw calls that you mentioned are likely in DxvkContext::commitGraphicsState. Given how much this piqued my interest, I'll probably profile the code at some point and learn where the time is being spent. Concurrent programming is always fun. ;)

pchome commented 6 years ago

Then note https://github.com/doitsujin/dxvk/commit/357277563563ca356323105c1121f0cf3a90041f#diff-b396766df186599eed8a7aff228ec9f7

ryao commented 6 years ago

@pchome That is good to know. I still think opportunities for interprocedural optimization from those (with -fwhole-program) could be low-hanging fruit for someone who has only minor programming knowledge to explore.

After reading what @doitsujin said and looking through the code, I found some more interesting avenues to explore. In particular, I am not seeing much threading and I see no use of machine prefetch hints in the code. I need to make time to profile to see where the bottlenecks are more clearly.

ryao commented 6 years ago

After some thought, I think I should close this. It is probably not a great use of people's time, although I did learn some interesting things from the discussion.

pchome commented 6 years ago

Why so? At least PGO is real, and quite easy to test.

The only thing we should do -- create a list of small tests (maybe wine's d3d11 tests, or some other d3d11 demos), and define a final benchmark to test results.

e.g.

#!/bin/sh
run_benchmark
build_profile
run_tests
use_profile
run_benchmark

ryao commented 6 years ago

Alright. I am reopening this.

ryao commented 6 years ago

@doitsujin One last thing, as I could not stop myself from eyeballing the code a bit more. Does your profiling indicate that the stuttering is from dxvkgraphicspipeline::DxvkGraphicsPipeline()? I see 5 ->createShaderModule() calls there that probably could run in parallel.

doitsujin commented 6 years ago

vkCreateShaderModule is literally a memcpy in actual Vulkan drivers. The expensive part is creating the Vulkan pipeline (vkCreateGraphicsPipelines).

ryao commented 6 years ago

@doitsujin That is tricky. Couldn't you just cache the DXBC shaders and other things that DXVK receives from the game and turns into a pipeline? Then on subsequent runs, if one of the shaders from a previous session is loaded by a game (identified by a matching checksum), DXVK could load the rest from cache and pre-create the pipeline? That is just a rough idea, but some kind of driver-independent cache seems like the only way around it.
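
To make the rough idea a little more concrete, here is a sketch of the lookup side only. Everything in it is hypothetical: the names checksum, RecordedPipeline and PipelineHistory are made up, FNV-1a is merely a stand-in checksum, and saving the records to disk between sessions is left out. It also assumes the checksums within one record are distinct.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// FNV-1a (64-bit), used here only as a simple stand-in checksum.
std::uint64_t checksum(const std::vector<std::uint8_t>& bytes) {
  std::uint64_t hash = 14695981039346656037ull;
  for (std::uint8_t byte : bytes) {
    hash ^= byte;
    hash *= 1099511628211ull;
  }
  return hash;
}

// What a previous session would have remembered about one pipeline.
struct RecordedPipeline {
  std::vector<std::uint64_t> shaderChecksums;
  std::uint32_t stateBits;  // stand-in for the full pipeline state
};

class PipelineHistory {
public:
  // Remember a pipeline and index it by every shader it uses.
  void record(const RecordedPipeline& pipeline) {
    std::size_t index = m_recorded.size();
    m_recorded.push_back(pipeline);
    for (std::uint64_t sum : pipeline.shaderChecksums)
      m_byShader[sum].push_back(index);
  }

  // When the game loads a shader, return the remembered pipelines that
  // used it; these are the candidates for pre-creation.
  std::vector<RecordedPipeline> candidatesFor(std::uint64_t sum) const {
    std::vector<RecordedPipeline> result;
    auto it = m_byShader.find(sum);
    if (it != m_byShader.end()) {
      for (std::size_t index : it->second)
        result.push_back(m_recorded[index]);
    }
    return result;
  }

private:
  std::vector<RecordedPipeline> m_recorded;
  std::unordered_map<std::uint64_t, std::vector<std::size_t>> m_byShader;
};
```

Pre-creating the returned candidates would still call into the driver; the sketch only shows how a matching checksum could find them.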

pchome commented 6 years ago

@pchome

create a list of small tests maybe wine's d3d11 tests

OK, I was able to build a standalone dxgi test from the wine sources. It executes quickly and looks like it can be used for PGO needs:

0026:dxgi: 6386 tests executed (0 marked as todo, 270 failures), 5 skipped.

I am going to do the same for wine's other dx10/dx11 related tests, and combine them all together before sharing.

EDIT: dxgi.test.txt dxgi_dxgi.log.txt

pchome commented 6 years ago

https://github.com/pchome/wine-playground/tree/master/dx1x-tests

Note:

d3d11 test out: 0025:d3d11: 1154 tests executed (0 marked as todo, 201 failures), 1 skipped.

So the dxgi and d3d11 tests could be used as is (for now), despite the failures. The others require patching work.

ryao commented 6 years ago

The tests that failed probably merit their own issues.

pchome commented 6 years ago

Mostly not: missing interfaces, specific formats and probably WINE's internal stuff.

err:   D3D11: Cannot create texture:
  Format:  VK_FORMAT_E5B9G9R9_UFLOAT_PACK32
  Extent:  512x512x1
  Samples: 1
  Layers:  1
  Levels:  1
  Usage:   13
err:   DXGI: CheckInterfaceSupport: Unsupported interface
err:   db6f6ddb-ac77-4e88-8253-819df9bbf140

@doitsujin can check it himself if he wants to.

EDIT: A lot of them (tests) are failing even for wine. http://test.winehq.org/data/ http://test.winehq.org/data/64d9f309b7f74d4154e685c5d1d78c1b8335c0bc/index_Linux.html

ryao commented 6 years ago

I have a theory on why unity builds took longer. For large projects, the headers can be substantially more complex than the files themselves. Furthermore, you can have many files to compile such that even with -j$(nproc), each core must process a large number of them. The time savings from unity builds comes from parsing the headers only once for all of those files. If all of the additional time spent parsing all of the files that would be handled on other cores is less than the savings from not parsing the headers once for each file on a single core, you save time. If not, you do not save time.

I believe that DXVK’s headers are not complex enough to save time with unity builds. However, there should be opportunities for strong interprocedural optimizations from unity builds if DXVK is adapted to support -fwhole-program as part of them. This means marking externally accessible functions with the externally_visible attribute according to the compiler documentation. This idea needs testing to see if it makes a difference.

doitsujin commented 6 years ago

err:   D3D11: Cannot create texture:
  Format:  VK_FORMAT_E5B9G9R9_UFLOAT_PACK32
  Extent:  512x512x1
  Samples: 1
  Layers:  1
  Levels:  1
  Usage:   13

That's not a bug, just means that you cannot render to that format (Usage 0x13 is color attachment + transfer).

SveSop commented 6 years ago

@ryao I might be misunderstanding you, but if not: "unity builds" was cancelled due to performance degradation, and not "build time".

I have not tested it since it was dropped, so for all I know it might not be an issue anymore.

ryao commented 6 years ago

@SveSop I misremembered what I read when I was thinking about it then. Anyway, I would not expect unity builds to be a performance win unless -fwhole-program is used. That needs function annotations with the externally_visible attribute. If performance still degrades with -fwhole-program, then some attention probably needs to be given to the compiler’s optimization stages to see what it is doing wrong and how we can toggle switches to get it to do things correctly.

ryao commented 6 years ago

This was a quick stab at producing d3d11.dll and dxgi.dll files built with -fwhole-program for evaluation purposes:

cd /path/to/dxvk
meson --unity on --cross-file build-win32.txt --prefix /tmp/dxvk-win32-whole-program build.w32
cd build.w32
meson configure -Dbuildtype=release
ninja

cat << END > dxgi.dll.c
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/dxgi/src@dxgi@@dxgi@sha/dxgi-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END

i686-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/dxgi/dxgi.dll ../src/dxgi/dxgi.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/dxgi/dxgi.def -Wl,--start-group -Wl,--out-implib=src/dxgi/libdxgi.dll.a dxgi.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w32/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w32/src/dxgi/src@dxgi@@dxgi@sha/ ../lib32/vulkan-1.lib  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup

cat << END > d3d11.dll.c
#include "src/d3d11/src@d3d11@@d3d11@sha/d3d11-unity.cpp"
#include "src/dxbc/src@dxbc@@dxbc@sta/dxbc-unity.cpp"
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END

i686-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/d3d11/d3d11.dll ../src/d3d11/d3d11.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/d3d11/d3d11.def -Wl,--start-group -Wl,--out-implib=src/d3d11/libd3d11.dll.a d3d11.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w32/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w32/src/dxgi/src@dxgi@@dxgi@sha/ ../lib32/vulkan-1.lib -ldxgi -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup

ninja install

Contrary to my belief, setting externally_visible was unnecessary. This was confirmed by a quick examination of the export tables (interpreted .edata section contents) via i686-w64-mingw32-objdump, which showed that the same symbols were being exported, plus a quick runtime test.

I was able to replace the d3d11.dll and dxgi.dll files provided with Proton and it ran without a problem. I am certain that I replaced the correct binaries because DXVK_HUD=version is showing a changed version number.

It also might be of interest that the binaries built this way are smaller after stripping.

Before:

richard@desktop ~/devel/dxvk $ ls -l /tmp/dxvk-win32-v0.72-39-g20c89c3/{d3d11.dll,dxgi.dll}
-rwxr-xr-x 1 richard richard 2353664 Sep 23 00:26 /tmp/dxvk-win32-v0.72-39-g20c89c3/d3d11.dll
-rwxr-xr-x 1 richard richard 1845248 Sep 23 00:26 /tmp/dxvk-win32-v0.72-39-g20c89c3/dxgi.dll

After:

richard@desktop ~/devel/dxvk $ cp ./build.w32/src/d3d11/d3d11.dll ./build.w32/src/dxgi/dxgi.dll /tmp/
richard@desktop ~/devel/dxvk $ i686-w64-mingw32-strip /tmp/dxgi.dll /tmp/d3d11.dll 
richard@desktop ~/devel/dxvk $ ls -l  /tmp/dxgi.dll /tmp/d3d11.dll 
-rwxr-xr-x 1 richard richard 1959936 Sep 24 12:50 /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 1414144 Sep 24 12:50 /tmp/dxgi.dll

I am sharing this in case someone else who has more time wants to test this to see if it helps. However, I suspect that doing this for release builds might be worthwhile for the smaller binary sizes, even if performance does not improve, as long as performance does not become worse. The build system would need to be fixed to avoid the horrible hack that I did to make the PoC though.

pchome commented 6 years ago

Was the "Before" version built using the same flags, except -fwhole-program?

-Dbuildtype=release will add the -O3 flag, so the binaries are expected to be bigger.

ryao commented 6 years ago

@pchome The before build was built like this:

meson --cross-file build-win32.txt --prefix /tmp/dxvk-win32 build.w32
cd build.w32
meson configure -Dbuildtype=release
ninja
ninja install

I manually moved the files and stripped them afterward.

I didn't capture the CFLAGS being used for the build, so I didn't check. The size difference had been unexpected and was included in my comment at the last moment. A quick grep of the sources didn't show me any CFLAGS and I am not familiar with meson. However, I just tested -O3 builds out of curiosity:

richard@desktop ~/devel/dxvk $ ls -l  /tmp/dxgi.dll /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 2098176 Sep 24 13:13 /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 1437184 Sep 24 13:13 /tmp/dxgi.dll

They are still smaller.

ryao commented 6 years ago

@doitsujin Why is your official release built with 2 different compilers?

richard@desktop /tmp $ strings dxvk-0.80/x32/d3d11.dll | grep GCC: | sort -u
GCC: (GNU) 4.9.2
GCC: (GNU) 8.1.0

pchome commented 6 years ago

I just checked the configure phase, and the unity files are available at this stage: build.64/src/d3d11/src@d3d11@@d3d11.dll@sha/.

So it's possible to integrate -fwhole-program into the build process by skipping compilation of some modules. I'll check this later.

ryao commented 6 years ago

@pchome It is not quite that simple because there are internal libraries being built. I had to work around that by making a manual unity file combining all of the unity files for those libraries to make it work.

pchome commented 6 years ago

I had to work around that by making a manual unity file combining all of the unity files for those libraries to make it work.

You can use a generator https://github.com/doitsujin/dxvk/blob/master/meson.build#L54 https://github.com/doitsujin/dxvk/blob/master/src/dxgi/meson.build#L19

or (maybe) pass all *_src variables to shared_library().

Also, it may be worth asking https://github.com/mesonbuild/meson for such a (-fwhole-program + unity) feature.

pchome commented 6 years ago

whole-program.patch.txt

A hack for -fwhole-program (for testing purpose)

ryao commented 6 years ago

@pchome Were you able to reproduce the smaller binaries?

pchome commented 6 years ago

I have no MinGW installed, and as I said, I can't use -fwhole-program with this patch for a winelib build. Without -fwhole-program, files of almost equal size (different version) were generated, compared to those currently installed on my system.

This produces two huge files, d3d11.dll-unity.cpp and dxgi.dll-unity.cpp, but automatically.

lieff commented 6 years ago

In my observations, -flto has an advantage over unity builds (not much, but still). This is because exported (non-static) symbols must keep their ABI in a unity build. The compiler can't change it because it does not know whether someone will call the symbol from outside the resulting object. With -flto, the compiler makes that decision when it actually links the application, so it can change the ABI if it wants. Not sure if we can work around it with -fvisibility=hidden.

ryao commented 6 years ago

@lieff -fwhole-program tells the toolchain to assume that no one will ever want to call those externally.

Here are instructions for a 64-bit build of the proof of concept that I posted earlier:

cd /path/to/dxvk
meson --unity on --cross-file build-win64.txt --prefix /tmp/dxvk-win64-whole-program build.w64
cd build.w64
meson configure -Dbuildtype=release
ninja

cat << END > dxgi.dll.c
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/dxgi/src@dxgi@@dxgi@sha/dxgi-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END

x86_64-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/dxgi/dxgi.dll ../src/dxgi/dxgi.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/dxgi/dxgi.def -Wl,--start-group -Wl,--out-implib=src/dxgi/libdxgi.dll.a dxgi.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w64/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w64/src/dxgi/src@dxgi@@dxgi@sha/ ../lib/vulkan-1.lib  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup

cat << END > d3d11.dll.c
#include "src/d3d11/src@d3d11@@d3d11@sha/d3d11-unity.cpp"
#include "src/dxbc/src@dxbc@@dxbc@sta/dxbc-unity.cpp"
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END

x86_64-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/d3d11/d3d11.dll ../src/d3d11/d3d11.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/d3d11/d3d11.def -Wl,--start-group -Wl,--out-implib=src/d3d11/libd3d11.dll.a d3d11.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w64/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w64/src/dxgi/src@dxgi@@dxgi@sha/ ../lib/vulkan-1.lib -ldxgi -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup

ninja install

ryao commented 6 years ago

I noticed a small mistake in how I was doing the -fwhole-program build of d3d11.dll where src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp was built separately. That kept the binaries from being as well optimized / small as they could have been. I have updated the build instructions and size data with the corrections.

lieff commented 6 years ago

@ryao Yes, it should turn everything static. But my old observation is that it does not work for C++ member functions (maybe because C++ methods can't be static in the C++ sense). Maybe newer gcc behaves differently now; I looked at it a long time ago and I have not checked clang.

pchome commented 6 years ago

@ryao

I noticed a small mistake ...

You also should use --unity on for both, so we could see the real -fwhole-program difference/advantage.

Edit: And -O3

ryao commented 6 years ago

@lieff I am building with GCC 8.2.0. I have yet to try Clang.

@pchome I am using --unity on for both.

Also, I have proof of concept builds available for those interested in testing them:

wget http://dev.gentoo.org/~ryao/dist/dxvk-win64-v0.80-whole-program.txz{,.sig}
gpg --verify dxvk-win64-v0.80-whole-program.txz{.sig,}

wget http://dev.gentoo.org/~ryao/dist/dxvk-win32-v0.80-whole-program.txz{,.sig}
gpg --verify dxvk-win32-v0.80-whole-program.txz{.sig,}

I already found a volunteer to help test, so there is no need for more, but I'm posting them for the various contributors here to be able to evaluate. You can get my PGP key from github:

https://github.com/ryao.gpg

I included the d3d10 dlls, but they aren't built with -fwhole-program, so there is really nothing special about them beyond being unity builds built with GCC 8.2.0. These binaries have been stripped to save space on the webserver. Also, despite my instructions showing -O2, all of the binaries were built with -O3 to match the release builds. I left the instructions with -O2 to avoid making the history confusing.

ryao commented 6 years ago

@pchome I just realized that you meant before/after. That will need to wait a few days because I have spent all of my spare time on this and then some, but I will be happy to provide numbers for them when I have some more time.

pchome commented 6 years ago

I still want to do my own build/test, but no luck so far. Here are the final options passed to the compiler by winegcc:

COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-fshort-wchar' '-D' 'WINE_UNICODE_NATIVE' \
'-D' '_REENTRANT' '-D' 'WIN64' '-D' '_WIN64' '-D' '__WIN64' '-D' '__WIN64__' '-D' 'WIN32' \
'-D' '_WIN32' '-D' '__WIN32' '-D' '__WIN32__' '-D' '__WINNT' '-D' '__WINNT__' \
'-D' '__stdcall=__attribute__((ms_abi))' '-D' '__cdecl=__attribute__((ms_abi))' \
'-D' '_stdcall=__attribute__((ms_abi))' '-D' '_cdecl=__attribute__((ms_abi))' \
'-D' '__fastcall=__attribute__((ms_abi))' '-D' '_fastcall=__attribute__((ms_abi))' \
'-D' '__declspec(x)=__declspec_##x' '-D' '__declspec_align(x)=__attribute__((aligned(x)))' \
'-D' '__declspec_allocate(x)=__attribute__((section(x)))' \
'-D' '__declspec_deprecated=__attribute__((deprecated))' \
'-D' '__declspec_dllimport=__attribute__((dllimport))' \
'-D' '__declspec_dllexport=__attribute__((dllexport))' \
'-D' '__declspec_naked=__attribute__((naked))' \
'-D' '__declspec_noinline=__attribute__((noinline))' \
'-D' '__declspec_noreturn=__attribute__((noreturn))' \
'-D' '__declspec_nothrow=__attribute__((nothrow))' \
'-D' '__declspec_novtable=__attribute__(())' \
'-D' '__declspec_selectany=__attribute__((weak))' \
'-D' '__declspec_thread=__thread' \
'-D' '__int8=char' '-D' '__int16=short' '-D' '__int32=int' '-D' '__int64=long' '-D' '__WINE__' \
'-c' '-o' 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o' \
'-I' 'src/dxgi/src@dxgi@@dxgi.dll@sha' '-I' 'src/dxgi' '-I' '../../dxvk/src/dxgi' \
'-I' '../../dxvk/./include' '-I' 'src/dxvk' '-I' '../../dxvk/src/dxvk' '-I' '.' \
'-pipe' '-D' '_FILE_OFFSET_BITS=64' '-Wall' \
'-Winvalid-pch' '-Wnon-virtual-dtor' '-std=c++17' '-O3' '-D' 'NOMINMAX' \
'-fwhole-program' \
'-fPIC' '-pthread' '-m64' '-Wno-attributes' '-march=native' \
'-O3' '-fgraphite-identity' '-floop-nest-optimize' \
'-MD' '-MQ' 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o' \
'-MF' 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o.d' \
'-v' '-isystem' '/usr/include/wine-vanilla-3.16/wine/windows' '-shared-libgcc'

Maybe there are some attributes I should change before using -fwhole-program?

My variant uses separate compile and link steps, and I am not sure if that can be changed for a winelib build, because various jobs have to be done by winegcc/winebuild itself before anything is actually compiled/linked.

ryao commented 6 years ago

Is it emitting errors? I would need to see at least some of them to be able to guess what is wrong.

I want winelib builds too, although my winelib builds don’t work for me yet, so I need to resolve that before I even try.

pchome commented 6 years ago

using -v to explain all commands:

wineg++ -v  -o src/dxgi/dxgi.dll.so ../../dxvk/src/dxgi/dxgi.spec 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-O1 -shared -fPIC -Wl,--start-group -Wl,-soname,dxgi.dll.so -lwinevulkan -Wl,--end-group -pthread -m64 -mwindows 
winebuild -v -fno-asynchronous-unwind-tables --cc-cmd=x86_64-pc-linux-gnu-gcc -m64 --ld-cmd=x86_64-pc-linux-gnu-ld -m64 -D_REENTRANT -fPIC --dll -o dxgi.dll-4y5sAd.spec.o -E ../../dxvk/src/dxgi/dxgi.spec -L/usr/lib64/wine-vanilla-3.16/wine -L/usr/lib64/wine-vanilla-3.16 -- src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o /usr/lib64/wine-vanilla-3.16/wine/libwinevulkan.def /usr/lib64/wine-vanilla-3.16/wine/libshell32.def /usr/lib64/wine-vanilla-3.16/wine/libcomdlg32.def /usr/lib64/wine-vanilla-3.16/wine/libgdi32.def /usr/lib64/wine-vanilla-3.16/wine/libadvapi32.def /usr/lib64/wine-vanilla-3.16/wine/libuser32.def /usr/lib64/wine-vanilla-3.16/wine/libwinecrt0.a /usr/lib64/wine-vanilla-3.16/wine/libkernel32.def /usr/lib64/wine-vanilla-3.16/wine/libntdll.def 
x86_64-pc-linux-gnu-gcc -m64 -xassembler -c -m64 -o dxgi.dqLQXl.o dxgi.YIr7Yg.s
x86_64-pc-linux-gnu-ld -m elf_x86_64 -r -o dxgi.qyfoFq.o dxgi.dqLQXl.o src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o /usr/lib64/wine-vanilla-3.16/wine/libwinecrt0.a
../../dxvk/src/dxgi/dxgi.spec:1: function 'CreateDXGIFactory' not defined
../../dxvk/src/dxgi/dxgi.spec:2: function 'CreateDXGIFactory1' not defined
../../dxvk/src/dxgi/dxgi.spec:3: function 'CreateDXGIFactory2' not defined
winegcc: winebuild failed