ryao closed this 6 years ago.
Maybe start with just -flto?
As an additional note, it might be a good idea to explore including Link Time Optimization (LTO) alongside PGO. There will be a need to tell the compiler what is externally visible. Supposedly, the gold linker can be used to help with this, but that would need investigation.
Another idea is to try concatenating all of the .cpp files and building them with -fwhole-program. This will require marking public functions with externally_visible, although it should generate a very well optimized binary.
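A minimal sketch of what that annotation looks like, assuming GCC (externally_visible is a GCC-specific attribute). DemoPublicEntry and scale are hypothetical names, not DXVK functions. Under -fwhole-program everything in the single compilation unit is fair game for inlining and removal, except symbols marked this way.

```cpp
// Sketch, assuming GCC and a build such as: g++ -O2 -fwhole-program unity.cpp

// Internal helper: with -fwhole-program, GCC may inline it or drop it entirely.
static int scale(int x) { return 2 * x; }

// Hypothetical public entry point. externally_visible tells GCC that code
// outside this compilation unit may call it, so -fwhole-program must keep
// the symbol instead of treating it as local and optimizing it away.
extern "C" __attribute__((externally_visible)) int DemoPublicEntry(int x) {
    return scale(x) + 1;
}
```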
@lieff You beat me to posting it. I edited the title to reflect the nature of this issue as encompassing more than just PGO.
Quite frankly, I suspect that concatenating all of the files into a single compilation unit and then using -fwhole-program would be better than LTO, but it is up to the person who volunteers to explore this to decide what to try.
Edit: Concatenating all of the files together and building them together is similar to Chromium's jumbo builds, although doing it to enable -fwhole-program would mean that it is for inter-procedural optimizations rather than reducing compile time:
https://chromium.googlesource.com/chromium/src/+/lkcr/docs/jumbo.md
Also, here is another thought. It would be interesting to try using LLVM/Clang with Google's Souper optimizer, especially with the other optimizations mentioned in place (i.e. PGO and WPO/LTO):
https://github.com/google/souper
There are other "superoptimizers" available that probably could be evaluated. They would make compile times skyrocket (taking days to months depending on how they are configured), but I hear that they can provide additional performance. I'd stick to the relatively low-hanging fruit of PGO and LTO, or a jumbo build with whole program optimization, first though.
@ryao
it is up to the person who volunteers to explore this
why not you?
http://mesonbuild.com/Builtin-options.html#base-options
See b_lto, b_pgo and --unity
- What benchmark can be run to generate profile data?
A bunch of unit tests covering all aspects for general optimization, or a concrete game you want to optimize DXVK for.
Meson already has b_lto and b_pgo options, so it's a build/packaging question, not really a project-related one.
@pchome I am not sure if I have time. If I thought I had time to do it, I would have done it rather than posting about it. We'll see if I do, but I find it doubtful.
A bunch of unit tests covering all aspects for general optimization, or a concrete game you want to optimize DXVK for.
The problem with games is that they rely on user input. I suspect that a game would be better than unit tests (although both could be run). We would need some way to start one from the commandline, have it run through a benchmark and then quit.
@lieff How builds work is project related for any project.
I suspect that PGO might help to reduce "stutter".
I can already tell you it won't. The shader compiler-related stutter happens inside the driver, and it is inherently slow due to differences in the D3D11 and Vulkan designs.
Might still be worth looking at, but PGO only really helps optimize for one specific workload, LTO is notoriously broken, and any performance gain would be in the single-digit percentages.
We already had Unity builds at some point, but for some strange reason they ended up being significantly slower than regular builds.
@doitsujin My replies are inline:
I can already tell you it won't.
That is unfortunate. Would you mind sharing how you profile? If I recall correctly, my usual profiling tricks don't give me much visibility into binaries running in Wine.
The shader compiler-related stutter happens inside the driver, and it is inherently slow due to differences in the D3D11 and Vulkan designs.
Would you name a few of the differences? I would like to know more. Are you referring to things like D3D binding slots vs vulkan descriptor sets?
Might still be worth looking at
I suggest leaving it to a volunteer and putting the help-wanted label on this. This sort of experiment is something a volunteer could do.
PGO only really helps optimize for one specific workload
That is what I thought until I saw that Firefox improved its JavaScript performance in general with PGO.
LTO is notoriously broken
If it were up to me, I'd probably just dump all *.cpp files into a single file and then build with -fwhole-program. It is less fragile than LTO and should work just as well. The caveat about needing to mark public functions/variables with externally_visible does apply. Otherwise, breakage will occur when the symbols are optimized away. Another issue is that it would reduce the information available in backtraces.
That is unfortunate. Would you mind sharing how you profile?
winelib builds of DXVK work with the usual Linux profiling tools and debuggers.
Are you referring to things like D3D binding slots vs vulkan descriptor sets?
That's causing some pain elsewhere, but the main issue with shader compilation is that you can compile shaders individually in D3D, and the D3D11 driver will do a lot of magic during the respective Create*Shader call, whereas Vulkan pipelines expect all shaders to be present (in SPIR-V, which then has to be optimized and translated to hardware instructions by the driver), as well as the full state vector, so we have to do all the work on the first draw that a specific shader is used with.
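To make the draw-time cost concrete, here is a hypothetical sketch (not DXVK's actual code) of a pipeline lookup keyed on every shader stage plus the fixed-function state. PipelineKey, LazyPipelineCache and the compile callback are illustrative names; the callback stands in for vkCreateGraphicsPipelines. Only the first draw with a new combination pays for compilation, which is where the stutter shows up.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>
#include <utility>

// Hypothetical key: one hash per shader stage plus a hash of the
// fixed-function state (blend, depth, rasterizer, vertex layout, ...).
struct PipelineKey {
    std::uint64_t vs = 0, hs = 0, ds = 0, gs = 0, ps = 0;
    std::uint64_t state = 0;

    bool operator==(const PipelineKey& o) const {
        return vs == o.vs && hs == o.hs && ds == o.ds &&
               gs == o.gs && ps == o.ps && state == o.state;
    }
};

struct PipelineKeyHash {
    std::size_t operator()(const PipelineKey& k) const {
        // FNV-1a-style mixing over the six 64-bit fields.
        std::uint64_t h = 1469598103934665603ull;
        for (std::uint64_t v : {k.vs, k.hs, k.ds, k.gs, k.ps, k.state}) {
            h ^= v;
            h *= 1099511628211ull;
        }
        return static_cast<std::size_t>(h);
    }
};

using PipelineHandle = std::uint64_t;  // stands in for VkPipeline

class LazyPipelineCache {
public:
    using CompileFn = std::function<PipelineHandle(const PipelineKey&)>;

    // compile stands in for the expensive vkCreateGraphicsPipelines call.
    explicit LazyPipelineCache(CompileFn compile) : compile_(std::move(compile)) {}

    // Called at draw time: only the first draw with a given shader/state
    // combination pays the compilation cost.
    PipelineHandle getForDraw(const PipelineKey& key) {
        auto it = pipelines_.find(key);
        if (it != pipelines_.end())
            return it->second;
        ++compileCount_;
        PipelineHandle p = compile_(key);
        pipelines_.emplace(key, p);
        return p;
    }

    std::size_t compileCount() const { return compileCount_; }

private:
    CompileFn compile_;
    std::unordered_map<PipelineKey, PipelineHandle, PipelineKeyHash> pipelines_;
    std::size_t compileCount_ = 0;
};
```

Keying on the full state rather than on individual shaders mirrors the point above: in Vulkan the same shader can require several pipelines if it is drawn with different state.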
I'd probably just dump all *.cpp files into a single file
that's how meson unity builds (--unity on) work, but per module
http://mesonbuild.com/Unity-builds.html#unity-builds
@doitsujin I take it that your profiling shows that most of the time there is spent in the graphics driver. This might be asking the obvious question, but is there no way to parallelize that process?
For example, n shaders A[i] for i from 0 to n - 1 must be built, so m worker threads j = 0 to m - 1 are created, and each one builds every A[i] where i % m == j. After they have all finished, the main thread just gathers the work from the worker threads. My feeling is that it is not that straightforward, but you piqued my curiosity.
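The round-robin split described above could look roughly like this in C++17 (link with -pthread on Linux). buildRoundRobin is a hypothetical helper, and the build callable is a placeholder for whatever per-shader work is expensive; whether the real work parallelizes well depends on where the time actually goes inside the driver.

```cpp
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Worker j (0 <= j < m) builds every item i with i % m == j. Each worker
// writes only to its own result slots, so no locking is needed; the calling
// thread "gathers" the work simply by joining all workers.
// The result type must be default-constructible. Avoid bool: std::vector<bool>
// packs bits, so writes to neighbouring slots would race.
template <typename In, typename F>
auto buildRoundRobin(const std::vector<In>& items, std::size_t m, F build)
    -> std::vector<std::invoke_result_t<F&, const In&>> {
    using Out = std::invoke_result_t<F&, const In&>;
    std::vector<Out> results(items.size());
    if (m == 0) m = 1;

    std::vector<std::thread> workers;
    workers.reserve(m);
    for (std::size_t j = 0; j < m; ++j) {
        // build must be safe to call from several threads at once.
        workers.emplace_back([&, j] {
            for (std::size_t i = j; i < items.size(); i += m)
                results[i] = build(items[i]);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```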
Also, LTO won't work for winelib builds, because you need an LTO-built WINE, or in particular libwinecrt0.a.
I'm using LTO and PGO for my whole system wherever possible, and WINE is one of the unreached goals.
@pchome I was leaning toward thinking that a so-called unity build with -fwhole-program would be better than LTO. As I said above, LTO is fragile. If you build everything as one compilation unit with -fwhole-program, you don't need LTO.
@doitsujin Never mind about the parallelism. I need to do my own profiling. I have spent more time looking at this code than I really have at the moment, but I think I understand a few bits of it. In particular, the draw calls that you mentioned are likely in DxvkContext::commitGraphicsState. Given how much this piqued my interest, I'll probably profile the code at some point and learn where the time is being spent. Concurrent programming is always fun. ;)
@pchome That is good to know. I still think opportunities for interprocedural optimization from those (with -fwhole-program) could be low-hanging fruit for someone who has only minor programming knowledge to explore.
After reading what @doitsujin said and looking through the code, I found some more interesting avenues to explore. In particular, I am not seeing much threading and I see no use of machine prefetch hints in the code. I need to make time to profile to see where the bottlenecks are more clearly.
After some thought, I think I should close this. It is probably not a great use of people's time, although I did learn some interesting things from the discussion.
Why so? At least PGO is real, and quite easy to test.
All we need to do is create a list of small tests (maybe wine's d3d11 tests, or some other d3d11 demos) and define a final benchmark to test the results.
e.g.
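#!/bin/sh
# Sketch: run_benchmark and run_tests are placeholders; "build" is an assumed meson build directory.
set -e
run_benchmark                            # baseline numbers
meson configure -Db_pgo=generate build   # instrumented build
ninja -C build install
run_tests                                # instrumented run writes the profile data
meson configure -Db_pgo=use build        # rebuild using the collected profile
ninja -C build install
run_benchmark                            # compare against the baseline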
Alright. I am reopening this.
@doitsujin One last thing, as I could not help eyeballing the code a bit more. Does your profiling indicate that the stuttering is from dxvkgraphicspipeline::DxvkGraphicsPipeline()? I see 5 ->createShaderModule() calls there that probably could run in parallel.
vkCreateShaderModule is literally a memcpy in actual Vulkan drivers. The expensive part is creating the Vulkan pipeline (vkCreateGraphicsPipelines).
@doitsujin That is tricky. Couldn't you just cache the DXBC shaders and other things that DXVK receives from the game and turns into a pipeline? Then on subsequent runs, if one of the shaders from a previous session are loaded by a game (identified by a matching checksum), DXVK could load the rest from cache and pre-create the pipeline? That is just a rough idea, but some kind of driver independent cache seems like the only way around it.
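A rough sketch of the driver-independent cache idea, under assumptions not taken from DXVK: each pipeline seen in a previous session is recorded as the checksums of its shaders plus a hash of its render state, saved as text, and indexed by shader checksum. PipelineRecord, saveRecords, loadRecords and PipelinePrewarmIndex are hypothetical names. A real implementation would also have to store the shader bytecode itself, so that the remaining stages are available for pre-creating the pipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record of one pipeline seen in a previous session.
struct PipelineRecord {
    std::vector<std::uint64_t> shaderChecksums;
    std::uint64_t stateHash = 0;
};

// One record per line: "<stateHash> <checksum> <checksum> ...".
void saveRecords(std::ostream& out, const std::vector<PipelineRecord>& records) {
    for (const auto& r : records) {
        out << r.stateHash;
        for (std::uint64_t c : r.shaderChecksums) out << ' ' << c;
        out << '\n';
    }
}

std::vector<PipelineRecord> loadRecords(std::istream& in) {
    std::vector<PipelineRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        PipelineRecord r;
        if (!(fields >> r.stateHash)) continue;  // skip blank or corrupt lines
        for (std::uint64_t c; fields >> c;) r.shaderChecksums.push_back(c);
        records.push_back(std::move(r));
    }
    return records;
}

// When the game creates a shader whose checksum matches one from a previous
// session, return the recorded pipelines that use it, so they can be
// created ahead of the first draw (e.g. on a background thread).
class PipelinePrewarmIndex {
public:
    explicit PipelinePrewarmIndex(std::vector<PipelineRecord> records)
        : records_(std::move(records)) {
        for (std::size_t i = 0; i < records_.size(); ++i)
            for (std::uint64_t c : records_[i].shaderChecksums)
                byShader_[c].push_back(i);
    }

    std::vector<const PipelineRecord*> onShaderCreated(std::uint64_t checksum) const {
        std::vector<const PipelineRecord*> hits;
        auto it = byShader_.find(checksum);
        if (it != byShader_.end())
            for (std::size_t i : it->second) hits.push_back(&records_[i]);
        return hits;
    }

private:
    std::vector<PipelineRecord> records_;
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> byShader_;
};
```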
@pchome
create a list of small tests maybe wine's d3d11 tests
OK, I was able to build the standalone dxgi test from the wine sources. It executes quickly and looks like it can be used for PGO needs.
0026:dxgi: 6386 tests executed (0 marked as todo, 270 failures), 5 skipped.
I'm going to do the same for the other wine dx10/dx11-related tests and combine them all before sharing.
https://github.com/pchome/wine-playground/tree/master/dx1x-tests
Note:
- test-run.sh contains examples of how to run the tests
- dxgi and d3d11 tests finished correctly
- d3d10.device is a RAM-consuming evil, and failed with unimplemented function d3d10.dll.D3D10StateBlockMaskDifference
- d3d10.effect: effects are not supported in DXVK
- d3d10_1 failed with an exception, and d3d10core failed with a segfault

d3d11 test output: 0025:d3d11: 1154 tests executed (0 marked as todo, 201 failures), 1 skipped.
So the dxgi and d3d11 tests could be used as-is (for now), despite the failures.
The others require some patching work.
The tests that failed probably merit their own issues.
Mostly no: missing interfaces, specific formats, and probably WINE's internal stuff.
err: D3D11: Cannot create texture:
Format: VK_FORMAT_E5B9G9R9_UFLOAT_PACK32
Extent: 512x512x1
Samples: 1
Layers: 1
Levels: 1
Usage: 13
err: DXGI: CheckInterfaceSupport: Unsupported interface
err: db6f6ddb-ac77-4e88-8253-819df9bbf140
@doitsujin can check it himself, if he wants to.
EDIT: A lot of them (tests) are failing even for wine. http://test.winehq.org/data/ http://test.winehq.org/data/64d9f309b7f74d4154e685c5d1d78c1b8335c0bc/index_Linux.html
I have a theory on why unity builds took longer. For large projects, the headers can be substantially more complex than the files themselves. Furthermore, you can have so many files to compile that, even with -j$(nproc), each core must process a large number of them. The time savings from unity builds come from parsing the headers only once for all of those files. If the extra time one core spends compiling files that would otherwise have been spread across the other cores is less than the time saved by not parsing the headers once per file, you save time. If not, you do not save time.
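That trade-off can be written as a rough cost model. It assumes a module with F source files, C cores, a header parsing cost H paid once per translation unit, and a per-file cost S for the file's own code, with the unity build compiled as a single translation unit on one core (other modules building in parallel are ignored):

```latex
\begin{aligned}
T_{\text{separate}} &= \left\lceil F / C \right\rceil \,(H + S) \\
T_{\text{unity}}    &= H + F\,S \\
\text{unity is faster} &\iff H + F\,S < \left\lceil F / C \right\rceil (H + S)
\end{aligned}
```

For example, with F = 100 and C = 8, ⌈F/C⌉ = 13. If H = 10 and S = 1, the separate build takes 13 × 11 = 143 units and the unity build 10 + 100 = 110, so unity wins; if H = 1 and S = 1, the separate build takes 26 units and the unity build 101, so it loses.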
I believe that DXVK’s headers are not complex enough to save time with unity builds. However, there should be opportunities for strong interprocedural optimizations from unity builds if DXVK is adapted to support -fwhole-program as part of them. According to the compiler documentation, this means marking externally accessible functions with the externally_visible attribute. This idea needs testing to see if it makes a difference.
err: D3D11: Cannot create texture:
Format: VK_FORMAT_E5B9G9R9_UFLOAT_PACK32
Extent: 512x512x1
Samples: 1
Layers: 1
Levels: 1
Usage: 13
That's not a bug, it just means that you cannot render to that format (Usage 0x13 is color attachment + transfer).
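The log prints the mask as "Usage: 13", which reads as hexadecimal 0x13 = 0x10 | 0x02 | 0x01: color attachment plus transfer source and transfer destination. A small decoder for such masks, with bit values copied from VkImageUsageFlagBits and redeclared as plain constants so the sketch does not need the Vulkan headers (describeImageUsage is a hypothetical helper):

```cpp
#include <cstdint>
#include <string>

// Bit values as defined by VkImageUsageFlagBits in the Vulkan specification.
constexpr std::uint32_t kUsageTransferSrc            = 0x01;
constexpr std::uint32_t kUsageTransferDst            = 0x02;
constexpr std::uint32_t kUsageSampled                = 0x04;
constexpr std::uint32_t kUsageStorage                = 0x08;
constexpr std::uint32_t kUsageColorAttachment        = 0x10;
constexpr std::uint32_t kUsageDepthStencilAttachment = 0x20;
constexpr std::uint32_t kUsageTransientAttachment    = 0x40;
constexpr std::uint32_t kUsageInputAttachment        = 0x80;

// Turns a usage mask such as 0x13 into a readable list of flag names.
std::string describeImageUsage(std::uint32_t usage) {
    struct Flag { std::uint32_t bit; const char* name; };
    static const Flag kFlags[] = {
        {kUsageTransferSrc, "TRANSFER_SRC"},
        {kUsageTransferDst, "TRANSFER_DST"},
        {kUsageSampled, "SAMPLED"},
        {kUsageStorage, "STORAGE"},
        {kUsageColorAttachment, "COLOR_ATTACHMENT"},
        {kUsageDepthStencilAttachment, "DEPTH_STENCIL_ATTACHMENT"},
        {kUsageTransientAttachment, "TRANSIENT_ATTACHMENT"},
        {kUsageInputAttachment, "INPUT_ATTACHMENT"},
    };
    std::string out;
    for (const Flag& f : kFlags) {
        if (usage & f.bit) {
            if (!out.empty()) out += " | ";
            out += f.name;
        }
    }
    return out.empty() ? std::string("NONE") : out;
}
```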
@ryao I might be misunderstanding you, but if not: "unity builds" were cancelled due to performance degradation, not "build time".
I haven't tested it since it was dropped, so for all I know it might not be an issue anymore?
@SveSop I misremembered what I read when I was thinking about it then. Anyway, I would not expect unity builds to be a performance win unless -fwhole-program is used. That needs function annotations with the externally_visible attribute. If performance still degrades with -fwhole-program, then some attention probably needs to be given to the compiler’s optimization stages to see what it is doing wrong and which switches we can toggle to get it to do things correctly.
This was a quick stab at producing d3d11.dll and dxgi.dll files built with -fwhole-program for evaluation purposes:
cd /path/to/dxvk
meson --unity on --cross-file build-win32.txt --prefix /tmp/dxvk-win32-whole-program build.w32
cd build.w32
meson configure -Dbuildtype=release
ninja
cat << END > dxgi.dll.c
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/dxgi/src@dxgi@@dxgi@sha/dxgi-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END
i686-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/dxgi/dxgi.dll ../src/dxgi/dxgi.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/dxgi/dxgi.def -Wl,--start-group -Wl,--out-implib=src/dxgi/libdxgi.dll.a dxgi.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w32/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w32/src/dxgi/src@dxgi@@dxgi@sha/ ../lib32/vulkan-1.lib -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup
cat << END > d3d11.dll.c
#include "src/d3d11/src@d3d11@@d3d11@sha/d3d11-unity.cpp"
#include "src/dxbc/src@dxbc@@dxbc@sta/dxbc-unity.cpp"
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END
i686-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/d3d11/d3d11.dll ../src/d3d11/d3d11.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/d3d11/d3d11.def -Wl,--start-group -Wl,--out-implib=src/d3d11/libd3d11.dll.a d3d11.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w32/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w32/src/dxgi/src@dxgi@@dxgi@sha/ ../lib32/vulkan-1.lib -ldxgi -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup
ninja install
Contrary to my belief, setting externally_visible was unnecessary. This was confirmed by a quick examination of The Export Tables (interpreted .edata section contents) via i686-w64-mingw32-objdump, which showed that the same symbols were being exported, plus a quick runtime test.
I was able to replace the d3d11.dll and dxgi.dll files provided with Proton, and it ran without a problem. I am certain that I replaced the correct binary because DXVK_HUD=version shows a changed version number.
It also might be of interest that the binaries built this way are smaller after stripping.
Before:
richard@desktop ~/devel/dxvk $ ls -l /tmp/dxvk-win32-v0.72-39-g20c89c3/{d3d11.dll,dxgi.dll}
-rwxr-xr-x 1 richard richard 2353664 Sep 23 00:26 /tmp/dxvk-win32-v0.72-39-g20c89c3/d3d11.dll
-rwxr-xr-x 1 richard richard 1845248 Sep 23 00:26 /tmp/dxvk-win32-v0.72-39-g20c89c3/dxgi.dll
After:
richard@desktop ~/devel/dxvk $ cp ./build.w32/src/d3d11/d3d11.dll ./build.w32/src/dxgi/dxgi.dll /tmp/
richard@desktop ~/devel/dxvk $ i686-w64-mingw32-strip /tmp/dxgi.dll /tmp/d3d11.dll
richard@desktop ~/devel/dxvk $ ls -l /tmp/dxgi.dll /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 1959936 Sep 24 12:50 /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 1414144 Sep 24 12:50 /tmp/dxgi.dll
I am sharing this in case someone else who has more time wants to test this to see if it helps. However, I suspect that doing this for release builds might be worthwhile for the smaller binary sizes, even if performance does not improve, as long as performance does not become worse. The build system would need to be fixed to avoid the horrible hack that I did to make the PoC though.
Was the "Before" version built using the same flags, except -fwhole-program?
-Dbuildtype=release will add the -O3 flag, so the binaries are expected to be bigger.
@pchome The before build was built like this:
meson --cross-file build-win32.txt --prefix /tmp/dxvk-win32 build.w32
cd build.w32
meson configure -Dbuildtype=release
ninja
ninja install
I manually moved the files and stripped them afterward.
I didn't capture the CFLAGS being used for the build, so I didn't check. The size difference had been unexpected and was included in my comment at the last moment. A quick grep of the sources didn't show me any CFLAGS and I am not familiar with meson. However, I just tested -O3 builds out of curiosity:
richard@desktop ~/devel/dxvk $ ls -l /tmp/dxgi.dll /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 2098176 Sep 24 13:13 /tmp/d3d11.dll
-rwxr-xr-x 1 richard richard 1437184 Sep 24 13:13 /tmp/dxgi.dll
They are still smaller.
@doitsujin Why is your official release built with 2 different compilers?
richard@desktop /tmp $ strings dxvk-0.80/x32/d3d11.dll | grep GCC: | sort -u
GCC: (GNU) 4.9.2
GCC: (GNU) 8.1.0
I just checked the configure phase, and the unity files are already available at this stage: build.64/src/d3d11/src@d3d11@@d3d11.dll@sha/.
So it's possible to integrate -fwhole-program into the build process by skipping compilation of some modules.
I'll check this later.
@pchome It is not quite that simple because there are internal libraries being built. I had to work around that by making a manual unity file combining all of the unity files for those libraries to make it work.
I had to work around that by making a manual unity file combining all of the unity files for those libraries to make it work.
You can use a generator:
https://github.com/doitsujin/dxvk/blob/master/meson.build#L54
https://github.com/doitsujin/dxvk/blob/master/src/dxgi/meson.build#L19
or (maybe) pass all *_src variables to shared_library().
Also, it may be worth asking https://github.com/mesonbuild/meson for such a feature (-fwhole-program + unity).
A hack for -fwhole-program (for testing purposes):
- renames sha1/sha1.c to sha1/sha1.cpp
- -Dwhole_program=true by default, so --unity on is required
- -fwhole-program must be disabled for winelib builds, otherwise it produces errors like dxgi.spec:1: function 'CreateDXGIFactory' not defined
@pchome Were you able to reproduce the smaller binaries?
I don't have MinGW installed, and as I said, I can't use -fwhole-program with this patch for the winelib build.
Without -fwhole-program, almost equally sized files (a different version) were generated, compared to those currently installed on my system.
This produces two huge files, d3d11.dll-unity.cpp and dxgi.dll-unity.cpp, but automatically.
In my observations, -flto has an advantage over unity builds (not much, but still). This is because exported (non-static) symbols must keep a fixed ABI in a unity build: the compiler can't change it because it does not know whether someone wants to call the symbol from outside the resulting object. With -flto, the compiler makes the decision when it actually links the application, so it can change the ABI if it wants. Not sure if we can bypass this with -fvisibility=hidden.
@lieff -fwhole-program tells the toolchain to assume that no one will ever want to call those externally.
Here are instructions for a 64-bit build of the proof of concept that I posted earlier:
cd /path/to/dxvk
meson --unity on --cross-file build-win64.txt --prefix /tmp/dxvk-win64-whole-program build.w64
cd build.w64
meson configure -Dbuildtype=release
ninja
cat << END > dxgi.dll.c
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/dxgi/src@dxgi@@dxgi@sha/dxgi-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END
x86_64-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/dxgi/dxgi.dll ../src/dxgi/dxgi.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/dxgi/dxgi.def -Wl,--start-group -Wl,--out-implib=src/dxgi/libdxgi.dll.a dxgi.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w64/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w64/src/dxgi/src@dxgi@@dxgi@sha/ ../lib/vulkan-1.lib -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup
cat << END > d3d11.dll.c
#include "src/d3d11/src@d3d11@@d3d11@sha/d3d11-unity.cpp"
#include "src/dxbc/src@dxbc@@dxbc@sta/dxbc-unity.cpp"
#include "src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp"
#include "src/util/src@util@@util@sta/util-unity.cpp"
#include "src/spirv/src@spirv@@spirv@sta/spirv-unity.cpp"
#include "../src/util/sha1/sha1.c"
END
x86_64-w64-mingw32-g++ -fwhole-program -std=c++1z -O2 -g -o src/d3d11/d3d11.dll ../src/d3d11/d3d11.def -Wl,--no-undefined -Wl,--as-needed -shared ../src/d3d11/d3d11.def -Wl,--start-group -Wl,--out-implib=src/d3d11/libd3d11.dll.a d3d11.dll.c -I ../include -I ../src/dxvk -I ../src/dxgi -I . -I ../build.w64/src/dxvk/src@dxvk@@dxvk@sta/ -I ../build.w64/src/dxgi/src@dxgi@@dxgi@sha/ ../lib/vulkan-1.lib -ldxgi -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group -static -static-libgcc -static-libstdc++ -Wl,--add-stdcall-alias,--enable-stdcall-fixup
ninja install
I noticed a small mistake in how I was doing the -fwhole-program build of d3d11.dll, where src/dxvk/src@dxvk@@dxvk@sta/dxvk-unity.cpp was built separately. That kept the binaries from being as well optimized / small as they could have been. I have updated the build instructions and size data with the corrections.
@ryao Yes, it should turn everything static. But in my old observations it did not work for C++ member functions (maybe because C++ methods can't be static in the C++ sense). Maybe newer GCC behaves differently now; I looked at it a long time ago, and I haven't checked Clang.
@ryao
I noticed a small mistake ...
You should also use --unity on for both, so we can see the real -fwhole-program difference/advantage.
Edit: And -O3
@lieff I am building with GCC 8.2.0. I have yet to try Clang.
@pchome I am using --unity on for both.
Also, I have proof of concept builds available for those interested in testing them:
wget http://dev.gentoo.org/~ryao/dist/dxvk-win64-v0.80-whole-program.txz{,.sig}
gpg --verify dxvk-win64-v0.80-whole-program.txz{.sig,}
wget http://dev.gentoo.org/~ryao/dist/dxvk-win32-v0.80-whole-program.txz{,.sig}
gpg --verify dxvk-win32-v0.80-whole-program.txz{.sig,}
I already found a volunteer to help test, so there is no need for more, but I'm posting them for the various contributors here to be able to evaluate. You can get my PGP key from github:
I included the d3d10 dlls, but they aren't built with -fwhole-program, so there is really nothing special about them beyond being unity builds built with GCC 8.2.0. These binaries have been stripped to save space on the webserver. Also, despite my instructions showing -O2, all of the binaries were built with -O3 to match the release builds. I left the instructions with -O2 to avoid making the history confusing.
@pchome I just realized that you meant before/after. That will need to wait a few days because I have spent all of my spare time on this and then some, but I will be happy to provide numbers for them when I have some more time.
I still want to do my own build/test, but no luck so far.
Here are the final options passed to the compiler by winegcc:
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-fshort-wchar' '-D' 'WINE_UNICODE_NATIVE' \
'-D' '_REENTRANT' '-D' 'WIN64' '-D' '_WIN64' '-D' '__WIN64' '-D' '__WIN64__' '-D' 'WIN32' \
'-D' '_WIN32' '-D' '__WIN32' '-D' '__WIN32__' '-D' '__WINNT' '-D' '__WINNT__' \
'-D' '__stdcall=__attribute__((ms_abi))' '-D' '__cdecl=__attribute__((ms_abi))' \
'-D' '_stdcall=__attribute__((ms_abi))' '-D' '_cdecl=__attribute__((ms_abi))' \
'-D' '__fastcall=__attribute__((ms_abi))' '-D' '_fastcall=__attribute__((ms_abi))' \
'-D' '__declspec(x)=__declspec_##x' '-D' '__declspec_align(x)=__attribute__((aligned(x)))' \
'-D' '__declspec_allocate(x)=__attribute__((section(x)))' \
'-D' '__declspec_deprecated=__attribute__((deprecated))' \
'-D' '__declspec_dllimport=__attribute__((dllimport))' \
'-D' '__declspec_dllexport=__attribute__((dllexport))' \
'-D' '__declspec_naked=__attribute__((naked))' \
'-D' '__declspec_noinline=__attribute__((noinline))' \
'-D' '__declspec_noreturn=__attribute__((noreturn))' \
'-D' '__declspec_nothrow=__attribute__((nothrow))' \
'-D' '__declspec_novtable=__attribute__(())' \
'-D' '__declspec_selectany=__attribute__((weak))' \
'-D' '__declspec_thread=__thread' \
'-D' '__int8=char' '-D' '__int16=short' '-D' '__int32=int' '-D' '__int64=long' '-D' '__WINE__' \
'-c' '-o' 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o' \
'-I' 'src/dxgi/src@dxgi@@dxgi.dll@sha' '-I' 'src/dxgi' '-I' '../../dxvk/src/dxgi' \
'-I' '../../dxvk/./include' '-I' 'src/dxvk' '-I' '../../dxvk/src/dxvk' '-I' '.' \
'-pipe' '-D' '_FILE_OFFSET_BITS=64' '-Wall' \
'-Winvalid-pch' '-Wnon-virtual-dtor' '-std=c++17' '-O3' '-D' 'NOMINMAX' \
'-fwhole-program' \
'-fPIC' '-pthread' '-m64' '-Wno-attributes' '-march=native' \
'-O3' '-fgraphite-identity' '-floop-nest-optimize' \
'-MD' '-MQ' 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o' \
'-MF' 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o.d' \
'-v' '-isystem' '/usr/include/wine-vanilla-3.16/wine/windows' '-shared-libgcc'
Maybe there are some attributes I should change before using -fwhole-program?
My variant uses separate compile and link steps, and I'm not sure if that can be changed for a winelib build, because winegcc/winebuild itself has to do some work before anything is actually compiled/linked.
Is it emitting errors? I would need to see at least some of them to be able to guess what is wrong.
I want winelib builds too, although my winelib builds don’t work for me yet, so I need to resolve that before I even try.
Using -v to show all commands:
wineg++ -v -o src/dxgi/dxgi.dll.so ../../dxvk/src/dxgi/dxgi.spec 'src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-O1 -shared -fPIC -Wl,--start-group -Wl,-soname,dxgi.dll.so -lwinevulkan -Wl,--end-group -pthread -m64 -mwindows
winebuild -v -fno-asynchronous-unwind-tables --cc-cmd=x86_64-pc-linux-gnu-gcc -m64 --ld-cmd=x86_64-pc-linux-gnu-ld -m64 -D_REENTRANT -fPIC --dll -o dxgi.dll-4y5sAd.spec.o -E ../../dxvk/src/dxgi/dxgi.spec -L/usr/lib64/wine-vanilla-3.16/wine -L/usr/lib64/wine-vanilla-3.16 -- src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o /usr/lib64/wine-vanilla-3.16/wine/libwinevulkan.def /usr/lib64/wine-vanilla-3.16/wine/libshell32.def /usr/lib64/wine-vanilla-3.16/wine/libcomdlg32.def /usr/lib64/wine-vanilla-3.16/wine/libgdi32.def /usr/lib64/wine-vanilla-3.16/wine/libadvapi32.def /usr/lib64/wine-vanilla-3.16/wine/libuser32.def /usr/lib64/wine-vanilla-3.16/wine/libwinecrt0.a /usr/lib64/wine-vanilla-3.16/wine/libkernel32.def /usr/lib64/wine-vanilla-3.16/wine/libntdll.def
x86_64-pc-linux-gnu-gcc -m64 -xassembler -c -m64 -o dxgi.dqLQXl.o dxgi.YIr7Yg.s
x86_64-pc-linux-gnu-ld -m elf_x86_64 -r -o dxgi.qyfoFq.o dxgi.dqLQXl.o src/dxgi/src@dxgi@@dxgi.dll@sha/meson-generated_dxgi.dll-unity.cpp.o /usr/lib64/wine-vanilla-3.16/wine/libwinecrt0.a
../../dxvk/src/dxgi/dxgi.spec:1: function 'CreateDXGIFactory' not defined
../../dxvk/src/dxgi/dxgi.spec:2: function 'CreateDXGIFactory1' not defined
../../dxvk/src/dxgi/dxgi.spec:3: function 'CreateDXGIFactory2' not defined
winegcc: winebuild failed
Someone with time to explore tweaks to the build system should look into doing PGO builds. There are descriptions of how this works here:
https://dom.as/2009/07/27/profile-guided-optimization-with-gcc/ https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Instrumentation-Options.html
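As a minimal illustration of the GCC workflow those links describe: build once with -fprofile-generate, run a representative workload so the instrumented binary writes .gcda profile files, then rebuild with -fprofile-use so GCC can use the recorded branch and call frequencies. The function below is a hypothetical toy workload with a heavily skewed branch, the kind of fact a profile gives the compiler that it would otherwise have to guess:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy workload: assumes that in the training data only about
// one value in a thousand takes the "fixup" branch. Without a profile GCC
// has to guess how likely that branch is; with -fprofile-use it uses the
// measured counts instead.
std::uint64_t sumWithRareFixups(const std::vector<std::uint32_t>& values) {
    std::uint64_t total = 0;
    for (std::uint32_t v : values) {
        if (v % 1000 == 0) {
            total += static_cast<std::uint64_t>(v) * 3;  // rare path
        } else {
            total += v;                                   // hot path
        }
    }
    return total;
}
```

With a small driver main() that calls it on representative input, the commands would be along the lines of g++ -O2 -fprofile-generate demo.cpp -o demo, then ./demo, then g++ -O2 -fprofile-use demo.cpp -o demo. For DXVK itself, the meson b_pgo option wraps the same two phases.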
There are glowing reviews of PGO here:
https://cboard.cprogramming.com/tech-board/111902-pgo-amazing.html https://www.activestate.com/blog/2014/06/python-performance-boost-using-profile-guided-optimization https://clearlinux.org/blogs/profile-guided-optimization-mariadb-benchmarks
I suspect that PGO might help to reduce "stutter".
There are a couple of questions that need to be answered before PGO builds can be done: