Open patricklodder opened 1 month ago
Hi,
I ran bench_dogecoin
for this PR and original 1.15.0-dev branch on my Dell XPS Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
system without explicitly enabling scrypt-sse2
support.
For this PR I've got for scrypt:
Scrypt,2048,0.000507116317749,0.000533625483513,0.000520408619195,1119821,1178271,1149121
For original 1.15.0-dev benchmark run I've got:
Scrypt,4096,0.000255910679698,0.000260507687926,0.000257457257248,565081,575221,568493
Judging by the results (and skimming over the code), my understanding is that this PR detects and automatically sets SSE2 CPU features if they are supported by hardware? This is really awesome! Does this mean sse2 and avx2 features can be regarded as standard and not that much "experimental" features anymore? The reason why I ask is in my net-p2p/dogecoin-qt I make it a requirement to specify experimental
use flag as part of avx2
and scrypt-sse2
selection options (to model our current requirement for these features.
However, if CPU supports such features as avx2
and/or sse2
, it is more preferable in Gentoo ecosystem to automatically detect and set CPU features during package installation without explicitly specifying them with USE flag (something similar to what this PR aims at doing), provided the user correctly specifies them for CPU_FLAGS_X86
variable in make.conf
. Does this mean I can modify net-p2p/dogecoin-qt
ebuild to automatically use sse2
and avx2
during package installation without experimental
requirement?
Thank you, Victor.
Thank you.
without explicitly enabling
scrypt-sse2
support
This PR does not change the experimental-ness of scrypt-sse2
and you should still pass --enable-experimental --enable-scrypt-sse2
For this PR I've got [..]
Those are odd results, could you please attach your config.log
for both versions?
Judging by the results (and skimming over the code), my understanding is that this PR detects and automatically sets SSE2 CPU features if they are supported by hardware?
It already did that before - I just made the method available generically.
However, if CPU supports such features as avx2 and/or sse2, it is more preferable in Gentoo ecosystem to automatically detect and set CPU features during package installation
We can support that once we take this out of experimental. However, the results you posted are not consistent with what I would expect!
Hi,
Thank you for clarifying. Attaching both config.log files as requested. config_3512.log config_orig.log
I reviewed the logs... your result is a 2x worse performance when compiling without SSE2 enabled. I will try to reproduce these results, because that would be an unintended effect.
Hi,
Just a quick note that I also ran test on another Linux installation in VirtualBox and got following compilation error:
sync.cpp:60:14: error: ‘set’ in namespace ‘std’ does not name a template type 60 | typedef std::set<std::pair<void*, void*> > InvLockOrders; | ^~~ sync.cpp:14:1: note: ‘std::set’ is defined in header ‘<set>’; did you forget to ‘#include <set>’? 13 | #include <boost/thread.hpp> +++ |+#include <set> 14 | sync.cpp:72:5: error: ‘InvLockOrders’ does not name a type; did you mean ‘LockOrders’? 72 | InvLockOrders invlockorders; | ^~~~~~~~~~~~~ | LockOrders sync.cpp: In function ‘void push_lock(void*, const CLockLocation&, bool)’: sync.cpp:123:18: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 123 | lockdata.invlockorders.insert(p2); | ^~~~~~~~~~~~~ | lockorders sync.cpp: In function ‘void DeleteLock(void*)’: sync.cpp:172:18: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 172 | lockdata.invlockorders.erase(invitem); | ^~~~~~~~~~~~~ | lockorders sync.cpp:175:5: error: ‘InvLockOrders’ has not been declared 175 | InvLockOrders::iterator invit = lockdata.invlockorders.lower_bound(item); | ^~~~~~~~~~~~~ sync.cpp:176:12: error: ‘invit’ was not declared in this scope; did you mean ‘int’? 176 | while (invit != lockdata.invlockorders.end() && invit->first == cs) { | ^~~~~ | int sync.cpp:176:30: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 176 | while (invit != lockdata.invlockorders.end() && invit->first == cs) { | ^~~~~~~~~~~~~ | lockorders sync.cpp:179:18: error: ‘struct LockData’ has no member named ‘invlockorders’; did you mean ‘lockorders’? 179 | lockdata.invlockorders.erase(invit++); | ^~~~~~~~~~~~~ | lockorders make[2]: *** [Makefile:6600: libdogecoin_util_a-sync.o] Error 1 make[2]: Leaving directory '/home/me/projects/fromgit/dogecoin/PR-3512/dogecoin/src' make[1]: *** [Makefile:9882: all-recursive] Error 1 make[1]: Leaving directory '/home/me/projects/fromgit/dogecoin/PR-3512/dogecoin/src' make: *** [Makefile:695: all-recursive] Error 1
This error was fixed by adding #include <set>
in sync.cpp
(as message advises). I ran bench_dogecoin
on that installation and got this for Scrypt:
Scrypt,1792,0.000569975003600,0.000772126019001,0.000594207617853,1258510,1704819,1312008
@victorsk2019 2 notes:
--enable-debug
may not be optimal when doing benchmarks, as that compiles with -O0
(and your make log must be full of warnings if you do this on Gentoo). Without it, you also do not get the include error.--enable-c++14
instead of adding the CXX flag; you can see in your config.log that it adds 2 c++ standard instructions, but only does checks for c++11. (And after #3494 goes in, I will also propose --enable-c++17
.)Thank you for additional information. I've recompiled this PR's code without --enable-debug
and got the following bench results for Scrypt (this is probably what was expected):
Scrypt,4096,0.000250319950283,0.000264337286353,0.000253840873484,552764,583719,560546
Maybe had something to do with mutexes locking processes threads causing slower benchmark performance 😕
Because we want to check for hardware capabilities at runtime among a set of current experimental features when they mature, it's good to have a uniform, central facility that checks capabilities. Detection code is riddled with macros making it hard to review/debug, because for each architecture/compiler there can be quirks.
This PR does 3 things:
compat/cpuid.h
from Bitcoin Core, to provide a generic, low level interface for reading out x86 CPU properties. This helps us in the future outside of crypto algorithms too, because we'll need it for RNG enhancements; see for example this finding on a proposedRDSEED
implementation.crypto/hwcap.*
, where all capabilities can be tested (once), and decoupled from implementation selectors. It can be easily extended by enumerating new booleans into theHardwareCapabilities
struct, which can then be detected in theDetectHWCapabilities()
function. The values in this struct can also be overridden ininit.cpp
in the future, for example: there are plenty products where you would not want to run RDSEED or AVX2 on because either the implementation is terribly slow or has serious security flaws. This construct allows us to create a facility that turns those off using a startup argument that overrides the detection. This would allow users to simply run a released binary but runtime override what features to use, sparing them the need for compile expertise.Testing
I think this needs a couple of specific tests that we don't have a CI for: