Open ymanton opened 3 years ago
Intel is in the process of deprecating their TSX support as well due to memory ordering problems causing issues. The feature has been removed in some Ice Lake processors and Intel microcode updates are beginning to appear to disable TSX function and force-abort RTM transactions on older processors (e.g., back to Skylake) [1][2]. This doesn't affect AMD as it never supported it.
I propose removing TSX support in the x86 backend as a result (no customer impact is expected). We never realized any known benefit from it.
However, I wonder if this issue should be generalized to remove transactional memory support altogether from the JIT. Does Z realize any benefit with TM that we know of? @joransiu
FYI @zl-wang @mstoodle @vijaysun-omr
[1] https://cdrdv2.intel.com/v1/dl/getContent/604224 [2] https://www.phoronix.com/scan.php?page=news_item&px=Intel-TSX-Off-New-Microcode
However, I wonder if this issue should be generalized to remove transactional memory support altogether from the JIT. Does Z realize any benefit with TM that we know of? @joransiu
Personally I think we should remove it. We do use TM though to accelerate ConcurrentLinkedQueue
but only at a recognized method level. I have yet to see a tbegin
/ tend
IL using in the wild. It seems prohibitively limited (in what IL can be part of a TM sequence) that we can almost never find a sequence of trees to actually perform TM reductions on.
Somewhat timely discussion. We collected a set of Liberty DayTrader results specifically to measure with and without TX support on z/OS. From profiles, we do spent 1% of execution time within Liberty inside TX transactions. However, the throughput, unsurprisingly, was pretty within 1% of each other -- so par performance in practical terms.
The TX support does come into play for smaller multi-threaded applications and benchmarks. A recent example where it showed up was from crypto.perf, where we do see TX usage in both the bumblebench related methods (I believe doBatch()`, as well as, in some of the crypto implementations (i.e. AESCipher.engineUpdate).
I do wonder if we have other key application scenarios (i.e. some of the contention observed in class loader) whether TX may help or not. It's one of those technologies where generally it may not help, but in the right situation, it can buy us much better scalability.
I agree the potential is (and has always been) there, but I'm not aware of a benefit running a real workload. It may be useful to survey the landscape as Joran suggests (I know, for example, there have been thoughts around using TM to help with contended lock scenarios).
However, given the direction of the other hardware vendors it would seem that TM exploitation will be a Z-only feature for the immediate future.
A statement of direction was given on IBM Z:
Removal of support of the transactional execution and constrained transactional execution facility: In a future IBM Z hardware system family, the transactional execution and constrained transactional execution facility will no longer be supported. Users of the facility on current servers should always check the facility indications before use.
Transactional memory is no longer supported on IBM Power processors (which were the only PPC processors to ever support it) as of P10. TM has also been disabled by default in OpenJ9 for a long time and some of the functionality relied on classes that don't/won't exist in OpenJ9 so this code is effectively dead and untested and should be removed unless there's good reason not to.
See below for context:
_Originally posted by @ymanton in https://github.com/eclipse-openj9/openj9/pull/13307#discussion_r687924452_