Open knn-k opened 9 months ago
It seems the OMR_GC_TLH_PREFETCH_FTA
value in the generated omrcfg.h reflects the J9VM_GC_TLH_PREFETCH_FTA
value, as far as I tried on AArch64 Linux turning the J9VM_
flag on and off.
What happens on linux_ppc-64_le, where J9VM_GC_TLH_PREFETCH_FTA
is ON
and OMR_GC_TLH_PREFETCH_FTA
is OFF
?
Can anybody look at OMR_GC_TLH_PREFETCH_FTA
in the omrcfg.h for linux_ppc-64_le, please?
OMR_GC_TLH_PREFETCH_FTA
is defined in the generated omrcfg.h when J9VM_GC_TLH_PREFETCH_FTA
is ON and OMR_GC_TLH_PREFETCH_FTA
is OFF in cmake files. J9VM_GC_TLH_PREFETCH_FTA
setting seems to override OMR_GC_TLH_PREFETCH_FTA
.
I think setting the OMR_GC_TLH_PREFETCH_FTA
value in cmake files makes it confusing.
How about removing OMR_GC_TLH_PREFETCH_FTA
lines from cmake files?
@zl-wang Do you have any thoughts on:
OMR_GC_TLH_PREFETCH_FTA
to OFF while J9VM_GC_TLH_PREFETCH_FTA
is ONJ9VM_GC_TLH_PREFETCH_FTA
to OFFlinux_ppc-64_le: https://github.com/eclipse-openj9/openj9/blob/edbf0e1a5bce0b062eba57a98c6fe3e438a49d5c/runtime/cmake/caches/linux_ppc-64_le.cmake#L42
AArch64 platforms (Linux and macOS): J9VM_GC_TLH_PREFETCH_FTA
is enabled, but the JIT compiler does not use the tlhPrefetchFTA
field in J9VMThread
at all.
It looks to me that it is off in all cases: (my search results below) FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/aix_ppc-64.cmake
set(J9VM_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")
FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/linux_ppc-64_le.cmake
set(OMR_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")
It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.
It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.
I think it was an attempt to provide definition in OMR because GC TLH code lives there (in combination with desire to keep OMR independent upstream project). So, either we need to keep OMR* and J9VM* settings synchronized or use compromise and check J9VM_GC_TLH_PREFETCH_FTA
in OMR code directly. Quick search shows we do have J9 definitions usage in OMR code already.
@zl-wang Note that the symbol in aixppc-64.cmake starts with `J9VM, and the one in linux_ppc-64_le.cmake starts with
OMR_`. They are different symbols.
Please also look at the definitions of those symbols in j9cfg.h and omrcfg.h that are generated during the build.
The following files in buildspecs set the gc_tlhPrefetchFTA
flag:
linux_ppc-64_le.spec and linux_390-64.spec don't set the flag, which are inconsistent with the setting in common.cmake.
Looking back the histories of .cmake files:
Power
J9VM_GC_TLH_PREFETCH_FTA
was turned off by #9142.OMR_GC_TLH_PREFETCH_FTA
was turned off by the initial cmake file for plinux #2566. J9VM_GC_TLH_PREFETCH_FTA
is enabled by common.cmake.Z
J9VM_GC_TLH_PREFETCH_FTA
is enabled by common.cmake.@joransiu FYI.
I ran personal builds to see if OMR_GC_TLH_PREFETCH_FTA
is defined or not on Power and Z. AIX, z/OS, and Linux have different values as shown below.
OFF
, but that is ignored)(internal job: view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/20537/)
we don't need to set the macro(s) consistently across platforms, but of course setting them consistently on each single platform is a good step to take.
In order to realize the performance benefit of TLH-prefetch which has been dropped out of favour gradually in the past while, you need to walk a fine line anyway between micro-architecture and codegen. firstly, if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large. secondly, prefetch instructions with temporal characteristics were the ones having performance benefits from past experiences (since hardware prefetch does pretty much the same thing as explicit vanilla prefetch instructions and without consuming instruction cache capacity).
@zl-wang I can change PR #18918 not to disable J9VM_GC_TLH_PREFETCH_FTA
in linux_ppc-64_le.cmake. Is that what you mean, as the first step?
if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large
Good point. As I understand we are about to enable TLH batch clearing on X and ARM. So, relation between TLH batch clearing and TLH-prefetch should be considered.
I updated PR #18918. It does not change J9VM_GC_TLH_PREFETCH_FTA
in linux_ppc-64_le.cmake and in linux_390-64.cmake now.
See J9VM_GC_TLH_PREFETCH_FTA
settings after the change. The flag is turned off only on aix and zos.
Platform | J9VM_GC_TLH_PREFETCH_FTA |
---|---|
x86 and x86-64 platforms | ON (common) |
aarch64 platforms | ON (common) |
aix_ppc-64 | OFF |
linux_ppc-64_le | ON (common) |
zos_390-64 | OFF |
linux_390-64 | ON (common) |
As I understand we are about to enable TLH batch clearing on X and ARM.
TLH batch clearing is enabled by default on AArch64.
common.cmake enables J9VM_GC_BATCH_CLEAR_TLH
on all the platforms.
https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/cmake/caches/common.cmake#L121
On x86 platforms, you need to set the env variable TR_EnableBatchClear
to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear
.
https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/compiler/control/DLLMain.cpp#L359-L360
As I understand we are about to enable TLH batch clearing on X and ARM.
TLH batch clearing is enabled by default on AArch64. common.cmake enables
J9VM_GC_BATCH_CLEAR_TLH
on all the platforms.On x86 platforms, you need to set the env variable
TR_EnableBatchClear
to allocate zeroed TLH pages. Other platforms have the env variableTR_DisableBatchClear
.
I believe Julian's point is: using TLH Batch clearing can make TLH Prefetching less effective significantly. Another words if we zero every TLH memory up front prefetching might be not necessary.
My points are as follows:
OMR_GC_TLH_PREFETCH_FTA
in .cmake files is useless and confusing. J9VM_GC_TLH_PREFETCH_FTA
overriddes it during the build. PR #18918 fixes it.J9VM_GC_TLH_PREFETCH_FTA
settings between Linux (ON) and AIX (OFF). Is there any reason for that?J9VM_GC_TLH_PREFETCH_FTA
settings between Linux (ON) and z/OS (OFF). Is there any reason for that?gc_tlhPrefetchFTA
is disabled in UMA builds for pLinux and zLinux. They are inconsistent with CMake builds for those platforms.I would like someone to review #18918.
Additional finding:
AIX JIT accesses J9VMThread->tlhPrefetchFTA
, but the field is not updated by the GC side because J9VM_GC_TLH_PREFETCH_FTA
is disabled on AIX.
p codegen accessing tlhPrefetchFTA
: (There are some other locations)
https://github.com/eclipse-openj9/openj9/blob/1690101f1cbba4e8e24d77bfeae420b3991cdf76/runtime/compiler/p/codegen/J9TreeEvaluator.cpp#L5918-L5919
GC code updating tlhPrefetchFTA
: (There are some other locations)
https://github.com/eclipse-openj9/openj9/blob/1690101f1cbba4e8e24d77bfeae420b3991cdf76/runtime/gc_include/ObjectAllocationAPI.hpp#L172-L174
PR #18918 has been merged. J9VM_GC_TLH_PREFETCH_FTA
and OMR_GC_TLH_PREFETCH_FTA
have consistent values on every platform now.
It is up to Power and Z people to investigate the differences between AIX and pLinux, z/OS and zLinux, respectively.
There are flags
OMR_GC_TLH_PREFETCH_FTA
andJ9VM_GC_TLH_PREFETCH_FTA
. Settings of these flags in cmake files are complicated, and look inconsistent for some platforms.runtime/cmake/caches/common.cmake enables
J9VM_GC_TLH_PREFETCH_FTA
. The file does not have an entry forOMR_GC_TLH_PREFETCH_FTA
.https://github.com/eclipse-openj9/openj9/blob/ac66d8cf7c584ff003bfbe8c44d1a79330c3036e/runtime/cmake/caches/common.cmake#L149