Inconsistent cmake settings for TLH_PREFETCH_FTA

knn-k commented 9 months ago

There are flags OMR_GC_TLH_PREFETCH_FTA and J9VM_GC_TLH_PREFETCH_FTA. Settings of these flags in cmake files are complicated, and look inconsistent for some platforms.

runtime/cmake/caches/common.cmake enables J9VM_GC_TLH_PREFETCH_FTA. The file does not have an entry for OMR_GC_TLH_PREFETCH_FTA.

https://github.com/eclipse-openj9/openj9/blob/ac66d8cf7c584ff003bfbe8c44d1a79330c3036e/runtime/cmake/caches/common.cmake#L149

Platform	OMR_GC_TLH_PREFETCH_FTA	J9VM_GC_TLH_PREFETCH_FTA
linux_x86-64	ON (explicit)	ON (common)
aix_ppc-64	(no setting)	OFF (explicit)
linux_ppc-64_le	OFF (explicit)	ON (common)
zos_390-64	OFF (explicit)	OFF (explicit)
linux_390-64	(no setting)	ON (common)
linux_aarch64	(no setting)	ON (common)

knn-k commented 9 months ago

It seems the OMR_GC_TLH_PREFETCH_FTA value in the generated omrcfg.h reflects the J9VM_GC_TLH_PREFETCH_FTA value, as far as I tried on AArch64 Linux turning the J9VM_ flag on and off.

What happens on linux_ppc-64_le, where J9VM_GC_TLH_PREFETCH_FTA is ON and OMR_GC_TLH_PREFETCH_FTA is OFF? Can anybody look at OMR_GC_TLH_PREFETCH_FTA in the omrcfg.h for linux_ppc-64_le, please?

knn-k commented 9 months ago

OMR_GC_TLH_PREFETCH_FTA is defined in the generated omrcfg.h when J9VM_GC_TLH_PREFETCH_FTA is ON and OMR_GC_TLH_PREFETCH_FTA is OFF in cmake files. J9VM_GC_TLH_PREFETCH_FTA setting seems to override OMR_GC_TLH_PREFETCH_FTA.

I think setting the OMR_GC_TLH_PREFETCH_FTA value in cmake files makes it confusing. How about removing OMR_GC_TLH_PREFETCH_FTA lines from cmake files?

knn-k commented 9 months ago

@zl-wang Do you have any thoughts on:

Why linux_ppc-64_le sets OMR_GC_TLH_PREFETCH_FTA to OFF while J9VM_GC_TLH_PREFETCH_FTA is ON
Why aix_ppc-64 sets J9VM_GC_TLH_PREFETCH_FTA to OFF

linux_ppc-64_le: https://github.com/eclipse-openj9/openj9/blob/edbf0e1a5bce0b062eba57a98c6fe3e438a49d5c/runtime/cmake/caches/linux_ppc-64_le.cmake#L42

aix_ppc-64: https://github.com/eclipse-openj9/openj9/blob/edbf0e1a5bce0b062eba57a98c6fe3e438a49d5c/runtime/cmake/caches/aix_ppc-64.cmake#L34

knn-k commented 9 months ago

AArch64 platforms (Linux and macOS): J9VM_GC_TLH_PREFETCH_FTA is enabled, but the JIT compiler does not use the tlhPrefetchFTA field in J9VMThread at all.

zl-wang commented 9 months ago

It looks to me that it is off in all cases: (my search results below) FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/aix_ppc-64.cmake

set(J9VM_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "") FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/linux_ppc-64_le.cmake

set(OMR_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")

It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.

dmitripivkine commented 9 months ago

It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.

I think it was an attempt to provide definition in OMR because GC TLH code lives there (in combination with desire to keep OMR independent upstream project). So, either we need to keep OMR* and J9VM* settings synchronized or use compromise and check J9VM_GC_TLH_PREFETCH_FTA in OMR code directly. Quick search shows we do have J9 definitions usage in OMR code already.

knn-k commented 9 months ago

@zl-wang Note that the symbol in aixppc-64.cmake starts with `J9VM, and the one in linux_ppc-64_le.cmake starts withOMR_`. They are different symbols. Please also look at the definitions of those symbols in j9cfg.h and omrcfg.h that are generated during the build.

knn-k commented 9 months ago

The following files in buildspecs set the gc_tlhPrefetchFTA flag:

linux_aarch64
linux_arm
linux_riscv64
linux_x86-64
linux_x86
osx_x86-64
win_x86-64
win_x86

linux_ppc-64_le.spec and linux_390-64.spec don't set the flag, which are inconsistent with the setting in common.cmake.

knn-k commented 9 months ago

Looking back the histories of .cmake files:

Power

aix_ppc-64.cmake: J9VM_GC_TLH_PREFETCH_FTA was turned off by #9142.
linux_ppc-64_le.cmake: OMR_GC_TLH_PREFETCH_FTA was turned off by the initial cmake file for plinux #2566. J9VM_GC_TLH_PREFETCH_FTA is enabled by common.cmake.

Z

zos_390-64.cmake: The two symbols are turned off by the initial cmake file for zos by #11994
linux_390-64.cmake: No specific settings for the two symbols. J9VM_GC_TLH_PREFETCH_FTA is enabled by common.cmake.

knn-k commented 9 months ago

@joransiu FYI.

knn-k commented 9 months ago

I ran personal builds to see if OMR_GC_TLH_PREFETCH_FTA is defined or not on Power and Z. AIX, z/OS, and Linux have different values as shown below.

ppc64_aix: Undefined
ppc64le_linux: Defined (The cmake file sets it to OFF, but that is ignored)
s390x_linux: Defined
s390x_zos: Undefined

(internal job: view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/20537/)

zl-wang commented 9 months ago

we don't need to set the macro(s) consistently across platforms, but of course setting them consistently on each single platform is a good step to take.

In order to realize the performance benefit of TLH-prefetch which has been dropped out of favour gradually in the past while, you need to walk a fine line anyway between micro-architecture and codegen. firstly, if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large. secondly, prefetch instructions with temporal characteristics were the ones having performance benefits from past experiences (since hardware prefetch does pretty much the same thing as explicit vanilla prefetch instructions and without consuming instruction cache capacity).

knn-k commented 9 months ago

@zl-wang I can change PR #18918 not to disable J9VM_GC_TLH_PREFETCH_FTA in linux_ppc-64_le.cmake. Is that what you mean, as the first step?

dmitripivkine commented 9 months ago

if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large

Good point. As I understand we are about to enable TLH batch clearing on X and ARM. So, relation between TLH batch clearing and TLH-prefetch should be considered.

knn-k commented 9 months ago

I updated PR #18918. It does not change J9VM_GC_TLH_PREFETCH_FTA in linux_ppc-64_le.cmake and in linux_390-64.cmake now.

knn-k commented 9 months ago

See J9VM_GC_TLH_PREFETCH_FTA settings after the change. The flag is turned off only on aix and zos.

Platform	J9VM_GC_TLH_PREFETCH_FTA
x86 and x86-64 platforms	ON (common)
aarch64 platforms	ON (common)
aix_ppc-64	OFF
linux_ppc-64_le	ON (common)
zos_390-64	OFF
linux_390-64	ON (common)

knn-k commented 9 months ago

As I understand we are about to enable TLH batch clearing on X and ARM.

TLH batch clearing is enabled by default on AArch64. common.cmake enables J9VM_GC_BATCH_CLEAR_TLH on all the platforms. https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/cmake/caches/common.cmake#L121

On x86 platforms, you need to set the env variable TR_EnableBatchClear to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear. https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/compiler/control/DLLMain.cpp#L359-L360

dmitripivkine commented 9 months ago

As I understand we are about to enable TLH batch clearing on X and ARM.

TLH batch clearing is enabled by default on AArch64. common.cmake enables J9VM_GC_BATCH_CLEAR_TLH on all the platforms.

https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/cmake/caches/common.cmake#L121

On x86 platforms, you need to set the env variable TR_EnableBatchClear to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear.

https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/compiler/control/DLLMain.cpp#L359-L360

I believe Julian's point is: using TLH Batch clearing can make TLH Prefetching less effective significantly. Another words if we zero every TLH memory up front prefetching might be not necessary.

knn-k commented 8 months ago

My points are as follows:

Setting OMR_GC_TLH_PREFETCH_FTA in .cmake files is useless and confusing. J9VM_GC_TLH_PREFETCH_FTA overriddes it during the build. PR #18918 fixes it.
Power platforms have different J9VM_GC_TLH_PREFETCH_FTA settings between Linux (ON) and AIX (OFF). Is there any reason for that?
Z platforms have different J9VM_GC_TLH_PREFETCH_FTA settings between Linux (ON) and z/OS (OFF). Is there any reason for that?
gc_tlhPrefetchFTA is disabled in UMA builds for pLinux and zLinux. They are inconsistent with CMake builds for those platforms.

I would like someone to review #18918.

knn-k commented 8 months ago

Additional finding: AIX JIT accesses J9VMThread->tlhPrefetchFTA, but the field is not updated by the GC side because J9VM_GC_TLH_PREFETCH_FTA is disabled on AIX.

p codegen accessing tlhPrefetchFTA: (There are some other locations) https://github.com/eclipse-openj9/openj9/blob/1690101f1cbba4e8e24d77bfeae420b3991cdf76/runtime/compiler/p/codegen/J9TreeEvaluator.cpp#L5918-L5919

GC code updating tlhPrefetchFTA: (There are some other locations) https://github.com/eclipse-openj9/openj9/blob/1690101f1cbba4e8e24d77bfeae420b3991cdf76/runtime/gc_include/ObjectAllocationAPI.hpp#L172-L174

knn-k commented 8 months ago

PR #18918 has been merged. J9VM_GC_TLH_PREFETCH_FTA and OMR_GC_TLH_PREFETCH_FTA have consistent values on every platform now. It is up to Power and Z people to investigate the differences between AIX and pLinux, z/OS and zLinux, respectively.

eclipse-openj9 / openj9

Inconsistent cmake settings for TLH_PREFETCH_FTA #18870