eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

Inconsistent cmake settings for TLH_PREFETCH_FTA #18870

Open knn-k opened 9 months ago

knn-k commented 9 months ago

There are flags OMR_GC_TLH_PREFETCH_FTA and J9VM_GC_TLH_PREFETCH_FTA. Settings of these flags in cmake files are complicated, and look inconsistent for some platforms.

runtime/cmake/caches/common.cmake enables J9VM_GC_TLH_PREFETCH_FTA. The file does not have an entry for OMR_GC_TLH_PREFETCH_FTA.

https://github.com/eclipse-openj9/openj9/blob/ac66d8cf7c584ff003bfbe8c44d1a79330c3036e/runtime/cmake/caches/common.cmake#L149

Platform OMR_GC_TLH_PREFETCH_FTA J9VM_GC_TLH_PREFETCH_FTA
linux_x86-64 ON (explicit) ON (common)
aix_ppc-64 (no setting) OFF (explicit)
linux_ppc-64_le OFF (explicit) ON (common)
zos_390-64 OFF (explicit) OFF (explicit)
linux_390-64 (no setting) ON (common)
linux_aarch64 (no setting) ON (common)
knn-k commented 9 months ago

It seems the OMR_GC_TLH_PREFETCH_FTA value in the generated omrcfg.h reflects the J9VM_GC_TLH_PREFETCH_FTA value, as far as I tried on AArch64 Linux turning the J9VM_ flag on and off.

What happens on linux_ppc-64_le, where J9VM_GC_TLH_PREFETCH_FTA is ON and OMR_GC_TLH_PREFETCH_FTA is OFF? Can anybody look at OMR_GC_TLH_PREFETCH_FTA in the omrcfg.h for linux_ppc-64_le, please?

knn-k commented 9 months ago

OMR_GC_TLH_PREFETCH_FTA is defined in the generated omrcfg.h when J9VM_GC_TLH_PREFETCH_FTA is ON and OMR_GC_TLH_PREFETCH_FTA is OFF in cmake files. J9VM_GC_TLH_PREFETCH_FTA setting seems to override OMR_GC_TLH_PREFETCH_FTA.

I think setting the OMR_GC_TLH_PREFETCH_FTA value in cmake files makes it confusing. How about removing OMR_GC_TLH_PREFETCH_FTA lines from cmake files?

knn-k commented 9 months ago

@zl-wang Do you have any thoughts on:

linux_ppc-64_le: https://github.com/eclipse-openj9/openj9/blob/edbf0e1a5bce0b062eba57a98c6fe3e438a49d5c/runtime/cmake/caches/linux_ppc-64_le.cmake#L42

aix_ppc-64: https://github.com/eclipse-openj9/openj9/blob/edbf0e1a5bce0b062eba57a98c6fe3e438a49d5c/runtime/cmake/caches/aix_ppc-64.cmake#L34

knn-k commented 9 months ago

AArch64 platforms (Linux and macOS): J9VM_GC_TLH_PREFETCH_FTA is enabled, but the JIT compiler does not use the tlhPrefetchFTA field in J9VMThread at all.

zl-wang commented 9 months ago

It looks to me that it is off in all cases: (my search results below) FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/aix_ppc-64.cmake

set(J9VM_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "") FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/linux_ppc-64_le.cmake

set(OMR_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")

It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.

dmitripivkine commented 9 months ago

It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.

I think it was an attempt to provide definition in OMR because GC TLH code lives there (in combination with desire to keep OMR independent upstream project). So, either we need to keep OMR* and J9VM* settings synchronized or use compromise and check J9VM_GC_TLH_PREFETCH_FTA in OMR code directly. Quick search shows we do have J9 definitions usage in OMR code already.

knn-k commented 9 months ago

@zl-wang Note that the symbol in aixppc-64.cmake starts with `J9VM, and the one in linux_ppc-64_le.cmake starts withOMR_`. They are different symbols. Please also look at the definitions of those symbols in j9cfg.h and omrcfg.h that are generated during the build.

knn-k commented 9 months ago

The following files in buildspecs set the gc_tlhPrefetchFTA flag:

linux_ppc-64_le.spec and linux_390-64.spec don't set the flag, which are inconsistent with the setting in common.cmake.

knn-k commented 9 months ago

Looking back the histories of .cmake files:

Power

Z

knn-k commented 9 months ago

@joransiu FYI.

knn-k commented 9 months ago

I ran personal builds to see if OMR_GC_TLH_PREFETCH_FTA is defined or not on Power and Z. AIX, z/OS, and Linux have different values as shown below.

(internal job: view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/20537/)

zl-wang commented 9 months ago

we don't need to set the macro(s) consistently across platforms, but of course setting them consistently on each single platform is a good step to take.

In order to realize the performance benefit of TLH-prefetch which has been dropped out of favour gradually in the past while, you need to walk a fine line anyway between micro-architecture and codegen. firstly, if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large. secondly, prefetch instructions with temporal characteristics were the ones having performance benefits from past experiences (since hardware prefetch does pretty much the same thing as explicit vanilla prefetch instructions and without consuming instruction cache capacity).

knn-k commented 9 months ago

@zl-wang I can change PR #18918 not to disable J9VM_GC_TLH_PREFETCH_FTA in linux_ppc-64_le.cmake. Is that what you mean, as the first step?

dmitripivkine commented 9 months ago

if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large

Good point. As I understand we are about to enable TLH batch clearing on X and ARM. So, relation between TLH batch clearing and TLH-prefetch should be considered.

knn-k commented 9 months ago

I updated PR #18918. It does not change J9VM_GC_TLH_PREFETCH_FTA in linux_ppc-64_le.cmake and in linux_390-64.cmake now.

knn-k commented 9 months ago

See J9VM_GC_TLH_PREFETCH_FTA settings after the change. The flag is turned off only on aix and zos.

Platform J9VM_GC_TLH_PREFETCH_FTA
x86 and x86-64 platforms ON (common)
aarch64 platforms ON (common)
aix_ppc-64 OFF
linux_ppc-64_le ON (common)
zos_390-64 OFF
linux_390-64 ON (common)
knn-k commented 9 months ago

As I understand we are about to enable TLH batch clearing on X and ARM.

TLH batch clearing is enabled by default on AArch64. common.cmake enables J9VM_GC_BATCH_CLEAR_TLH on all the platforms. https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/cmake/caches/common.cmake#L121

On x86 platforms, you need to set the env variable TR_EnableBatchClear to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear. https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/compiler/control/DLLMain.cpp#L359-L360

dmitripivkine commented 9 months ago

As I understand we are about to enable TLH batch clearing on X and ARM.

TLH batch clearing is enabled by default on AArch64. common.cmake enables J9VM_GC_BATCH_CLEAR_TLH on all the platforms.

https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/cmake/caches/common.cmake#L121

On x86 platforms, you need to set the env variable TR_EnableBatchClear to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear.

https://github.com/eclipse-openj9/openj9/blob/a3486691d74018b28f9631a97ab08401d56d3025/runtime/compiler/control/DLLMain.cpp#L359-L360

I believe Julian's point is: using TLH Batch clearing can make TLH Prefetching less effective significantly. Another words if we zero every TLH memory up front prefetching might be not necessary.

knn-k commented 8 months ago

My points are as follows:

I would like someone to review #18918.

knn-k commented 8 months ago

Additional finding: AIX JIT accesses J9VMThread->tlhPrefetchFTA, but the field is not updated by the GC side because J9VM_GC_TLH_PREFETCH_FTA is disabled on AIX.

p codegen accessing tlhPrefetchFTA: (There are some other locations) https://github.com/eclipse-openj9/openj9/blob/1690101f1cbba4e8e24d77bfeae420b3991cdf76/runtime/compiler/p/codegen/J9TreeEvaluator.cpp#L5918-L5919

GC code updating tlhPrefetchFTA: (There are some other locations) https://github.com/eclipse-openj9/openj9/blob/1690101f1cbba4e8e24d77bfeae420b3991cdf76/runtime/gc_include/ObjectAllocationAPI.hpp#L172-L174

knn-k commented 8 months ago

PR #18918 has been merged. J9VM_GC_TLH_PREFETCH_FTA and OMR_GC_TLH_PREFETCH_FTA have consistent values on every platform now. It is up to Power and Z people to investigate the differences between AIX and pLinux, z/OS and zLinux, respectively.