eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes

http://www.eclipse.org/omr

Other

940 stars 394 forks source link

Migrate processor and feature detection from OpenJ9 to OMR #4339

Open fjeremic opened 4 years ago

fjeremic commented 4 years ago

Background

The CPU class in the compiler in OMR [1] currently has no way of determining the processor we are running on at runtime. The OMR::CPU class is an extensible class which is meant to be overriden by the target code generator. Inside of these extended codegen classes we have APIs to determine whether we support generating instructions for a particular processor. For example [2] is the override on Z and [3] is the override on Power.

Each code generator seems to use different, although quite similar APIs to determine if we can generate instructions for a particular processor.

On Power we use TR::Compiler->target.cpu.id()
- Example in [4]
On Z we use TR::Compiler->target.cpu.getSupportsArch()
- Example in [5]
On x86 we use cg()->getX86ProcessorInfo()
- Example in [6]

As we can see we're somewhat inconsistent in the APIs we use. More over, if you notice the APIs we use across the different codegens will always default down to the minimum Architecture Level Set (ALS) defined in OMR. This means that irregardless of what processor we are actually running on we'll generate instructions for the least supported processor.

This is problematic in dynamic runtimes.

[1] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/env/OMRCPU.cpp [2] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/z/env/OMRCPU.cpp [3] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/p/env/OMRCPU.cpp [4] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/p/codegen/BinaryEvaluator.cpp#L2087-L2096 [5] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/z/codegen/BinaryEvaluator.cpp#L3125-L3128 [6] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/x/codegen/OMRCodeGenerator.cpp#L385-L388

Proposed Solution

Our friends at the OpenJ9 project have implemented processor detection in the port library already, and the JIT compiler in OpenJ9 is already making use of the data provided by the port library and caching the various answers for use in the JIT.

For example on Power and Z we use the j9sysinfo_processor_has_feature API [7] to query whether we have support for a particular CPU feature, which is initialized in the port library on initialization. Example of this is in the j9sysinfo.c file for both Power and Z [8].

Similarly processors are detected via the J9ProcessorDesc struct which is initialized via the j9sysinfo_get_processor_description API which can be found in the same j9sysinfo.c file. The way in which we initialize these structures depends on the OS and architecture type.

What we effectively need to do is to sink the OpenJ9 pieces of the processor and feature detection down into the OMR layer so that we can access it directly from the OMR compiler. The work here will involve investigating what pieces of the OpenJ9 port library are used and finding the correct place to move the APIs in OMR. Processor and feature detection is a delicate piece of OpenJ9 and is crucial for performance so such a change must be staged, that is to first copy the processor detection into OMR, make use of it in OMR, only once that is working we can work on deprecating the OpenJ9 piece to take advantage of the OMR support.

[7] https://github.com/eclipse/openj9/blob/82609efc1fa05a5d1af51ad6ceed7215376dde84/runtime/compiler/z/env/J9CPU.cpp#L230-L301 [8] https://github.com/eclipse/openj9/blob/82609efc1fa05a5d1af51ad6ceed7215376dde84/runtime/port/unix/j9sysinfo.c

fjeremic commented 4 years ago

@0xdaryl FYI. @AidanHa is this something you could tackle?

AidanHa commented 4 years ago

Sure, I can definitely look into this.

AidanHa commented 4 years ago

After reading through the issue and getting familiar with the code, I think I will divide this issue into 4 PRs. 1: Creation of processor and feature detection in OMR (from OpenJ9). 2: Implementation of the new feature throughout out OMR. 3: Implementation of the new feature throughout OpenJ9. 4: Deletion of the processor detection in OpenJ9 port library.

0xdaryl commented 4 years ago

For #2 and #3, by "implementation of the feature" do you mean "exploitation of the feature" ? (for lack of a better word).

AidanHa commented 4 years ago

@0xdaryl that is correct, I couldn't find a better way to phrase that point.

AidanHa commented 4 years ago

Update for those following: I got a working version of processor detection locally on my x86 machine. Will test other architectures, and submit a PR when all is ready!

AidanHa commented 4 years ago

I was wondering how should we be testing feature detection? In my opinion, Openj9 currently has a fairly "weak" test for feature detection (https://github.com/eclipse/openj9/blob/master/runtime/tests/port/si.c#L965-L971), and I would like to improve upon this if possible...

fjeremic commented 4 years ago

I was wondering how should we be testing feature detection? In my opinion, Openj9 currently has a fairly "weak" test for feature detection (eclipse/openj9:runtime/tests/port/si.c@master#L965-L971

On Linux we can likely query /proc/cpuinfo and parse the string and compare it against what we programmatically detect?

AidanHa commented 4 years ago

On Linux we can likely query /proc/cpuinfo and parse the string and compare it against what we programmatically detect?

This might have some issues where the user doesn't have permission to view the output of /proc/cpuinfo, and all that will show up is "permission denied".

IMO, it's very difficult to actually precisely verify if all the known features (for a specific processor) are set...

fjeremic commented 4 years ago

IMO, it's very difficult to actually precisely verify if all the known features (for a specific processor) are set...

I would agree. We'll just have to be very diligent when manually testing that this was done right. i.e. hop on a bunch of different machines and validate before/after. Perhaps using OpenJ9 may help here since processor detection is known to work there already.

AidanHa commented 4 years ago

I am currently on step 2 of resolving this issue: Implementing the port library function throughout OMR and unifying the different API calls for processor detection. One thing I noticed is that the current processor detection in compiler/env/OMRCPU.hpp (TR::Compiler>target.cpu.setProcessor() and TR::Compiler>target.cpu.id() is currently being used downstream in OpenJ9 (Eg: https://github.com/eclipse/openj9/blob/master/runtime/compiler/env/ProcessorDetection.cpp#L696), which makes it difficult to simply remove the current implementation and replace it with the port library version. To work my way around this issue, I've decided to go with the following:

1: In the CPU class in compiler/env/OMRCPU.hpp, add separate processor detection calls that utilize the new port library functions at https://github.com/eclipse/omr/pull/4503.

2: In OpenJ9, reconfigure all of the instances where the old processor detection functions were use to utilize the new port library functions.

3: Safely delete the old processor detection functions from OMR.

Let me know if this works!

AidanHa commented 4 years ago

For clarity: "old processor detection functions" = TR::Compiler>target.cpu.setProcessor(). "new processor detection functions" = the new port library functions.

fjeremic commented 4 years ago

Let me know if this works!

Sounds like a good plan.

harryyu1994 commented 4 years ago

Went through the above discussions:

Here's my understanding:

In OMR we have

env/OMRCPU
z/env/OMRCPU
p/env/OMRCPU
x/env/OMRCPU

the main issue here is that when you use the OMRCPU APIs, we get default values.
further more the APIs across platforms aren't consistent

In OpenJ9 we have

env/J9CPU
z/env/J9CPU
p/env/J9CPU
x/env/J9CPU

in addition to those we have compiler/env/ProcessorDetection.cpp
- which contains a collection of VM APIs that initializes the processor features depending on the platform during startup
- TR_J9VM::initializeProcessorType() as an example
we want to be able to initialize the processor features in OMR like we do in OpenJ9.
The processor detection APIs in ProcessorDetection.cpp relies on a bunch of openj9 port library functions.
So this means we need to first move the openj9 port library functions from OpenJ9 to OMR, which has been dealt with in #4503
next step to add processor detection in OMR
^ a question I have with that is in OpenJ9 we call initializeProcessorType() during the onLoadInternal() routine. Where should we do that in OMR?

Last Week's Meeting:

to get things going and as a first task, we want to migrate the class TR_X86ProcessorInfo which is in OMRCodeGenerator.hpp to x/OMRCPU.hpp in order to make the API calls for X somewhat consistent with Z and P. (to all use TR::Compiler->target.cpu, right now X's feature queries are in the codegen class instead of the cpu class).
and I would assume this is also a 3 step process:
- add new APIs in x/OMRCPU
- switch OMR to use the new APIs
- switch OpenJ9 to use the new APIs

@fjeremic @dsouzai Let me know if I understood everything correctly, I want to make sure everything is clear before I proceed to implementation. Thanks!

@mpirvu FYI, this is the processor feature detection work for OMR that I'll be working on for a while, I will likely log all my progress in this issue.

fjeremic commented 4 years ago

^ a question I have with that is in OpenJ9 we call initializeProcessorType() during the onLoadInternal() routine. Where should we do that in OMR?

The initialization should happen in the constructor of the OMRCPU class, i.e. calling the port library and caching the result if needed.

harryyu1994 commented 4 years ago

Had a high priority JITServer item last week, back to this work item now.

Some notes about `TR::Compiler->target.cpu`, how and where it's initialized

TR::Compiler is a global pointer of type TR::CompilerEnv

TR::CompilerEnv(from CompilerEnv.hpp) is a child of TR::OMR::CompilerEnv (from OMRCompilerEnv.hpp) through extensible classes

namespace OMR
{
class OMR_EXTENSIBLE CompilerEnv
{
public:
CompilerEnv(TR::RawAllocator raw, const TR::PersistentAllocatorKit &persistentAllocatorKit);
TR::CompilerEnv *self();
...
// Compilation target environment
//
TR::Environment target;

target is a TR::Environment object.

TR::Environment (from Environment.hpp) is a child of TR::OMR::Environment (from OMREnvironment.hpp)

namespace OMR
{
class Environment
{
public:
Environment() :
     _majorOS(TR::os_unknown),
     _bitness(TR::bits_unknown),
     _isSMP(false),
     _numberOfProcessors(1),
     cpu()
  {}
Environment(TR::MajorOperatingSystem o, TR::Bitness b) :
     _majorOS(o),
     _bitness(b),
     _isSMP(false),
     _numberOfProcessors(1),
     cpu()
  {}
TR::CPU cpu;

cpu is a TR::CPU object
TR::CPU is a child of TR::OMR::X86::CPU (from x/env/OMRCPU.hpp if we are on x) which is a child of TR::OMR::CPU (from env/OMRCPU.hpp) through extensible classes

in TR::OMR::CPU there is an API initializeByHostQuery()

so there could be two options for processor feature initialization, one is inside the TR::OMR::X86::CPU constructor and the other one is by overriding initializeByHostQuery() in TR::OMR::X86::CPU

namespace OMR
{
class CPU
{
protected:
CPU() :
 _processor(TR_NullProcessor),
 _endianness(TR::endian_unknown),
 _majorArch(TR::arch_unknown),
 _minorArch(TR::m_arch_none)
{}
public:
TR::CPU *self();
// Initialize CPU info by querying the host processor at compile-time
//
void initializeByHostQuery();

void
OMR::CPU::initializeByHostQuery()
   {

#ifdef OMR_ENV_LITTLE_ENDIAN
   _endianness = TR::endian_little;
#else
   _endianness = TR::endian_big;
#endif

   // Validate host endianness #define with an additional compile-time test
   //
   char SwapTest[2] = { 1, 0 };
   bool isHostLittleEndian = (*(short *)SwapTest == 1);

   if (_endianness == TR::endian_little)
      {
      TR_ASSERT(isHostLittleEndian, "expecting host to be little endian");
      }
   else
      {
      TR_ASSERT(!isHostLittleEndian, "expecting host to be big endian");
      }

#if defined(TR_HOST_X86)
   _majorArch = TR::arch_x86;
   #if defined (TR_HOST_64BIT)
   _minorArch = TR::m_arch_amd64;
   #else
   _minorArch = TR::m_arch_i386;
   #endif
#elif defined (TR_HOST_POWER)
   _majorArch = TR::arch_power;
#elif defined(TR_HOST_ARM)
   _majorArch = TR::arch_arm;
#elif defined(TR_HOST_S390)
   _majorArch = TR::arch_z;
#elif defined(TR_HOST_ARM64)
   _majorArch = TR::arch_arm64;
#elif defined(TR_HOST_RISCV)
   _majorArch = TR::arch_riscv;
   #if defined (TR_HOST_64BIT)
   _minorArch = TR::m_arch_riscv64;
   #else
   _minorArch = TR::m_arch_riscv32;
   #endif
#else
   TR_ASSERT(0, "unknown host architecture");
   _majorArch = TR::arch_unknown;
#endif

   }

initializeByHostQuery() is called in OMR::CompilerEnv::initializeTargetEnvironment()

void
OMR::CompilerEnv::initializeTargetEnvironment()
{

// Target processor bitness
//
#ifdef TR_TARGET_64BIT
target.setBitness(TR::bits_64);
#elif TR_TARGET_32BIT
target.setBitness(TR::bits_32);
#else
target.setBitness(TR::bits_unknown);
#endif

// Initialize the target CPU by querying the host processor
//
target.cpu.initializeByHostQuery();

// Target major operating system
//
#if HOST_OS == OMR_LINUX
target.setMajorOS(TR::os_linux);
#elif HOST_OS == OMR_AIX
target.setMajorOS(TR::os_aix);
#elif HOST_OS == OMR_WINDOWS
target.setMajorOS(TR::os_windows);
#elif HOST_OS == OMR_ZOS
target.setMajorOS(TR::os_zos);
#elif HOST_OS == OMR_OSX
target.setMajorOS(TR::os_osx);
#else
target.setMajorOS(TR::os_unknown);
#endif

}

fjeremic commented 4 years ago

Let's be consistent with other platforms. If an object can be fully initialized at the allocation point we should initialize the object in the Constructor and simplify the callers by not having to call an additional API which they may or may not be aware of.

harryyu1994 commented 4 years ago

Summary of the Processor Detection OMR Port Library Additions

port/common/omrport.c
- add omrsysinfo_get_processor_description() and omrsysinfo_processor_has_feature() into OMRPortLibrary MasterPortLibraryTable
include_core/omrport.h
- enum OMRProcessorArchitecture
- struct OMRProcessorDesc (comments still contain j9, needs to be corrected, j9sysinfo_get_processor_description, j9sysinfo_processor_has_feature)
- PowerPC feature macros
- s390 feature macros
- x86 feature macros
- add omrsysinfo_get_processor_description and omrsysinfo_processor_has_feature function pointers to struct OMRPortLibrary
- shorthanded omrsysinfo_get_processor_description and omrsysinfo_processor_has_feature macros
port/common/omrport.tdf
- Trc_PRT_sysinfo_get_processor_description_Entered, Group = j9sysinfo, this needs to be corrected (get rid of j9)
- Trc_PRT_sysinfo_get_processor_description_Exit, Group = j9sysinfo, this needs to be corrected (get rid of j9)
- Trc_PRT_sysinfo_processor_has_feature_Entered
- Trc_PRT_sysinfo_processor_has_feature_Exit
port/common/omrsysinfo.c and omrsysinfo.c
- omrsysinfo_get_processor_description()
- omrsysinfo_processor_has_feature()
port/common/omrsysinfo_helpers.c and omrsysinfo_helpers.h
- getX86Description()
- getX86CPUID()
port/linuxs390/omrsysinfo_helpers.c and omrsysinfo_helpers.h
- getstfle()
port/unix/omrsysinfo.c
- mapPPCProcessor()
- setFeature()
- getLinuxPPCDescription()
- getAIXPPCDescription()
- testSTEFLE()
- getS390Description()
- omrsysinfo_get_processor_descriptions()
- omrsysinfo_processor_has_feature()
port/unix_include/omrportpg.h
- OMRSTFLEFacilities
- OMRSTFLECache
- PPG_stfleCache
port/win32/omrsysinfo.c
- omrsysinfo_get_processor_description()
- omrsysinfo_processor_has_feature()
port/zos390/omrportpg.h
- OMRSTFLEFacilities
- OMRSTFLECache
- PPG_stfleCache

harryyu1994 commented 4 years ago

How OpenJ9 initializes "TR::Compiler->target.cpu" for x86

top level, we are using the vm/frontend API initializeProcessorType()

else if (TR::Compiler->target.cpu.isX86())
  {
  const char *vendor = TR::Compiler->target.cpu.getProcessorVendorId();
  uint32_t processorSignature = TR::Compiler->target.cpu.getProcessorSignature();

  TR::Compiler->target.cpu.setProcessor(portLibCall_getX86ProcessorType(vendor, processorSignature));

  TR_ASSERT(TR::Compiler->target.cpu.id() >= TR_FirstX86Processor
         && TR::Compiler->target.cpu.id() <= TR_LastX86Processor, "Not a valid X86 Processor Type");
  }

portLibCall_getX86ProcessorType() is not a very special call, just parses the vendor and processorSignature to get the processor type.
the more special call is getProcessorSignature(), which is able to get us feature flags on x86
it seems that, unlike Power and Z, X doesn't use j9sysinfo_get_processor_description() and j9sysinfo_processor_has_feature() to get its features, instead it uses queryX86TargetCPUID().
there are two versions of queryX86TargetCPUID(), one in x/env/OMRCPU.hpp (from omr) and one in x/env/J9CPU.hpp(from openj9). They are very similar in that they both call jitGetCPUID()

openj9's version

TR_X86CPUIDBuffer *
CPU::queryX86TargetCPUID()
{
static TR_X86CPUIDBuffer buf = { {'U','n','k','n','o','w','n','B','r','a','n','d'} };
jitGetCPUID(&buf);
return &buf;
}

omr's version

TR_X86CPUIDBuffer *
OMR::X86::CPU::queryX86TargetCPUID()
{
static TR_X86CPUIDBuffer *buf = NULL;
auto jitConfig = TR::JitConfig::instance();

if (jitConfig && jitConfig->getProcessorInfo() == NULL)
  {
  buf = (TR_X86CPUIDBuffer *) malloc(sizeof(TR_X86CPUIDBuffer));
  if (!buf)
     return 0;
  jitGetCPUID(buf);
  jitConfig->setProcessorInfo(buf);
  }
else
  {
  if (!buf)
     {
     if (jitConfig && jitConfig->getProcessorInfo())
        {
        buf = (TR_X86CPUIDBuffer *)jitConfig->getProcessorInfo();
        }
     else
        {
        buf = (TR_X86CPUIDBuffer *) malloc(sizeof(TR_X86CPUIDBuffer));

        if (!buf)
           memcpy(buf->_vendorId, "Poughkeepsie", 12); // 12 character dummy string (NIL term not used)

        buf->_processorSignature = 0;
        buf->_brandIdEtc = 0;
        buf->_featureFlags = 0x00000000;
        buf->_cacheDescription.l1instr = 0;
        buf->_cacheDescription.l1data  = 0;
        buf->_cacheDescription.l2      = 0;
        buf->_cacheDescription.l3      = 0;
        }
     }
  }

return buf;
}

the omr version seems unnecessarily complicated for such a straightforward task
it looks like it's trying to cache the results in jitConfig, but if we cache the result in the CPU class, then it's probably not necessary.
also this query means that if we don't have a jitConfig then we can't get the results, which doesn't make much sense either
also getProcessorInfo() is not used anywhere else except in this function
can we replace this with the OpenJ9 version? @fjeremic
another question, which is about the extensible classes: OMR::CPU --> OMR::X86::CPU --> J9::CPU --> J9::X86::CPU --> TR::CPU The TR::CompilerEnv pointer will be of TR::CPU type
```
TR::CPU *
OMR::CPU::self()
{
return static_cast<TR::CPU*>(this);
}
```
when we do self()->queryX86TargetCPUID(), which one are we actually invoking? the one closer to TR::CPU?

harryyu1994 commented 4 years ago

when we do self()->queryX86TargetCPUID(), which one are we actually invoking? the one closer to TR::CPU?

This question I can answer it myself. self() returns a pointer to the most derived/specialized class, which means we will always invoke the functions "closer" to the most specialized class. Dynamic polymorphism: we have a base pointer, then rely on the virtual keyword to find the most specialized function. Static polymorphism (what's used here): we will always get the most specialized pointer through self()

fjeremic commented 4 years ago

can we replace this with the OpenJ9 version? @fjeremic

Yes we can. Ideally as you mentioned earlier where we want to arrive at the end is for this processor detection to be migrated into the port library (sysinfo_get_processor_description) and x86 would query that, similarly to other platforms.

harryyu1994 commented 4 years ago

`jitGetCPUID()` vs `omrsysinfo_get_x86_description()`

jitGetCPUID() returns TR_X86CPUIDBuffer

struct TR_X86CPUIDBuffer
{
char     _vendorId[12];          // not NULL-terminated
uint32_t _processorSignature;
uint32_t _brandIdEtc;
uint32_t _featureFlags;
uint32_t _featureFlags2;
struct
  {
  uint32_t l1instr;
  uint32_t l1data;
  uint32_t l2;
  uint32_t l3;
  } _cacheDescription;
uint32_t _featureFlags8;
};

omrsysinfo_get_x86_description() returns OMRProcessorDesc

#define OMRPORT_SYSINFO_FEATURES_SIZE 5
typedef struct OMRProcessorDesc {
OMRProcessorArchitecture processor;
OMRProcessorArchitecture physicalProcessor;
uint32_t features[OMRPORT_SYSINFO_FEATURES_SIZE];
} OMRProcessorDesc;

omrsysinfo_get_x86_description() only initializes 2 out of 3 x86 feature flags

/* features */
desc->features[0] = CPUInfo[3]; // equivalent to jitGetCPUID's  pBuffer->_featureFlags       = CPUInfo[EDX];
desc->features[1] = CPUInfo[2]; // equivalent to jitGetCPUID's pBuffer->_featureFlags2      = CPUInfo[ECX];
desc->features[2] = 0; /* reserved for future expansion */

jitGetCPUID() initializes all 3 of them

if (CPUInfo[EAX] > 0)
  {
  // EAX = 1
  cpuid(CPUInfo, 1);
  pBuffer->_processorSignature = CPUInfo[EAX];
  pBuffer->_brandIdEtc         = CPUInfo[EBX];
  pBuffer->_featureFlags       = CPUInfo[EDX];
  pBuffer->_featureFlags2      = CPUInfo[ECX];
  // EAX = 7, ECX = 0
  cpuidex(CPUInfo, 7, 0);
  pBuffer->_featureFlags8 = CPUInfo[EBX];

  // Check for XSAVE
  if(pBuffer->_featureFlags2 & TR_OSXSAVE)
     {
     if(((6 & _xgetbv(0)) != 6) || feGetEnv("TR_DisableAVX")) // '6' = mask for XCR0[2:1]='11b' (XMM state and YMM state are enabled)
        {
        // Unset OSXSAVE if not enabled via CR0
        pBuffer->_featureFlags2 &= ~TR_OSXSAVE;
        }
     }

we may need to add that code in omrsysinfo_get_x86_description(), it uses cpuidex to get _featureFlags8
jitGetCPUID()'s _cacheDescription and _brandIdEtc are not used anywhere in OpenJ9, so I'm not sure if we need to port them over to omrsysinfo_get_x86_description().

harryyu1994 commented 4 years ago

OMR's `cpuid` vs. OpenJ9's `cpuid` (x86)

Current OMR portLibrary's implementation of cpuid


/**
* Assembly code to get the register data from CPUID instruction
* This function executes the CPUID instruction based on which we can detect
* if the environment is virtualized or not, and also get the Hypervisor Vendor
* Name based on the same instruction. The leaf value specifies what information
* to return.
*
* @param[in]   leaf The leaf value to the CPUID instruction.
* @param[out]  cpuInfo     Reference to the an integer array which holds the data
*                          of EAX,EBX,ECX and EDX registers.
*              cpuInfo[0] To hold the EAX register data, value in this register at
*                          the time of CPUID tells what information to return
*                          EAX=0x1,returns the processor Info and feature bits
*                          in EBX,ECX,EDX registers.
*                          EAX=0x40000000 returns the Hypervisor Vendor Names
*                          in the EBX,ECX,EDX registers.
*              cpuInfo[1]  For EAX = 0x40000000 hold first 4 characters of the
*                          Hypervisor Vendor String
*              cpuInfo[2]  For EAX = 0x1, the 31st bit of ECX tells if its
*                          running on Hypervisor or not,For EAX = 0x40000000 holds the second
*                          4 characters of the the Hypervisor Vendor String
*              cpuInfo[3]  For EAX = 0x40000000 hold the last 4 characters of the
*                          Hypervisor Vendor String
*
*/
void
getX86CPUID(uint32_t leaf, uint32_t *cpuInfo)
{
cpuInfo[0] = leaf;

/ Implemented for x86 & x86_64 bit platforms /

if defined(WIN32)

/* Specific CPUID instruction available in Windows */
__cpuid(cpuInfo, cpuInfo[0]);

elif defined(LINUX) || defined(OSX)

if defined(J9X86)

__asm volatile
("mov %%ebx, %%edi;"
        "cpuid;"
        "mov %%ebx, %%esi;"
        "mov %%edi, %%ebx;"
        :"+a" (cpuInfo[0]), "=S" (cpuInfo[1]), "=c" (cpuInfo[2]), "=d" (cpuInfo[3])
         : :"edi");

elif defined(J9HAMMER)

__asm volatile( "cpuid;" :"+a" (cpuInfo[0]), "=b" (cpuInfo[1]), "=c" (cpuInfo[2]), "=d" (cpuInfo[3]) );

endif

}

- Though I think what's actually being used at the moment on x86 is this:
```cpp
#if defined(OMR_OS_WINDOWS)
#include <intrin.h>
#define cpuid(CPUInfo, EAXValue)             __cpuid(CPUInfo, EAXValue)
#define cpuidex(CPUInfo, EAXValue, ECXValue) __cpuidex(CPUInfo, EAXValue, ECXValue)
#else
#include <cpuid.h>
#include <emmintrin.h>
#define cpuid(CPUInfo, EAXValue)             __cpuid(EAXValue, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3])
#define cpuidex(CPUInfo, EAXValue, ECXValue) __cpuid_count(EAXValue, ECXValue, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3])

Which one do you think is preferred? Wouldn't the second one look better? @fjeremic
related changes in the past:
- https://github.com/eclipse/omr/pull/850

harryyu1994 commented 4 years ago

TR_Processor vs. TR_ProcessorDescription (x86)

I'm looking to eliminate TR_ProcessorDescription, which is used by OpenJ9 x86 codegen and serves the same purpose as TR_Processor. However, we do need to check for differences.

TR_ProcessorDescription appears to have a few more types compared to TR_Processor

enum TR_Processor
{

TR_NullProcessor,
TR_FirstProcessor,
// ... I have taken out the other platforms

// X86 Processors
TR_FirstX86Processor,
TR_DefaultX86Processor = TR_FirstX86Processor,
TR_X86ProcessorIntelPentium,
TR_X86ProcessorIntelP6,
TR_X86ProcessorIntelPentium4,
TR_X86ProcessorIntelCore2,
TR_X86ProcessorIntelTulsa,
TR_X86ProcessorAMDK5,
TR_X86ProcessorAMDK6,
TR_X86ProcessorAMDAthlonDuron,
TR_LastX86Processor,
TR_X86ProcessorAMDOpteron = TR_LastX86Processor,

TR_LastProcessor = TR_LastX86Processor,

TR_NumberOfProcessors
};

enum TR_ProcessorDescription
{
TR_ProcessorUnknown          = 0x00000000,
TR_ProcessorIntelPentium     = 0x00000001,
TR_ProcessorIntelP6          = 0x00000002,
TR_ProcessorIntelPentium4    = 0x00000003,

TR_ProcessorAMDK5            = 0x00000004,
TR_ProcessorAMDK6            = 0x00000005,
TR_ProcessorAMDAthlonDuron   = 0x00000006,
TR_ProcessorAMDOpteron       = 0x00000007,

TR_ProcessorIntelCore2       = 0x00000008,
TR_ProcessorIntelTulsa       = 0x00000009,
TR_ProcessorIntelNehalem     = 0x0000000a,

TR_ProcessorAMDFamily15h     = 0x0000000b,
TR_ProcessorIntelWestmere    = 0x0000000c,
TR_ProcessorIntelSandyBridge = 0x0000000d,
TR_ProcessorIntelIvyBridge   = 0x0000000e,
TR_ProcessorIntelHaswell     = 0x0000000f,
TR_ProcessorIntelBroadwell   = 0x00000010,
TR_ProcessorIntelSkylake     = 0x00000011,
};

we may need to port over the missing types from TR_ProcessorDescription to TR_Processor

TR_Processor
mapOMRProcessor(OMRProcessorArchitecture processor)
{
 case PROCESSOR_X86_UNKNOWN:
     tp = TR_DefaultX86Processor;
     break;
  case PROCESSOR_X86_INTELPENTIUM:
     tp = TR_X86ProcessorIntelPentium;
     break;
  case PROCESSOR_X86_INTELP6:
     tp = TR_X86ProcessorIntelP6;
     break;
  case PROCESSOR_X86_INTELPENTIUM4:
     tp = TR_X86ProcessorIntelPentium4;
     break;
  case PROCESSOR_X86_INTELCORE2:
     tp = TR_X86ProcessorIntelCore2;
     break;
  case PROCESSOR_X86_INTELTULSA:
     tp = TR_X86ProcessorIntelTulsa;
     break;
  case PROCESSOR_X86_AMDK5:
     tp = TR_X86ProcessorAMDK5;
     break;
  case PROCESSOR_X86_AMDK6:
     tp = TR_X86ProcessorAMDK6;
     break;
  case PROCESSOR_X86_AMDATHLONDURON:
     tp = TR_X86ProcessorAMDAthlonDuron;
     break;
  case PROCESSOR_X86_AMDOPTERON:
     tp = TR_X86ProcessorAMDOpteron;
     break;
  case PROCESSOR_X86_INTELNEHALEM:             //Not yet mapped in TR_Processor.
  case PROCESSOR_X86_INTELWESTMERE:
  case PROCESSOR_X86_INTELSANDYBRIDGE:
  case PROCESSOR_X86_INTELHASWELL:
     tp = TR_X86ProcessorAMDOpteron;
     break;
  default:
    return tp;
  }

OMRCodeGenerator with `TR_ProcessorDescription`

uint32_t _processorSignature = cg->comp()->target().cpu.getX86ProcessorSignature();

   if (isGenuineIntel())
      {
      switch (getCPUFamily(_processorSignature))
         {
         case 0x05: _processorDescription |= TR_ProcessorIntelPentium; break;
         case 0x06:
            {
            uint32_t extended_model = getCPUModel(_processorSignature) + (getCPUExtendedModel(_processorSignature) << 4);
            switch (extended_model)
               {
               case 0x55:
                  _processorDescription |= TR_ProcessorIntelSkylake; break;
               case 0x4f:
                  _processorDescription |= TR_ProcessorIntelBroadwell; break;
               case 0x3f:
               case 0x3c:
                  _processorDescription |= TR_ProcessorIntelHaswell; break;
               case 0x3e:
               case 0x3a:
                  _processorDescription |= TR_ProcessorIntelIvyBridge; break;
               case 0x2a:
               case 0x2d:  // SandyBridge EP
                  _processorDescription |= TR_ProcessorIntelSandyBridge; break;
               case 0x2c:  // WestmereEP
               case 0x2f:  // WestmereEX
                  _processorDescription |= TR_ProcessorIntelWestmere; break;
               case 0x1a:  // Nehalem
                  _processorDescription |= TR_ProcessorIntelNehalem; break;
               case 0x17:  // Harpertown
               case 0x0f:  // Woodcrest/Clovertown
                  _processorDescription |= TR_ProcessorIntelCore2; break;
               default:  _processorDescription |= TR_ProcessorIntelP6; break;
               }
            break;
            }
         case 0x0f: _processorDescription |= TR_ProcessorIntelPentium4; break;
         default:   _processorDescription |= TR_ProcessorUnknown; break;
         }
      }
   else if (isAuthenticAMD())
      {
      switch (getCPUFamily(_processorSignature))
         {
         case 0x05:
            if (getCPUModel(_processorSignature) < 0x04)
               _processorDescription |= TR_ProcessorAMDK5;
            else
               _processorDescription |= TR_ProcessorAMDK6;
            break;
         case 0x06: _processorDescription |= TR_ProcessorAMDAthlonDuron; break;
         case 0x0f:
            if (getCPUExtendedFamily(_processorSignature) < 6)
               _processorDescription |= TR_ProcessorAMDOpteron;
            else
               _processorDescription |= TR_ProcessorAMDFamily15h;
            break;
         default:   _processorDescription |= TR_ProcessorUnknown; break;
         }
      }

omrsysinfo_get_processor_description

intptr_t
omrsysinfo_get_x86_description(struct OMRPortLibrary *portLibrary, OMRProcessorDesc *desc)
{
    uint32_t CPUInfo[4] = {0};
    char vendor[12];
    uint32_t familyCode = 0;
    uint32_t processorSignature = 0;

    desc->processor = OMR_PROCESSOR_X86_UNKNOWN;

    /* vendor */
    omrsysinfo_get_x86_cpuid(CPUID_VENDOR_INFO, CPUInfo);
    memcpy(vendor + 0, &CPUInfo[1], sizeof(uint32_t));
    memcpy(vendor + 4, &CPUInfo[3], sizeof(uint32_t));
    memcpy(vendor + 8, &CPUInfo[2], sizeof(uint32_t));

    /* family and model */
    omrsysinfo_get_x86_cpuid(CPUID_FAMILY_INFO, CPUInfo);
    processorSignature = CPUInfo[0];
    familyCode = (processorSignature & CPUID_SIGNATURE_FAMILY) >> CPUID_SIGNATURE_FAMILY_SHIFT;
    if (0 == strncmp(vendor, CPUID_VENDOR_INTEL, CPUID_VENDOR_LENGTH)) {
        switch (familyCode) {
        case CPUID_FAMILYCODE_INTELPENTIUM:
            desc->processor = OMR_PROCESSOR_X86_INTELPENTIUM;
            break;
        case CPUID_FAMILYCODE_INTELCORE:
        {
            uint32_t modelCode  = (processorSignature & CPUID_SIGNATURE_MODEL) >> CPUID_SIGNATURE_MODEL_SHIFT;
            uint32_t extendedModelCode = (processorSignature & CPUID_SIGNATURE_EXTENDEDMODEL) >> CPUID_SIGNATURE_EXTENDEDMODEL_SHIFT;
            uint32_t totalModelCode = modelCode + extendedModelCode;

            if (totalModelCode > CPUID_MODELCODE_INTELHASWELL) {
                desc->processor = OMR_PROCESSOR_X86_INTELHASWELL;
            } else if (totalModelCode >= CPUID_MODELCODE_SANDYBRIDGE) {
                desc->processor = OMR_PROCESSOR_X86_INTELSANDYBRIDGE;
            } else if (totalModelCode >= CPUID_MODELCODE_INTELWESTMERE) {
                desc->processor = OMR_PROCESSOR_X86_INTELWESTMERE;
            } else if (totalModelCode >= CPUID_MODELCODE_INTELNEHALEM) {
                desc->processor = OMR_PROCESSOR_X86_INTELNEHALEM;
            } else if (totalModelCode == CPUID_MODELCODE_INTELCORE2) {
                desc->processor = OMR_PROCESSOR_X86_INTELCORE2;
            } else {
                desc->processor = OMR_PROCESSOR_X86_INTELP6;
            }
            break;
        }
        case CPUID_FAMILYCODE_INTELPENTIUM4:
            desc->processor = OMR_PROCESSOR_X86_INTELPENTIUM4;
            break;
        }
    } else if (0 == strncmp(vendor, CPUID_VENDOR_AMD, CPUID_VENDOR_LENGTH)) {
        switch (familyCode) {
        case CPUID_FAMILYCODE_AMDKSERIES:
        {
            uint32_t modelCode  = (processorSignature & CPUID_SIGNATURE_FAMILY) >> CPUID_SIGNATURE_MODEL_SHIFT;
            if (modelCode < CPUID_MODELCODE_AMDK5) {
                desc->processor = OMR_PROCESSOR_X86_AMDK5;
            }
            desc->processor = OMR_PROCESSOR_X86_AMDK6;
            break;
        }
        case CPUID_FAMILYCODE_AMDATHLON:
            desc->processor = OMR_PROCESSOR_X86_AMDATHLONDURON;
            break;
        case CPUID_FAMILYCODE_AMDOPTERON:
            desc->processor = OMR_PROCESSOR_X86_AMDOPTERON;
            break;
        }
    }

    desc->physicalProcessor = desc->processor;

    /* features */
    desc->features[0] = CPUInfo[3];
    desc->features[1] = CPUInfo[2];
    desc->features[2] = 0; /* reserved for future expansion */

    return 0;
}

harryyu1994 commented 4 years ago

The above 3 posts suggest that omrsysinfo_get_x86_description() is lacking:

featureFlag8 (cpuidex)
a few (I assume newer) processor types

I'm going to open a separate PR to handle this task.

harryyu1994 commented 4 years ago

@fjeremic Do you think it's better for x86 to go with the Power/Z way for feature query and use j9sysinfo_processor_has_feature() (with J9ProcessorDesc cached) or keep the original way? so in the latter case we would get the feature flags from j9sysinfo_get_processor_description and cache them in flags32_t and query using testAny(). The original way looks nicer, but we would have redundant things such as enum TR_X86ProcessorFeatures, which are already defined by the port library. The Power/Z way looks like old c code but it gets the job done.

enum TR_X86ProcessorFeatures
   {
   TR_BuiltInFPU                    = 0x00000001,
   TR_VirtualModeExtension          = 0x00000002,
   TR_DebuggingExtension            = 0x00000004,
   TR_PageSizeExtension             = 0x00000008,
   TR_RDTSCInstruction              = 0x00000010,
   TR_ModelSpecificRegisters        = 0x00000020,
   TR_PhysicalAddressExtension      = 0x00000040,
   TR_MachineCheckException         = 0x00000080,
   TR_CMPXCHG8BInstruction          = 0x00000100,
   TR_APICHardware                  = 0x00000200,
   // Reserved by Intel             = 0x00000400,
   TR_FastSystemCalls               = 0x00000800,
   TR_MemoryTypeRangeRegisters      = 0x00001000,
   TR_PageGlobalFlag                = 0x00002000,
   TR_MachineCheckArchitecture      = 0x00004000,
   TR_CMOVInstructions              = 0x00008000,
   TR_PageAttributeTable            = 0x00010000,
   TR_36BitPageSizeExtension        = 0x00020000,
   TR_ProcessorSerialNumber         = 0x00040000, // GenuineIntel only
   TR_CLFLUSHInstruction            = 0x00080000, // GenuineIntel only
   // Reserved by Intel             = 0x00100000,
   TR_DebugTraceStore               = 0x00200000, // GenuineIntel only
   TR_ACPIRegisters                 = 0x00400000, // GenuineIntel only
   TR_MMXInstructions               = 0x00800000,
   TR_FastFPSavesRestores           = 0x01000000,
   TR_SSE                           = 0x02000000,
   TR_SSE2                          = 0x04000000, // GenuineIntel Only
   TR_SelfSnoop                     = 0x08000000, // GenuineIntel Only
   TR_HyperThreading                = 0x10000000, // GenuineIntel Only
   TR_ThermalMonitor                = 0x20000000, // GenuineIntel Only
   // Reserved by Intel             = 0x40000000, // IA64-specific
   // Reserved by Intel             = 0x80000000
   TR_X86ProcessorInfoInitialized   = 0x80000000  // FIXME: Using a reserved bit for our purposes.
   };

How power handles feature query

bool
CPU::getPPCSupportsVMX()
{
#if defined(J9OS_I5) && defined(J9OS_I5_V5R4)
return FALSE;
#else
J9ProcessorDesc   *processorDesc       = TR::Compiler->target.cpu.TO_PORTLIB_getJ9ProcessorDesc();
J9PortLibrary     *privatePortLibrary  = TR::Compiler->portLib;
BOOLEAN feature = j9sysinfo_processor_has_feature(processorDesc, J9PORT_PPC_FEATURE_HAS_ALTIVEC);
return (TRUE == feature);
#endif
}

How x86 handles feature query

bool hasBuiltInFPU()                    {return testFeatureFlags(TR_BuiltInFPU);}
bool testFlag(flags32_t &flag, uint32_t feature, uint32_t mask)
  {
  TR_ASSERT_FATAL(feature & mask, "The %x feature needs to be added to the "
                                  "getFeatureFlagsMask (or variant) function "
                                  "for correctness in relocatable compiles!\n",
                  feature);

  return flag.testAny(feature);
  }
/**
* @brief testFeatureFlags Wrapper around testFlag
*
* @param feature          The feature being tested for
*
* @return                 The result of testFlag
*/
bool testFeatureFlags(uint32_t feature)
  {
  return testFlag(_featureFlags, feature, getFeatureFlagsMask());
  }

fjeremic commented 4 years ago

@harryyu1994 ideally the feature flags (which are enum values on x86) should live in the port library. There is no need for us to be duplicating them in the compiler. As such it would be preferable to do it the Power way, i.e. use the J9PORT_*_FEATURE_* flag which is defined in the port library to query the feature in the compiler. This removes any indirection and all duplication and there is a single point of definition for a particular processor feature, which lives in the port library which is accessible by all OMR components.

harryyu1994 commented 4 years ago

Looked into Z/CPU: Unlike X, to determine whether a feature is enabled, it needs two pieces of information (arch and feature), as an example:

// z9 supports DFP in millicode so do not check for DFP support unless we are z10+
   if (TR::Compiler->target.cpu.getSupportsArch(TR::CPU::z10) &&
         j9sysinfo_processor_has_feature(processorDesc, J9PORT_S390_FEATURE_DFP))
      {
      TR::Compiler->target.cpu.setSupportsDecimalFloatingPointFacility(true);
      }

This creates some compilations if we want to use just the singular cpu.supportsFeature(YOUR_FEATURE) API for all the Z feature queries.
I see 3 options
- 1. keep the hardcode getters (i.e. getSupportsDecimalFloatingPointFacility()), IMO the worst option out of the 3
- 1. check arch level in port library and set the feature bits accordingly there
- 1. modify the OMRProcessorDesc feature fields right after it has been initialized for Z (disable things according to the arch level)

Served as a reference, the following is the wrapper to port library call omrsysinfo_get_processor_description()

OMRProcessorDesc*
CPU::TO_PORTLIB_getOMRProcessorDesc()
{
static int processorTypeInitialized = 0;
static OMRProcessorDesc processorDesc;

if(processorTypeInitialized == 0)
  {
  PORT_ACCESS_FROM_PORT(TR::Compiler->portLib);

  //Just make sure we initialize the OMRProcessorDesc at least once.
  omrsysinfo_get_processor_description(&processorDesc);

  processorTypeInitialized = 1;
  }
return &processorDesc;
}

@fjeremic Again, need your input on this

fjeremic commented 4 years ago

@fjeremic Again, need your input on this

Option c. seems to be the most reasonable thing to do. This way the compiler doesn't muck with the port library true source of information.

harryyu1994 commented 4 years ago

The End Goal

In my opinion, these two generic APIs will probably be enough to handle most of the processor detection scenarios. The supportsProcessor() is needed for Power and Z as a lot of their queries need both processorType and feature.
```
bool supportsFeature(uint32_t feature);
bool supportsProcessor(OMRProcessorArchitecture processor);
```

Clean Up on Z

I'm creating a separate PR to clean up Z now

I noticed that Z::CPU has its own enum for processor type.

enum Architecture
  {
  Unknown = 0,
  z900,
  z990,
  z9,
  z10,
  z196,
  zEC12,
  z13,
  z14,
  z15,
  zNext,
  };

this information is quite important and is used throughout the OpenJ9 codebase.
in order to achieve the generic API supportsProcessor() on Z, I would like to get rid of its private enum Architecture and use Port Library's processor architecture instead

Here's the mapping (Port Library Arch vs. Z::CPU::Architecture) that I've determined:

typedef enum OMRProcessorArchitecture {

OMR_PROCESSOR_UNDEFINED,

OMR_PROCESSOR_S390_UNKNOWN,
OMR_PROCESSOR_S390_GP6 --> TO_PORTLIB_get390_supportsZ6() --> TR::CPU::z10
OMR_PROCESSOR_S390_GP7,
OMR_PROCESSOR_S390_GP8,
OMR_PROCESSOR_S390_GP9 --> TO_PORTLIB_get390_supportsZGryphon() --> TR::CPU::z196
OMR_PROCESSOR_S390_GP10 --> TO_PORTLIB_get390_supportsZHelix() --> TR::CPU::zEC12
OMR_PROCESSOR_S390_GP11 --> TO_PORTLIB_get390_supportsZ13() --> TR::CPU::z13
OMR_PROCESSOR_S390_GP12 --> TO_PORTLIB_get390_supportsZ14() --> TR::CPU::z14
OMR_PROCESSOR_S390_GP13 --> TO_PORTLIB_get390_supportsZ15() --> TR::CPU::z15
OMR_PROCESSOR_S390_GP14 --> TO_PORTLIB_get390_supportsZNext() --> TR::CPU::zNext

They look pretty 1-to-1 to me.
The Z::CPU::Architecture names are used much more frequently than the port library arch names.
We have two options here:
- 1. Replace the port library name with the Z::CPU::Architecture names
- 1. Keep both set of names in the port library.

example of option b

OMR_PROCESSOR_S390_GP9,
OMR_PROCESSOR_S390_Z196 = OMR_PROCESSOR_S390_GP9,

@fjeremic What do you think?

harryyu1994 commented 4 years ago

I'm going to keep both set of names for now.

fjeremic commented 4 years ago

I'm going to keep both set of names for now.

Keep both sets. We should aim to phase out the OMR_PROCESSOR_S390_GP* enums eventually. They are not very descriptive.

harryyu1994 commented 4 years ago

I realized there could still be a bit of work after the port library initialization decision, I'm going to not wait for that decision and try to finish off this whole issue by initializing my own omr port library for now. Just realized it doesn't look that hard to get the omr port library when we have OpenJ9 and its port library in hand.

harryyu1994 commented 4 years ago

New API against Old API tests in OpenJ9: https://github.com/eclipse/openj9/pull/9295

harryyu1994 commented 3 years ago

Latest Changes:

Summary of changes:

Retire old version cpu detection code on Power
Move CPU::getProcessorDescription() to base J9CPU class as the 3 platforms (X, Z, Power) share the same code. There is no need to have 3 copies
Reasons for the removal:
- We kept the old version because we wanted to make sure new version and old version give us the same behaviour
- Now it's been a few month and we don't see any bugs with the new version there's no reason to keep the old version
- Also we want to prevent developers from using the old version.

Old design

Initializes the TR_Processor struct via j9port library
- Initialization occurs in TR_J9VM::initializeProcessorType()
- obtain cpu information using CPU::TO_PORTLIB_getJ9ProcessorDesc()
- TR::Compiler->target.cpu.setProcessor to set processor
Fields and methods associated with the old design live in Class CPU
Common APIs:
- CPU::TO_PORTLIB_getJ9ProcessorDesc()
- j9sysinfo_processor_has_feature(processorDesc, feature) (j9port library API)
- CPU::id()
- CPU::setProcessor(TR_Processor)

New design

Initializes the OMRProcessorDesc struct via omrport library
- CPU::detect() to initialize OMRProcessorDesc processorDescription
- CPU::applyUserOptions() to apply any debug options after initialization
- TR::Compiler->target.cpu singleton variable that holds the CPU instance
- comp()->target().cpu per compilation variable which is used for cross-cpu compilation support
- For JIT we initialize comp()->target().cpu with TR::Compiler->target.cpu (Host CPU)
- For AOT we initialize comp()->target().cpu with TR::Compiler->relocatableTarget.cpu (Portable CPU)
Fields and methods associated with the new design live in Class CPU
Common APIs:
- CPU::supportsFeature(feature)
- CPU::isAtLeast(OMRProcessorArchitecture)
Benefits:
- Common to all 3 platforms
- Simpler interface
- Cross-CPU compilation support (Use one cpu for AOT and another for JIT)

Introduce Feature Masks

Guard cpu feature tests against old version cpu feature detection on x86 with an environment variable.
Remove redundant assertion tests on x86.
Eliminate applyUserOptions().
- Disassociate the options from TR::CPU.
- Issue with applyUserOptions() is that the options are not initialized when TR::CPU is initialized which means if the user calls applyUserOptions() at the incorrect place the options queries that it contains will either crash the program or not take effect. I want to eliminate this risk of incorrect usage.
Introduce new API called TR::CPU::customize(OMRProcessorDesc processorDescription)
- it's a static factory method that takes in an OMRProcessorDesc from user and output a TR::CPU object
- The design is that we do not give user to ability to modify the fields of TR::CPU object in place. You will need to go through a factory method to change TR::CPU. This makes it easier for us to keep track where TR::CPU is modified and eliminate any unintentional illegal TR::CPU modifications from user which can be disastrous.
Introduce new API called TR::CPU::enableFeatureMasks()
- You can turn on "featureMasks" support using this API
- Once "featureMasks" support is turned on, we will mask out all cpu features that are not used by the current compiler from the OMRProcessorDesc struct in TR::CPU which allows to do CPU compatibility checks for AOT more correctly
- For each platform, we keep track of the utilized cpu features inside TR::CPU::enableFeatureMasks()
- If developer utilizes a previously unused feature, they need to add it to TR::CPU::enableFeatureMasks()
- We have assert tests in place to detect queries to unused feature
- The purpose of this detection is to minimize the risk of masking out cpu features that are being used.
Summary of current design:
- TR::CPU::detect(omrPortLib): static factory method that gives you the TR::CPU object based on port library
- TR::CPU::customize(processorDescription): static factory method that gives you the TR::CPU object based on a processorDescription provided by user

0xdaryl commented 3 years ago

@fjeremic @dsouzai : There are a lot of detailed comments in this issue tracking the progress of this work (thanks @harryyu1994!). However, can one of you please provide a summary of what work remains to be done from its current state?

@nbhuiyan will return to the port library initialization problem soon (he's working on a PR) which will make the port lib accessible within OMR and should deprecate some of the duplicate code within OMR itself.

fjeremic commented 3 years ago

AFAIK the detection has been sunk into OMR and there are two main pieces still remaining to finish:

Hookup the OMR port library to the detection

For example we have the following code: https://github.com/eclipse/omr/blob/3d85fbdf9c73e46c6da5df8989c119164711b368/compiler/env/OMRCompilerEnv.cpp#L77-L79

Which detects the target CPU using the OMR port library. Unfortunately because the port library has not been hooked up to the compiler in OMR this value is NULL, so when we query for processor detection, for example here:https://github.com/eclipse/omr/blob/3d85fbdf9c73e46c6da5df8989c119164711b368/compiler/x/env/OMRCPU.cpp#L244-L256

We default down to using the old API.
Deprecate the old processor detection code

When the port library is hooked up to the compiler the above queries should go away since the port library should never be NULL and the old processor detection APIs become dead at that point. We need to clean them up.

fjeremic commented 3 years ago

Just stumbled on eclipse/openj9#12670 where it seems we have duplicated code for processor detection in OpenJ9. We need to remove this as part of the migration effort here to avoid developers adding more code into the OpenJ9 versions.

@nbhuiyan wondering if you have a timeline to resume this work?

nbhuiyan commented 3 years ago

@fjeremic I plan on resuming this work later this week or early next week.

r30shah commented 2 years ago

While fixing a bug to make sure we do vectorize some of the vector API intrinsics for Float type (On Z, we need to have Vector Facility Enhancement-1 installed to generate vector instruction for float type), we found out that we do not set the _flags which were supposed to be checked when portLib is null in compiler [1][2]. While working on the change, I found out that we do not set the _flags on first place, so thought of cleaning up them but bumped into this issue.

I guess continuing discussion from Filip's comment, if we have hooked up the port library to compiler, we can clean-up the old checks so that we would not use them accidentally. Trying to find out in the code-base, if we ever had change for hooking up portlib to compiler but can not find. @0xdaryl @nbhuiyan did we finished working on remaining pieces of work for this issue and can the code-gen go ahead and clean-up old feature detection code?

[1]. https://github.com/eclipse/omr/blob/edf2ae5033cb412348dbeb74d31bab2d26fe933d/compiler/z/env/OMRCPU.cpp#L101-L102 [2]. https://github.com/eclipse/omr/blob/edf2ae5033cb412348dbeb74d31bab2d26fe933d/compiler/z/env/OMRCPU.cpp#L154

nbhuiyan commented 2 years ago

@r30shah I had the work to support the port library usage in the compiler completed a few months ago, but I was not able to get around to opening a PR for it yet. I am working on that and will update this issue soon.

r30shah commented 2 years ago

Thanks a lot @nbhuiyan , We will wait for your PR before cleaning up the old processor feature recognition code. Trying to understand so as for current state, in the function supportsFeature we are not guaranteed to have omrPortLib ? I am asking as we do not set the flags for old API anywhere in the code, and right now code in the same function does seem to call [1].

[1]. https://github.com/eclipse/omr/blob/066d80097f5c72c48456423b87cd495a7c2c8277/port/unix/omrsysinfo.c#L762

nbhuiyan commented 2 years ago

Currently while within the OMR Compiler, there is no guarantee that the port library is initialized. This is because OMR Compiler-based projects can initialize the compiler without the port library, which made it necessary to perform the check for the port library when used within OMR Compiler code. Given that the flags necessary for the old API is never set currently, supportsFeature was only returning the correct results if the port library was available.

eclipse / omr

Migrate processor and feature detection from OpenJ9 to OMR #4339

Background

Proposed Solution

Some notes about TR::Compiler->target.cpu, how and where it's initialized

Summary of the Processor Detection OMR Port Library Additions

How OpenJ9 initializes "TR::Compiler->target.cpu" for x86

jitGetCPUID() vs omrsysinfo_get_x86_description()

OMR's cpuid vs. OpenJ9's cpuid (x86)

if defined(WIN32)

elif defined(LINUX) || defined(OSX)

if defined(J9X86)

elif defined(J9HAMMER)

endif

endif

TR_Processor vs. TR_ProcessorDescription (x86)

OMRCodeGenerator with TR_ProcessorDescription

omrsysinfo_get_processor_description

The End Goal

Clean Up on Z

Latest Changes:

Summary of changes:

Old design

New design

Introduce Feature Masks

Some notes about `TR::Compiler->target.cpu`, how and where it's initialized

`jitGetCPUID()` vs `omrsysinfo_get_x86_description()`

OMR's `cpuid` vs. OpenJ9's `cpuid` (x86)

OMRCodeGenerator with `TR_ProcessorDescription`