Open fjeremic opened 4 years ago
@0xdaryl FYI. @AidanHa is this something you could tackle?
Sure, I can definitely look into this.
After reading through the issue and getting familiar with the code, I think I will divide this issue into 4 PRs. 1: Creation of processor and feature detection in OMR (from OpenJ9). 2: Implementation of the new feature throughout out OMR. 3: Implementation of the new feature throughout OpenJ9. 4: Deletion of the processor detection in OpenJ9 port library.
For #2 and #3, by "implementation of the feature" do you mean "exploitation of the feature" ? (for lack of a better word).
@0xdaryl that is correct, I couldn't find a better way to phrase that point.
Update for those following: I got a working version of processor detection locally on my x86 machine. Will test other architectures, and submit a PR when all is ready!
I was wondering how should we be testing feature detection? In my opinion, Openj9 currently has a fairly "weak" test for feature detection (https://github.com/eclipse/openj9/blob/master/runtime/tests/port/si.c#L965-L971), and I would like to improve upon this if possible...
I was wondering how should we be testing feature detection? In my opinion, Openj9 currently has a fairly "weak" test for feature detection (eclipse/openj9:runtime/tests/port/si.c@
master
#L965-L971
On Linux we can likely query /proc/cpuinfo
and parse the string and compare it against what we programmatically detect?
On Linux we can likely query /proc/cpuinfo and parse the string and compare it against what we programmatically detect?
This might have some issues where the user doesn't have permission to view the output of /proc/cpuinfo
, and all that will show up is "permission denied".
IMO, it's very difficult to actually precisely verify if all the known features (for a specific processor) are set...
IMO, it's very difficult to actually precisely verify if all the known features (for a specific processor) are set...
I would agree. We'll just have to be very diligent when manually testing that this was done right. i.e. hop on a bunch of different machines and validate before/after. Perhaps using OpenJ9 may help here since processor detection is known to work there already.
I am currently on step 2 of resolving this issue: Implementing the port library function throughout OMR and unifying the different API calls for processor detection. One thing I noticed is that the current processor detection in compiler/env/OMRCPU.hpp
(TR::Compiler>target.cpu.setProcessor() and TR::Compiler>target.cpu.id()
is currently being used downstream in OpenJ9 (Eg: https://github.com/eclipse/openj9/blob/master/runtime/compiler/env/ProcessorDetection.cpp#L696), which makes it difficult to simply remove the current implementation and replace it with the port library version. To work my way around this issue, I've decided to go with the following:
1: In the CPU
class in compiler/env/OMRCPU.hpp
, add separate processor detection calls that utilize the new port library functions at https://github.com/eclipse/omr/pull/4503.
2: In OpenJ9, reconfigure all of the instances where the old processor detection functions were use to utilize the new port library functions.
3: Safely delete the old processor detection functions from OMR.
Let me know if this works!
For clarity:
"old processor detection functions" = TR::Compiler>target.cpu.setProcessor()
.
"new processor detection functions" = the new port library functions.
Let me know if this works!
Sounds like a good plan.
Went through the above discussions:
Here's my understanding:
In OMR we have
env/OMRCPU
z/env/OMRCPU
p/env/OMRCPU
x/env/OMRCPU
In OpenJ9 we have
env/J9CPU
z/env/J9CPU
p/env/J9CPU
x/env/J9CPU
compiler/env/ProcessorDetection.cpp
TR_J9VM::initializeProcessorType()
as an exampleProcessorDetection.cpp
relies on a bunch of openj9 port library functions. initializeProcessorType()
during the onLoadInternal()
routine. Where should we do that in OMR?Last Week's Meeting:
TR_X86ProcessorInfo
which is in OMRCodeGenerator.hpp
to x/OMRCPU.hpp
in order to make the API calls for X somewhat consistent with Z and P. (to all use TR::Compiler->target.cpu
, right now X's feature queries are in the codegen class instead of the cpu class).@fjeremic @dsouzai Let me know if I understood everything correctly, I want to make sure everything is clear before I proceed to implementation. Thanks!
@mpirvu FYI, this is the processor feature detection work for OMR that I'll be working on for a while, I will likely log all my progress in this issue.
^ a question I have with that is in OpenJ9 we call initializeProcessorType() during the onLoadInternal() routine. Where should we do that in OMR?
The initialization should happen in the constructor of the OMRCPU
class, i.e. calling the port library and caching the result if needed.
Had a high priority JITServer item last week, back to this work item now.
TR::Compiler->target.cpu
, how and where it's initializedTR::Compiler
is a global pointer of type TR::CompilerEnv
TR::CompilerEnv
(from CompilerEnv.hpp) is a child of TR::OMR::CompilerEnv
(from OMRCompilerEnv.hpp) through extensible classes
namespace OMR
{
class OMR_EXTENSIBLE CompilerEnv
{
public:
CompilerEnv(TR::RawAllocator raw, const TR::PersistentAllocatorKit &persistentAllocatorKit);
TR::CompilerEnv *self();
...
// Compilation target environment
//
TR::Environment target;
target
is a TR::Environment
object.TR::Environment
(from Environment.hpp) is a child of TR::OMR::Environment
(from OMREnvironment.hpp)
namespace OMR
{
class Environment
{
public:
Environment() :
_majorOS(TR::os_unknown),
_bitness(TR::bits_unknown),
_isSMP(false),
_numberOfProcessors(1),
cpu()
{}
Environment(TR::MajorOperatingSystem o, TR::Bitness b) :
_majorOS(o),
_bitness(b),
_isSMP(false),
_numberOfProcessors(1),
cpu()
{}
TR::CPU cpu;
cpu
is a TR::CPU
objectTR::CPU
is a child of TR::OMR::X86::CPU
(from x/env/OMRCPU.hpp if we are on x) which is a child of TR::OMR::CPU
(from env/OMRCPU.hpp) through extensible classesTR::OMR::CPU
there is an API initializeByHostQuery()
initializeByHostQuery()
in TR::OMR::X86::CPU
namespace OMR
{
class CPU
{
protected:
CPU() :
_processor(TR_NullProcessor),
_endianness(TR::endian_unknown),
_majorArch(TR::arch_unknown),
_minorArch(TR::m_arch_none)
{}
public:
TR::CPU *self();
// Initialize CPU info by querying the host processor at compile-time
//
void initializeByHostQuery();
void
OMR::CPU::initializeByHostQuery()
{
#ifdef OMR_ENV_LITTLE_ENDIAN
_endianness = TR::endian_little;
#else
_endianness = TR::endian_big;
#endif
// Validate host endianness #define with an additional compile-time test
//
char SwapTest[2] = { 1, 0 };
bool isHostLittleEndian = (*(short *)SwapTest == 1);
if (_endianness == TR::endian_little)
{
TR_ASSERT(isHostLittleEndian, "expecting host to be little endian");
}
else
{
TR_ASSERT(!isHostLittleEndian, "expecting host to be big endian");
}
#if defined(TR_HOST_X86)
_majorArch = TR::arch_x86;
#if defined (TR_HOST_64BIT)
_minorArch = TR::m_arch_amd64;
#else
_minorArch = TR::m_arch_i386;
#endif
#elif defined (TR_HOST_POWER)
_majorArch = TR::arch_power;
#elif defined(TR_HOST_ARM)
_majorArch = TR::arch_arm;
#elif defined(TR_HOST_S390)
_majorArch = TR::arch_z;
#elif defined(TR_HOST_ARM64)
_majorArch = TR::arch_arm64;
#elif defined(TR_HOST_RISCV)
_majorArch = TR::arch_riscv;
#if defined (TR_HOST_64BIT)
_minorArch = TR::m_arch_riscv64;
#else
_minorArch = TR::m_arch_riscv32;
#endif
#else
TR_ASSERT(0, "unknown host architecture");
_majorArch = TR::arch_unknown;
#endif
}
initializeByHostQuery()
is called in OMR::CompilerEnv::initializeTargetEnvironment()
void
OMR::CompilerEnv::initializeTargetEnvironment()
{
// Target processor bitness
//
#ifdef TR_TARGET_64BIT
target.setBitness(TR::bits_64);
#elif TR_TARGET_32BIT
target.setBitness(TR::bits_32);
#else
target.setBitness(TR::bits_unknown);
#endif
// Initialize the target CPU by querying the host processor
//
target.cpu.initializeByHostQuery();
// Target major operating system
//
#if HOST_OS == OMR_LINUX
target.setMajorOS(TR::os_linux);
#elif HOST_OS == OMR_AIX
target.setMajorOS(TR::os_aix);
#elif HOST_OS == OMR_WINDOWS
target.setMajorOS(TR::os_windows);
#elif HOST_OS == OMR_ZOS
target.setMajorOS(TR::os_zos);
#elif HOST_OS == OMR_OSX
target.setMajorOS(TR::os_osx);
#else
target.setMajorOS(TR::os_unknown);
#endif
}
Let's be consistent with other platforms. If an object can be fully initialized at the allocation point we should initialize the object in the Constructor and simplify the callers by not having to call an additional API which they may or may not be aware of.
port/common/omrport.c
omrsysinfo_get_processor_description()
and omrsysinfo_processor_has_feature()
into OMRPortLibrary MasterPortLibraryTable
include_core/omrport.h
enum OMRProcessorArchitecture
struct OMRProcessorDesc
(comments still contain j9, needs to be corrected, j9sysinfo_get_processor_description, j9sysinfo_processor_has_feature)omrsysinfo_get_processor_description
and omrsysinfo_processor_has_feature
function pointers to struct OMRPortLibrary
omrsysinfo_get_processor_description
and omrsysinfo_processor_has_feature
macrosport/common/omrport.tdf
Trc_PRT_sysinfo_get_processor_description_Entered
, Group = j9sysinfo, this needs to be corrected (get rid of j9)Trc_PRT_sysinfo_get_processor_description_Exit
, Group = j9sysinfo, this needs to be corrected (get rid of j9)Trc_PRT_sysinfo_processor_has_feature_Entered
Trc_PRT_sysinfo_processor_has_feature_Exit
port/common/omrsysinfo.c
and omrsysinfo.c
omrsysinfo_get_processor_description()
omrsysinfo_processor_has_feature()
port/common/omrsysinfo_helpers.c
and omrsysinfo_helpers.h
getX86Description()
getX86CPUID()
port/linuxs390/omrsysinfo_helpers.c
and omrsysinfo_helpers.h
getstfle()
port/unix/omrsysinfo.c
mapPPCProcessor()
setFeature()
getLinuxPPCDescription()
getAIXPPCDescription()
testSTEFLE()
getS390Description()
omrsysinfo_get_processor_descriptions()
omrsysinfo_processor_has_feature()
port/unix_include/omrportpg.h
OMRSTFLEFacilities
OMRSTFLECache
PPG_stfleCache
port/win32/omrsysinfo.c
omrsysinfo_get_processor_description()
omrsysinfo_processor_has_feature()
port/zos390/omrportpg.h
OMRSTFLEFacilities
OMRSTFLECache
PPG_stfleCache
top level, we are using the vm/frontend API initializeProcessorType()
else if (TR::Compiler->target.cpu.isX86())
{
const char *vendor = TR::Compiler->target.cpu.getProcessorVendorId();
uint32_t processorSignature = TR::Compiler->target.cpu.getProcessorSignature();
TR::Compiler->target.cpu.setProcessor(portLibCall_getX86ProcessorType(vendor, processorSignature));
TR_ASSERT(TR::Compiler->target.cpu.id() >= TR_FirstX86Processor
&& TR::Compiler->target.cpu.id() <= TR_LastX86Processor, "Not a valid X86 Processor Type");
}
portLibCall_getX86ProcessorType()
is not a very special call, just parses the vendor and processorSignature to get the processor type.
the more special call is getProcessorSignature()
, which is able to get us feature flags on x86
it seems that, unlike Power and Z, X doesn't use j9sysinfo_get_processor_description()
and j9sysinfo_processor_has_feature()
to get its features, instead it uses queryX86TargetCPUID()
.
there are two versions of queryX86TargetCPUID()
, one in x/env/OMRCPU.hpp
(from omr) and one in x/env/J9CPU.hpp
(from openj9). They are very similar in that they both call jitGetCPUID()
openj9's version
TR_X86CPUIDBuffer *
CPU::queryX86TargetCPUID()
{
static TR_X86CPUIDBuffer buf = { {'U','n','k','n','o','w','n','B','r','a','n','d'} };
jitGetCPUID(&buf);
return &buf;
}
omr's version
TR_X86CPUIDBuffer *
OMR::X86::CPU::queryX86TargetCPUID()
{
static TR_X86CPUIDBuffer *buf = NULL;
auto jitConfig = TR::JitConfig::instance();
if (jitConfig && jitConfig->getProcessorInfo() == NULL)
{
buf = (TR_X86CPUIDBuffer *) malloc(sizeof(TR_X86CPUIDBuffer));
if (!buf)
return 0;
jitGetCPUID(buf);
jitConfig->setProcessorInfo(buf);
}
else
{
if (!buf)
{
if (jitConfig && jitConfig->getProcessorInfo())
{
buf = (TR_X86CPUIDBuffer *)jitConfig->getProcessorInfo();
}
else
{
buf = (TR_X86CPUIDBuffer *) malloc(sizeof(TR_X86CPUIDBuffer));
if (!buf)
memcpy(buf->_vendorId, "Poughkeepsie", 12); // 12 character dummy string (NIL term not used)
buf->_processorSignature = 0;
buf->_brandIdEtc = 0;
buf->_featureFlags = 0x00000000;
buf->_cacheDescription.l1instr = 0;
buf->_cacheDescription.l1data = 0;
buf->_cacheDescription.l2 = 0;
buf->_cacheDescription.l3 = 0;
}
}
}
return buf;
}
the omr version seems unnecessarily complicated for such a straightforward task
it looks like it's trying to cache the results in jitConfig
, but if we cache the result in the CPU class, then it's probably not necessary.
also this query means that if we don't have a jitConfig
then we can't get the results, which doesn't make much sense either
also getProcessorInfo()
is not used anywhere else except in this function
can we replace this with the OpenJ9 version? @fjeremic
another question, which is about the extensible classes:
OMR::CPU
--> OMR::X86::CPU
--> J9::CPU
--> J9::X86::CPU
--> TR::CPU
The TR::CompilerEnv
pointer will be of TR::CPU
type
TR::CPU *
OMR::CPU::self()
{
return static_cast<TR::CPU*>(this);
}
when we do self()->queryX86TargetCPUID()
, which one are we actually invoking? the one closer to TR::CPU
?
- when we do
self()->queryX86TargetCPUID()
, which one are we actually invoking? the one closer toTR::CPU
?
This question I can answer it myself. self()
returns a pointer to the most derived/specialized class, which means we will always invoke the functions "closer" to the most specialized class.
Dynamic polymorphism: we have a base pointer, then rely on the virtual
keyword to find the most specialized function.
Static polymorphism (what's used here): we will always get the most specialized pointer through self()
can we replace this with the OpenJ9 version? @fjeremic
Yes we can. Ideally as you mentioned earlier where we want to arrive at the end is for this processor detection to be migrated into the port library (sysinfo_get_processor_description
) and x86 would query that, similarly to other platforms.
jitGetCPUID()
vs omrsysinfo_get_x86_description()
jitGetCPUID()
returns TR_X86CPUIDBuffer
struct TR_X86CPUIDBuffer
{
char _vendorId[12]; // not NULL-terminated
uint32_t _processorSignature;
uint32_t _brandIdEtc;
uint32_t _featureFlags;
uint32_t _featureFlags2;
struct
{
uint32_t l1instr;
uint32_t l1data;
uint32_t l2;
uint32_t l3;
} _cacheDescription;
uint32_t _featureFlags8;
};
omrsysinfo_get_x86_description()
returns OMRProcessorDesc
#define OMRPORT_SYSINFO_FEATURES_SIZE 5
typedef struct OMRProcessorDesc {
OMRProcessorArchitecture processor;
OMRProcessorArchitecture physicalProcessor;
uint32_t features[OMRPORT_SYSINFO_FEATURES_SIZE];
} OMRProcessorDesc;
omrsysinfo_get_x86_description()
only initializes 2 out of 3 x86 feature flags
/* features */
desc->features[0] = CPUInfo[3]; // equivalent to jitGetCPUID's pBuffer->_featureFlags = CPUInfo[EDX];
desc->features[1] = CPUInfo[2]; // equivalent to jitGetCPUID's pBuffer->_featureFlags2 = CPUInfo[ECX];
desc->features[2] = 0; /* reserved for future expansion */
jitGetCPUID()
initializes all 3 of them
if (CPUInfo[EAX] > 0)
{
// EAX = 1
cpuid(CPUInfo, 1);
pBuffer->_processorSignature = CPUInfo[EAX];
pBuffer->_brandIdEtc = CPUInfo[EBX];
pBuffer->_featureFlags = CPUInfo[EDX];
pBuffer->_featureFlags2 = CPUInfo[ECX];
// EAX = 7, ECX = 0
cpuidex(CPUInfo, 7, 0);
pBuffer->_featureFlags8 = CPUInfo[EBX];
// Check for XSAVE
if(pBuffer->_featureFlags2 & TR_OSXSAVE)
{
if(((6 & _xgetbv(0)) != 6) || feGetEnv("TR_DisableAVX")) // '6' = mask for XCR0[2:1]='11b' (XMM state and YMM state are enabled)
{
// Unset OSXSAVE if not enabled via CR0
pBuffer->_featureFlags2 &= ~TR_OSXSAVE;
}
}
omrsysinfo_get_x86_description()
, it uses cpuidex
to get _featureFlags8
jitGetCPUID()
's _cacheDescription
and _brandIdEtc
are not used anywhere in OpenJ9, so I'm not sure if we need to port them over to omrsysinfo_get_x86_description()
.cpuid
vs. OpenJ9's cpuid
(x86)
/**
* Assembly code to get the register data from CPUID instruction
* This function executes the CPUID instruction based on which we can detect
* if the environment is virtualized or not, and also get the Hypervisor Vendor
* Name based on the same instruction. The leaf value specifies what information
* to return.
*
* @param[in] leaf The leaf value to the CPUID instruction.
* @param[out] cpuInfo Reference to the an integer array which holds the data
* of EAX,EBX,ECX and EDX registers.
* cpuInfo[0] To hold the EAX register data, value in this register at
* the time of CPUID tells what information to return
* EAX=0x1,returns the processor Info and feature bits
* in EBX,ECX,EDX registers.
* EAX=0x40000000 returns the Hypervisor Vendor Names
* in the EBX,ECX,EDX registers.
* cpuInfo[1] For EAX = 0x40000000 hold first 4 characters of the
* Hypervisor Vendor String
* cpuInfo[2] For EAX = 0x1, the 31st bit of ECX tells if its
* running on Hypervisor or not,For EAX = 0x40000000 holds the second
* 4 characters of the the Hypervisor Vendor String
* cpuInfo[3] For EAX = 0x40000000 hold the last 4 characters of the
* Hypervisor Vendor String
*
*/
void
getX86CPUID(uint32_t leaf, uint32_t *cpuInfo)
{
cpuInfo[0] = leaf;
/ Implemented for x86 & x86_64 bit platforms /
/* Specific CPUID instruction available in Windows */
__cpuid(cpuInfo, cpuInfo[0]);
__asm volatile
("mov %%ebx, %%edi;"
"cpuid;"
"mov %%ebx, %%esi;"
"mov %%edi, %%ebx;"
:"+a" (cpuInfo[0]), "=S" (cpuInfo[1]), "=c" (cpuInfo[2]), "=d" (cpuInfo[3])
: :"edi");
__asm volatile( "cpuid;" :"+a" (cpuInfo[0]), "=b" (cpuInfo[1]), "=c" (cpuInfo[2]), "=d" (cpuInfo[3]) );
}
- Though I think what's actually being used at the moment on x86 is this:
```cpp
#if defined(OMR_OS_WINDOWS)
#include <intrin.h>
#define cpuid(CPUInfo, EAXValue) __cpuid(CPUInfo, EAXValue)
#define cpuidex(CPUInfo, EAXValue, ECXValue) __cpuidex(CPUInfo, EAXValue, ECXValue)
#else
#include <cpuid.h>
#include <emmintrin.h>
#define cpuid(CPUInfo, EAXValue) __cpuid(EAXValue, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3])
#define cpuidex(CPUInfo, EAXValue, ECXValue) __cpuid_count(EAXValue, ECXValue, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3])
Which one do you think is preferred? Wouldn't the second one look better? @fjeremic
related changes in the past:
I'm looking to eliminate TR_ProcessorDescription, which is used by OpenJ9 x86 codegen and serves the same purpose as TR_Processor. However, we do need to check for differences.
TR_ProcessorDescription
appears to have a few more types compared to TR_Processor
enum TR_Processor
{
TR_NullProcessor,
TR_FirstProcessor,
// ... I have taken out the other platforms
// X86 Processors
TR_FirstX86Processor,
TR_DefaultX86Processor = TR_FirstX86Processor,
TR_X86ProcessorIntelPentium,
TR_X86ProcessorIntelP6,
TR_X86ProcessorIntelPentium4,
TR_X86ProcessorIntelCore2,
TR_X86ProcessorIntelTulsa,
TR_X86ProcessorAMDK5,
TR_X86ProcessorAMDK6,
TR_X86ProcessorAMDAthlonDuron,
TR_LastX86Processor,
TR_X86ProcessorAMDOpteron = TR_LastX86Processor,
TR_LastProcessor = TR_LastX86Processor,
TR_NumberOfProcessors
};
enum TR_ProcessorDescription
{
TR_ProcessorUnknown = 0x00000000,
TR_ProcessorIntelPentium = 0x00000001,
TR_ProcessorIntelP6 = 0x00000002,
TR_ProcessorIntelPentium4 = 0x00000003,
TR_ProcessorAMDK5 = 0x00000004,
TR_ProcessorAMDK6 = 0x00000005,
TR_ProcessorAMDAthlonDuron = 0x00000006,
TR_ProcessorAMDOpteron = 0x00000007,
TR_ProcessorIntelCore2 = 0x00000008,
TR_ProcessorIntelTulsa = 0x00000009,
TR_ProcessorIntelNehalem = 0x0000000a,
TR_ProcessorAMDFamily15h = 0x0000000b,
TR_ProcessorIntelWestmere = 0x0000000c,
TR_ProcessorIntelSandyBridge = 0x0000000d,
TR_ProcessorIntelIvyBridge = 0x0000000e,
TR_ProcessorIntelHaswell = 0x0000000f,
TR_ProcessorIntelBroadwell = 0x00000010,
TR_ProcessorIntelSkylake = 0x00000011,
};
we may need to port over the missing types from TR_ProcessorDescription
to TR_Processor
TR_Processor
mapOMRProcessor(OMRProcessorArchitecture processor)
{
case PROCESSOR_X86_UNKNOWN:
tp = TR_DefaultX86Processor;
break;
case PROCESSOR_X86_INTELPENTIUM:
tp = TR_X86ProcessorIntelPentium;
break;
case PROCESSOR_X86_INTELP6:
tp = TR_X86ProcessorIntelP6;
break;
case PROCESSOR_X86_INTELPENTIUM4:
tp = TR_X86ProcessorIntelPentium4;
break;
case PROCESSOR_X86_INTELCORE2:
tp = TR_X86ProcessorIntelCore2;
break;
case PROCESSOR_X86_INTELTULSA:
tp = TR_X86ProcessorIntelTulsa;
break;
case PROCESSOR_X86_AMDK5:
tp = TR_X86ProcessorAMDK5;
break;
case PROCESSOR_X86_AMDK6:
tp = TR_X86ProcessorAMDK6;
break;
case PROCESSOR_X86_AMDATHLONDURON:
tp = TR_X86ProcessorAMDAthlonDuron;
break;
case PROCESSOR_X86_AMDOPTERON:
tp = TR_X86ProcessorAMDOpteron;
break;
case PROCESSOR_X86_INTELNEHALEM: //Not yet mapped in TR_Processor.
case PROCESSOR_X86_INTELWESTMERE:
case PROCESSOR_X86_INTELSANDYBRIDGE:
case PROCESSOR_X86_INTELHASWELL:
tp = TR_X86ProcessorAMDOpteron;
break;
default:
return tp;
}
TR_ProcessorDescription
uint32_t _processorSignature = cg->comp()->target().cpu.getX86ProcessorSignature();
if (isGenuineIntel())
{
switch (getCPUFamily(_processorSignature))
{
case 0x05: _processorDescription |= TR_ProcessorIntelPentium; break;
case 0x06:
{
uint32_t extended_model = getCPUModel(_processorSignature) + (getCPUExtendedModel(_processorSignature) << 4);
switch (extended_model)
{
case 0x55:
_processorDescription |= TR_ProcessorIntelSkylake; break;
case 0x4f:
_processorDescription |= TR_ProcessorIntelBroadwell; break;
case 0x3f:
case 0x3c:
_processorDescription |= TR_ProcessorIntelHaswell; break;
case 0x3e:
case 0x3a:
_processorDescription |= TR_ProcessorIntelIvyBridge; break;
case 0x2a:
case 0x2d: // SandyBridge EP
_processorDescription |= TR_ProcessorIntelSandyBridge; break;
case 0x2c: // WestmereEP
case 0x2f: // WestmereEX
_processorDescription |= TR_ProcessorIntelWestmere; break;
case 0x1a: // Nehalem
_processorDescription |= TR_ProcessorIntelNehalem; break;
case 0x17: // Harpertown
case 0x0f: // Woodcrest/Clovertown
_processorDescription |= TR_ProcessorIntelCore2; break;
default: _processorDescription |= TR_ProcessorIntelP6; break;
}
break;
}
case 0x0f: _processorDescription |= TR_ProcessorIntelPentium4; break;
default: _processorDescription |= TR_ProcessorUnknown; break;
}
}
else if (isAuthenticAMD())
{
switch (getCPUFamily(_processorSignature))
{
case 0x05:
if (getCPUModel(_processorSignature) < 0x04)
_processorDescription |= TR_ProcessorAMDK5;
else
_processorDescription |= TR_ProcessorAMDK6;
break;
case 0x06: _processorDescription |= TR_ProcessorAMDAthlonDuron; break;
case 0x0f:
if (getCPUExtendedFamily(_processorSignature) < 6)
_processorDescription |= TR_ProcessorAMDOpteron;
else
_processorDescription |= TR_ProcessorAMDFamily15h;
break;
default: _processorDescription |= TR_ProcessorUnknown; break;
}
}
intptr_t
omrsysinfo_get_x86_description(struct OMRPortLibrary *portLibrary, OMRProcessorDesc *desc)
{
uint32_t CPUInfo[4] = {0};
char vendor[12];
uint32_t familyCode = 0;
uint32_t processorSignature = 0;
desc->processor = OMR_PROCESSOR_X86_UNKNOWN;
/* vendor */
omrsysinfo_get_x86_cpuid(CPUID_VENDOR_INFO, CPUInfo);
memcpy(vendor + 0, &CPUInfo[1], sizeof(uint32_t));
memcpy(vendor + 4, &CPUInfo[3], sizeof(uint32_t));
memcpy(vendor + 8, &CPUInfo[2], sizeof(uint32_t));
/* family and model */
omrsysinfo_get_x86_cpuid(CPUID_FAMILY_INFO, CPUInfo);
processorSignature = CPUInfo[0];
familyCode = (processorSignature & CPUID_SIGNATURE_FAMILY) >> CPUID_SIGNATURE_FAMILY_SHIFT;
if (0 == strncmp(vendor, CPUID_VENDOR_INTEL, CPUID_VENDOR_LENGTH)) {
switch (familyCode) {
case CPUID_FAMILYCODE_INTELPENTIUM:
desc->processor = OMR_PROCESSOR_X86_INTELPENTIUM;
break;
case CPUID_FAMILYCODE_INTELCORE:
{
uint32_t modelCode = (processorSignature & CPUID_SIGNATURE_MODEL) >> CPUID_SIGNATURE_MODEL_SHIFT;
uint32_t extendedModelCode = (processorSignature & CPUID_SIGNATURE_EXTENDEDMODEL) >> CPUID_SIGNATURE_EXTENDEDMODEL_SHIFT;
uint32_t totalModelCode = modelCode + extendedModelCode;
if (totalModelCode > CPUID_MODELCODE_INTELHASWELL) {
desc->processor = OMR_PROCESSOR_X86_INTELHASWELL;
} else if (totalModelCode >= CPUID_MODELCODE_SANDYBRIDGE) {
desc->processor = OMR_PROCESSOR_X86_INTELSANDYBRIDGE;
} else if (totalModelCode >= CPUID_MODELCODE_INTELWESTMERE) {
desc->processor = OMR_PROCESSOR_X86_INTELWESTMERE;
} else if (totalModelCode >= CPUID_MODELCODE_INTELNEHALEM) {
desc->processor = OMR_PROCESSOR_X86_INTELNEHALEM;
} else if (totalModelCode == CPUID_MODELCODE_INTELCORE2) {
desc->processor = OMR_PROCESSOR_X86_INTELCORE2;
} else {
desc->processor = OMR_PROCESSOR_X86_INTELP6;
}
break;
}
case CPUID_FAMILYCODE_INTELPENTIUM4:
desc->processor = OMR_PROCESSOR_X86_INTELPENTIUM4;
break;
}
} else if (0 == strncmp(vendor, CPUID_VENDOR_AMD, CPUID_VENDOR_LENGTH)) {
switch (familyCode) {
case CPUID_FAMILYCODE_AMDKSERIES:
{
uint32_t modelCode = (processorSignature & CPUID_SIGNATURE_FAMILY) >> CPUID_SIGNATURE_MODEL_SHIFT;
if (modelCode < CPUID_MODELCODE_AMDK5) {
desc->processor = OMR_PROCESSOR_X86_AMDK5;
}
desc->processor = OMR_PROCESSOR_X86_AMDK6;
break;
}
case CPUID_FAMILYCODE_AMDATHLON:
desc->processor = OMR_PROCESSOR_X86_AMDATHLONDURON;
break;
case CPUID_FAMILYCODE_AMDOPTERON:
desc->processor = OMR_PROCESSOR_X86_AMDOPTERON;
break;
}
}
desc->physicalProcessor = desc->processor;
/* features */
desc->features[0] = CPUInfo[3];
desc->features[1] = CPUInfo[2];
desc->features[2] = 0; /* reserved for future expansion */
return 0;
}
The above 3 posts suggest that omrsysinfo_get_x86_description()
is lacking:
I'm going to open a separate PR to handle this task.
@fjeremic Do you think it's better for x86 to go with the Power/Z way for feature query and use j9sysinfo_processor_has_feature()
(with J9ProcessorDesc
cached)
or keep the original way? so in the latter case we would get the feature flags from j9sysinfo_get_processor_description
and cache them in flags32_t
and query using testAny()
. The original way looks nicer, but we would have redundant things such as enum TR_X86ProcessorFeatures
, which are already defined by the port library. The Power/Z way looks like old c code but it gets the job done.
enum TR_X86ProcessorFeatures
{
TR_BuiltInFPU = 0x00000001,
TR_VirtualModeExtension = 0x00000002,
TR_DebuggingExtension = 0x00000004,
TR_PageSizeExtension = 0x00000008,
TR_RDTSCInstruction = 0x00000010,
TR_ModelSpecificRegisters = 0x00000020,
TR_PhysicalAddressExtension = 0x00000040,
TR_MachineCheckException = 0x00000080,
TR_CMPXCHG8BInstruction = 0x00000100,
TR_APICHardware = 0x00000200,
// Reserved by Intel = 0x00000400,
TR_FastSystemCalls = 0x00000800,
TR_MemoryTypeRangeRegisters = 0x00001000,
TR_PageGlobalFlag = 0x00002000,
TR_MachineCheckArchitecture = 0x00004000,
TR_CMOVInstructions = 0x00008000,
TR_PageAttributeTable = 0x00010000,
TR_36BitPageSizeExtension = 0x00020000,
TR_ProcessorSerialNumber = 0x00040000, // GenuineIntel only
TR_CLFLUSHInstruction = 0x00080000, // GenuineIntel only
// Reserved by Intel = 0x00100000,
TR_DebugTraceStore = 0x00200000, // GenuineIntel only
TR_ACPIRegisters = 0x00400000, // GenuineIntel only
TR_MMXInstructions = 0x00800000,
TR_FastFPSavesRestores = 0x01000000,
TR_SSE = 0x02000000,
TR_SSE2 = 0x04000000, // GenuineIntel Only
TR_SelfSnoop = 0x08000000, // GenuineIntel Only
TR_HyperThreading = 0x10000000, // GenuineIntel Only
TR_ThermalMonitor = 0x20000000, // GenuineIntel Only
// Reserved by Intel = 0x40000000, // IA64-specific
// Reserved by Intel = 0x80000000
TR_X86ProcessorInfoInitialized = 0x80000000 // FIXME: Using a reserved bit for our purposes.
};
bool
CPU::getPPCSupportsVMX()
{
#if defined(J9OS_I5) && defined(J9OS_I5_V5R4)
return FALSE;
#else
J9ProcessorDesc *processorDesc = TR::Compiler->target.cpu.TO_PORTLIB_getJ9ProcessorDesc();
J9PortLibrary *privatePortLibrary = TR::Compiler->portLib;
BOOLEAN feature = j9sysinfo_processor_has_feature(processorDesc, J9PORT_PPC_FEATURE_HAS_ALTIVEC);
return (TRUE == feature);
#endif
}
How x86 handles feature query
bool hasBuiltInFPU() {return testFeatureFlags(TR_BuiltInFPU);}
bool testFlag(flags32_t &flag, uint32_t feature, uint32_t mask)
{
TR_ASSERT_FATAL(feature & mask, "The %x feature needs to be added to the "
"getFeatureFlagsMask (or variant) function "
"for correctness in relocatable compiles!\n",
feature);
return flag.testAny(feature);
}
/**
* @brief testFeatureFlags Wrapper around testFlag
*
* @param feature The feature being tested for
*
* @return The result of testFlag
*/
bool testFeatureFlags(uint32_t feature)
{
return testFlag(_featureFlags, feature, getFeatureFlagsMask());
}
@harryyu1994 ideally the feature flags (which are enum values on x86) should live in the port library. There is no need for us to be duplicating them in the compiler. As such it would be preferable to do it the Power way, i.e. use the J9PORT_*_FEATURE_*
flag which is defined in the port library to query the feature in the compiler. This removes any indirection and all duplication and there is a single point of definition for a particular processor feature, which lives in the port library which is accessible by all OMR components.
Looked into Z/CPU:
Unlike X, to determine whether a feature is enabled, it needs two pieces of information (arch
and feature
), as an example:
// z9 supports DFP in millicode so do not check for DFP support unless we are z10+
if (TR::Compiler->target.cpu.getSupportsArch(TR::CPU::z10) &&
j9sysinfo_processor_has_feature(processorDesc, J9PORT_S390_FEATURE_DFP))
{
TR::Compiler->target.cpu.setSupportsDecimalFloatingPointFacility(true);
}
This creates some compilations if we want to use just the singular cpu.supportsFeature(YOUR_FEATURE)
API for all the Z feature queries.
I see 3 options
getSupportsDecimalFloatingPointFacility()
), IMO the worst option out of the 3Served as a reference, the following is the wrapper to port library call omrsysinfo_get_processor_description()
OMRProcessorDesc*
CPU::TO_PORTLIB_getOMRProcessorDesc()
{
static int processorTypeInitialized = 0;
static OMRProcessorDesc processorDesc;
if(processorTypeInitialized == 0)
{
PORT_ACCESS_FROM_PORT(TR::Compiler->portLib);
//Just make sure we initialize the OMRProcessorDesc at least once.
omrsysinfo_get_processor_description(&processorDesc);
processorTypeInitialized = 1;
}
return &processorDesc;
}
@fjeremic Again, need your input on this
@fjeremic Again, need your input on this
Option c. seems to be the most reasonable thing to do. This way the compiler doesn't muck with the port library true source of information.
supportsProcessor()
is needed for Power and Z as a lot of their queries need both processorType and feature.
bool supportsFeature(uint32_t feature);
bool supportsProcessor(OMRProcessorArchitecture processor);
enum Architecture
{
Unknown = 0,
z900,
z990,
z9,
z10,
z196,
zEC12,
z13,
z14,
z15,
zNext,
};
supportsProcessor()
on Z, I would like to get rid of its private enum Architecture
and use Port Library's processor architecture insteadHere's the mapping (Port Library Arch vs. Z::CPU::Architecture) that I've determined:
typedef enum OMRProcessorArchitecture {
OMR_PROCESSOR_UNDEFINED,
OMR_PROCESSOR_S390_UNKNOWN,
OMR_PROCESSOR_S390_GP6 --> TO_PORTLIB_get390_supportsZ6() --> TR::CPU::z10
OMR_PROCESSOR_S390_GP7,
OMR_PROCESSOR_S390_GP8,
OMR_PROCESSOR_S390_GP9 --> TO_PORTLIB_get390_supportsZGryphon() --> TR::CPU::z196
OMR_PROCESSOR_S390_GP10 --> TO_PORTLIB_get390_supportsZHelix() --> TR::CPU::zEC12
OMR_PROCESSOR_S390_GP11 --> TO_PORTLIB_get390_supportsZ13() --> TR::CPU::z13
OMR_PROCESSOR_S390_GP12 --> TO_PORTLIB_get390_supportsZ14() --> TR::CPU::z14
OMR_PROCESSOR_S390_GP13 --> TO_PORTLIB_get390_supportsZ15() --> TR::CPU::z15
OMR_PROCESSOR_S390_GP14 --> TO_PORTLIB_get390_supportsZNext() --> TR::CPU::zNext
Z::CPU::Architecture
names are used much more frequently than the port library arch names.Z::CPU::Architecture
namesOMR_PROCESSOR_S390_GP9,
OMR_PROCESSOR_S390_Z196 = OMR_PROCESSOR_S390_GP9,
@fjeremic What do you think?
I'm going to keep both set of names for now.
I'm going to keep both set of names for now.
Keep both sets. We should aim to phase out the OMR_PROCESSOR_S390_GP*
enums eventually. They are not very descriptive.
I realized there could still be a bit of work after the port library initialization decision, I'm going to not wait for that decision and try to finish off this whole issue by initializing my own omr port library for now. Just realized it doesn't look that hard to get the omr port library when we have OpenJ9 and its port library in hand.
New API against Old API tests in OpenJ9: https://github.com/eclipse/openj9/pull/9295
CPU::getProcessorDescription()
to base J9CPU class as the 3 platforms (X, Z, Power) share the same code. There is no need to have 3 copiesTR_Processor
struct via j9port library
TR_J9VM::initializeProcessorType()
CPU::TO_PORTLIB_getJ9ProcessorDesc()
TR::Compiler->target.cpu.setProcessor
to set processor Class CPU
CPU::TO_PORTLIB_getJ9ProcessorDesc()
j9sysinfo_processor_has_feature(processorDesc, feature)
(j9port library API)CPU::id()
CPU::setProcessor(TR_Processor)
OMRProcessorDesc
struct via omrport library
CPU::detect()
to initialize OMRProcessorDesc processorDescription
CPU::applyUserOptions()
to apply any debug options after initializationTR::Compiler->target.cpu
singleton variable that holds the CPU instancecomp()->target().cpu
per compilation variable which is used for cross-cpu compilation supportcomp()->target().cpu
with TR::Compiler->target.cpu
(Host CPU)comp()->target().cpu
with TR::Compiler->relocatableTarget.cpu
(Portable CPU)Class CPU
CPU::supportsFeature(feature)
CPU::isAtLeast(OMRProcessorArchitecture)
applyUserOptions()
.
TR::CPU
. applyUserOptions()
is that the options are not initialized when TR::CPU
is initialized which means if the user calls applyUserOptions()
at the incorrect place the options queries that it contains will either crash the program or not take effect. I want to eliminate this risk of incorrect usage.TR::CPU::customize(OMRProcessorDesc processorDescription)
OMRProcessorDesc
from user and output a TR::CPU
objectTR::CPU::enableFeatureMasks()
TR::CPU::enableFeatureMasks()
TR::CPU::enableFeatureMasks()
TR::CPU::detect(omrPortLib)
: static factory method that gives you the TR::CPU object based on port libraryTR::CPU::customize(processorDescription)
: static factory method that gives you the TR::CPU object based on a processorDescription provided by user@fjeremic @dsouzai : There are a lot of detailed comments in this issue tracking the progress of this work (thanks @harryyu1994!). However, can one of you please provide a summary of what work remains to be done from its current state?
@nbhuiyan will return to the port library initialization problem soon (he's working on a PR) which will make the port lib accessible within OMR and should deprecate some of the duplicate code within OMR itself.
AFAIK the detection has been sunk into OMR and there are two main pieces still remaining to finish:
Hookup the OMR port library to the detection
For example we have the following code: https://github.com/eclipse/omr/blob/3d85fbdf9c73e46c6da5df8989c119164711b368/compiler/env/OMRCompilerEnv.cpp#L77-L79
Which detects the target CPU using the OMR port library. Unfortunately because the port library has not been hooked up to the compiler in OMR this value is NULL
, so when we query for processor detection, for example here:https://github.com/eclipse/omr/blob/3d85fbdf9c73e46c6da5df8989c119164711b368/compiler/x/env/OMRCPU.cpp#L244-L256
We default down to using the old API.
Deprecate the old processor detection code
When the port library is hooked up to the compiler the above queries should go away since the port library should never be NULL
and the old processor detection APIs become dead at that point. We need to clean them up.
Just stumbled on eclipse/openj9#12670 where it seems we have duplicated code for processor detection in OpenJ9. We need to remove this as part of the migration effort here to avoid developers adding more code into the OpenJ9 versions.
@nbhuiyan wondering if you have a timeline to resume this work?
@fjeremic I plan on resuming this work later this week or early next week.
While fixing a bug to make sure we do vectorize some of the vector API intrinsics for Float type (On Z, we need to have Vector Facility Enhancement-1
installed to generate vector instruction for float type), we found out that we do not set the _flags
which were supposed to be checked when portLib is null in compiler [1][2]. While working on the change, I found out that we do not set the _flags
on first place, so thought of cleaning up them but bumped into this issue.
I guess continuing discussion from Filip's comment, if we have hooked up the port library to compiler, we can clean-up the old checks so that we would not use them accidentally. Trying to find out in the code-base, if we ever had change for hooking up portlib to compiler but can not find. @0xdaryl @nbhuiyan did we finished working on remaining pieces of work for this issue and can the code-gen go ahead and clean-up old feature detection code?
[1]. https://github.com/eclipse/omr/blob/edf2ae5033cb412348dbeb74d31bab2d26fe933d/compiler/z/env/OMRCPU.cpp#L101-L102 [2]. https://github.com/eclipse/omr/blob/edf2ae5033cb412348dbeb74d31bab2d26fe933d/compiler/z/env/OMRCPU.cpp#L154
@r30shah I had the work to support the port library usage in the compiler completed a few months ago, but I was not able to get around to opening a PR for it yet. I am working on that and will update this issue soon.
Thanks a lot @nbhuiyan , We will wait for your PR before cleaning up the old processor feature recognition code. Trying to understand so as for current state, in the function supportsFeature we are not guaranteed to have omrPortLib
? I am asking as we do not set the flags for old API anywhere in the code, and right now code in the same function does seem to call [1].
Currently while within the OMR Compiler, there is no guarantee that the port library is initialized. This is because OMR Compiler-based projects can initialize the compiler without the port library, which made it necessary to perform the check for the port library when used within OMR Compiler code. Given that the flags necessary for the old API is never set currently, supportsFeature
was only returning the correct results if the port library was available.
Background
The CPU class in the compiler in OMR [1] currently has no way of determining the processor we are running on at runtime. The
OMR::CPU
class is an extensible class which is meant to be overriden by the target code generator. Inside of these extended codegen classes we have APIs to determine whether we support generating instructions for a particular processor. For example [2] is the override on Z and [3] is the override on Power.Each code generator seems to use different, although quite similar APIs to determine if we can generate instructions for a particular processor.
TR::Compiler->target.cpu.id()
TR::Compiler->target.cpu.getSupportsArch()
cg()->getX86ProcessorInfo()
As we can see we're somewhat inconsistent in the APIs we use. More over, if you notice the APIs we use across the different codegens will always default down to the minimum Architecture Level Set (ALS) defined in OMR. This means that irregardless of what processor we are actually running on we'll generate instructions for the least supported processor.
This is problematic in dynamic runtimes.
[1] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/env/OMRCPU.cpp [2] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/z/env/OMRCPU.cpp [3] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/p/env/OMRCPU.cpp [4] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/p/codegen/BinaryEvaluator.cpp#L2087-L2096 [5] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/z/codegen/BinaryEvaluator.cpp#L3125-L3128 [6] https://github.com/eclipse/omr/blob/20db4fbc1ea570fd806783a97b16831f4f719eed/compiler/x/codegen/OMRCodeGenerator.cpp#L385-L388
Proposed Solution
Our friends at the OpenJ9 project have implemented processor detection in the port library already, and the JIT compiler in OpenJ9 is already making use of the data provided by the port library and caching the various answers for use in the JIT.
For example on Power and Z we use the
j9sysinfo_processor_has_feature
API [7] to query whether we have support for a particular CPU feature, which is initialized in the port library on initialization. Example of this is in thej9sysinfo.c
file for both Power and Z [8].Similarly processors are detected via the
J9ProcessorDesc
struct which is initialized via thej9sysinfo_get_processor_description
API which can be found in the samej9sysinfo.c
file. The way in which we initialize these structures depends on the OS and architecture type.What we effectively need to do is to sink the OpenJ9 pieces of the processor and feature detection down into the OMR layer so that we can access it directly from the OMR compiler. The work here will involve investigating what pieces of the OpenJ9 port library are used and finding the correct place to move the APIs in OMR. Processor and feature detection is a delicate piece of OpenJ9 and is crucial for performance so such a change must be staged, that is to first copy the processor detection into OMR, make use of it in OMR, only once that is working we can work on deprecating the OpenJ9 piece to take advantage of the OMR support.
[7] https://github.com/eclipse/openj9/blob/82609efc1fa05a5d1af51ad6ceed7215376dde84/runtime/compiler/z/env/J9CPU.cpp#L230-L301 [8] https://github.com/eclipse/openj9/blob/82609efc1fa05a5d1af51ad6ceed7215376dde84/runtime/port/unix/j9sysinfo.c