A random failure can occur when issuing the ARCHMODE and NUMCPU commands.
While repeatedly running Hercules from a shell script to track down a different issue, I started noticing the following errors in the log, which later caused the IPL to fail.
HHC02389E CPUs must be offline or stopped
or
HHC02253E All CPU's must be stopped to switch architectures
A simple Hercules .cnf can be used when attempting to reproduce this problem:
How readily the problem triggers depends on the host system's CPU architecture, OS, etc. It may trigger quickly, perhaps one out of ten tries (NetBSD on x86_64 and Sun UltraSPARC), or it may never occur at all (ARM-based Raspberry Pi with Debian, and macOS on an Apple M1 CPU). Windows and Debian on x86_64 fail fairly regularly for me.
After much fiddling I managed to catch it in the Visual Studio debugger (Windows 10 VM, VS2019) and identified the two threads involved in the issue.
Worker Thread impl_thread hengine.dll!maxcpu_cmd
> hengine.dll!maxcpu_cmd(int argc, char * * argv, char * cmdline) Line 3811 C
  hengine.dll!CallHercCmd(int argc, char * * argv, char * cmdline) Line 362 C
  hengine.dll!process_config(const char * cfg_name) Line 424 C
  hengine.dll!build_config(const char * hercules_cnf) Line 118 C
  hengine.dll!impl(int argc, char * * argv) Line 1340 C
  hercules.exe!main(int ac, char * * av) Line 305 C

Worker Thread Processor CP01 hutil.dll!LeaveFT_MUTEX
  hutil.dll!LeaveFT_MUTEX(_tagFT_MUTEX * pFT_MUTEX) Line 292 C
  hutil.dll!fthread_mutex_unlock(_tagFTU_MUTEX * pFTUSER_MUTEX) Line 1459 C
  hutil.dll!hthread_release_lock(LOCK * plk, const char * release_loc) Line 545 C
  hengine.dll!Release_Interrupt_Lock(REGS * regs, const char * location) Line 450 C
> hengine.dll!z900_run_cpu(int cpu, REGS * oldregs) Line 1996 C
  hengine.dll!cpu_thread(void * ptr) Line 2355 C
  hutil.dll!hthread_func(void * arg2) Line 1055 C
  hutil.dll!FTWin32ThreadFunc(void * pMyArgs) Line 809 C
Some of the relevant code:
cpu.c:1926
    memset(regs, 0, sizeof(REGS));
    if (cpu_init (cpu, regs, NULL))
        return NULL;
    ...
cpu.c:1991
    RELEASE_INTLOCK(regs);
    /* Establish longjmp destination for program check or
       RETURN_INTCHECK, or SIE_INTERCEPT, or longjmp, etc.
    */
    if (setjmp( regs->progjmp ) && sysblk.ipled)
    {
---
in cpu_init( )
cpu.c:
    if (!hostregs)
    {
        /* regs points to host regs */
        regs->cpustate = CPUSTATE_STOPPING;
        ON_IC_INTERRUPT(regs);
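To make the suspected window concrete, here is a minimal sketch in C with pthreads. It is not Hercules code and it is not the forthcoming fix. It assumes (my reading of the excerpts above, not something confirmed in the source) that the new CPU thread leaves its state at "stopping" when it releases the interrupt lock, and only reaches "stopped" some time later, so a configuration thread that takes the lock in between sees a CPU that is not yet stopped. All names (struct model, cpu_thread_model, all_stopped, simulate_window) and the enum values are hypothetical. The step gate exists only to force the unlucky interleaving every time, so the result is deterministic.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CPU states; values are illustrative only. */
enum cpu_state { CPU_UNSET, CPU_STOPPING, CPU_STOPPED };

struct model {
    pthread_mutex_t intlock;    /* plays the role of the interrupt lock   */
    enum cpu_state  cpustate;   /* plays the role of regs->cpustate       */
    pthread_mutex_t gate_lock;  /* the gate only sequences the two threads */
    pthread_cond_t  gate_cond;
    int             step;
};

/* Publish that the model has reached the given step. */
static void advance_to(struct model *m, int step)
{
    pthread_mutex_lock(&m->gate_lock);
    m->step = step;
    pthread_cond_broadcast(&m->gate_cond);
    pthread_mutex_unlock(&m->gate_lock);
}

/* Block until the model has reached at least the given step. */
static void wait_for(struct model *m, int step)
{
    pthread_mutex_lock(&m->gate_lock);
    while (m->step < step)
        pthread_cond_wait(&m->gate_cond, &m->gate_lock);
    pthread_mutex_unlock(&m->gate_lock);
}

/* Models the new CPU thread: initialise as "stopping", release the lock,
   and only later become "stopped". */
static void *cpu_thread_model(void *arg)
{
    struct model *m = arg;

    pthread_mutex_lock(&m->intlock);
    m->cpustate = CPU_STOPPING;          /* like cpu_init() above        */
    pthread_mutex_unlock(&m->intlock);   /* like RELEASE_INTLOCK(regs)   */

    advance_to(m, 1);                    /* the window is now open       */
    wait_for(m, 2);                      /* config thread has looked     */

    pthread_mutex_lock(&m->intlock);
    m->cpustate = CPU_STOPPED;           /* assumed later transition     */
    pthread_mutex_unlock(&m->intlock);
    advance_to(m, 3);
    return NULL;
}

/* Models the check a configuration command makes under the lock. */
static int all_stopped(struct model *m)
{
    int ok;

    pthread_mutex_lock(&m->intlock);
    ok = (m->cpustate == CPU_STOPPED);
    pthread_mutex_unlock(&m->intlock);
    return ok;
}

/* Runs the forced interleaving. *early_ok receives the check made inside
   the window, *late_ok the check made after the CPU reached "stopped".
   Returns 0 on success, -1 if the thread could not be created. */
int simulate_window(int *early_ok, int *late_ok)
{
    struct model m;
    pthread_t tid;

    pthread_mutex_init(&m.intlock, NULL);
    pthread_mutex_init(&m.gate_lock, NULL);
    pthread_cond_init(&m.gate_cond, NULL);
    m.cpustate = CPU_UNSET;
    m.step = 0;

    if (pthread_create(&tid, NULL, cpu_thread_model, &m) != 0)
        return -1;

    wait_for(&m, 1);
    *early_ok = all_stopped(&m);   /* inside the window */
    advance_to(&m, 2);
    wait_for(&m, 3);
    *late_ok = all_stopped(&m);    /* after the window  */

    pthread_join(tid, NULL);
    pthread_cond_destroy(&m.gate_cond);
    pthread_mutex_destroy(&m.gate_lock);
    pthread_mutex_destroy(&m.intlock);
    return 0;
}
```

Calling simulate_window(&early, &late) from a small main() (link with -pthread) leaves early at 0 and late at 1. In this model the check itself is properly locked; it fails only because it runs before the CPU thread has finished settling, which would match a failure that depends on host scheduling.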
This bug affects Hercules going back at least 2 years in the git commit history.
I have reported this bug to Fish privately and worked with him to help reproduce it. I have tested his proposed fix, which is forthcoming.
Bill