SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator
Other
240 stars 90 forks source link

DAT bug causes MVCOS instruction to CRASH Hercules #349

Closed Fish-Git closed 3 years ago

Fish-Git commented 3 years ago

This issue is closely related to Issue #350.


A user has reported that Hercules is crashing for him when the following jobstream is submitted to z/OS 2.2 or greater:

//MYJOB1   JOB (),'CAREFUL'                                             00040000
//  SCHEDULE STARTBY='01:00'                                            00050002
//NOTHING EXEC PGM=IEFBR14                                              00060000
//MYJOB2   JOB (),'CAREFUL'                                             00070000
//  SCHEDULE STARTBY='02:00',AFTER=WRONG                                00080002
//NOTHING EXEC PGM=IEFBR14                                              00090002
//MYJOB3   JOB (),'CAREFUL'                                             00100000
//  SCHEDULE STARTBY='03:00',AFTER=WRONG                                00110002
//NOTHING EXEC PGM=IEFBR14                                              00120002
//MYJOB4   JOB (),'CAREFUL'                                             00130000
//  SCHEDULE STARTBY='04:00',AFTER=WRONG                                00140002
//NOTHING EXEC PGM=IEFBR14                                              00150002
//MYJOB5   JOB (),'CAREFUL'                                             00160000
//  SCHEDULE STARTBY='05:00',AFTER=WRONG                                00170002
//NOTHING EXEC PGM=IEFBR14                                              00180002
//MYJOB6   JOB (),'CAREFUL'                                             00190000
//  SCHEDULE STARTBY='06:00',AFTER=WRONG                                00200002
//NOTHING EXEC PGM=IEFBR14                                              00210002
//MYJOB7   JOB (),'CAREFUL'                                             00220000
//  SCHEDULE STARTBY='07:00',AFTER=WRONG                                00230002
//NOTHING EXEC PGM=IEFBR14                                              00240000
//MYJOB8   JOB (),'CAREFUL'                                             00250000
//  SCHEDULE STARTBY='08:00',AFTER=WRONG                                00260002
//NOTHING EXEC PGM=IEFBR14                                              00270000
//MYJOB9   JOB (),'CAREFUL'                                             00280000
//  SCHEDULE STARTBY='09:00',AFTER=WRONG                                00290002
//NOTHING EXEC PGM=IEFBR14                                              00300000
//MYJOB10  JOB (),'CAREFUL'                                             00310000
//  SCHEDULE STARTBY='10:00',AFTER=WRONG                                00320002
//NOTHING EXEC PGM=IEFBR14                                              00330000
//MYJOB11  JOB (),'CAREFUL'                                             00340000
//  SCHEDULE STARTBY='11:00',AFTER=WRONG                                00350002
//NOTHING EXEC PGM=IEFBR14                                              00360000
//MYJOB12  JOB (),'CAREFUL'                                             00370000
//  SCHEDULE STARTBY='12:00',AFTER=WRONG                                00380002
//NOTHING EXEC PGM=IEFBR14                                              00390000
//MYJOB13  JOB (),'CAREFUL'                                             00400000
//  SCHEDULE STARTBY='13:00',AFTER=WRONG                                00410002
//NOTHING EXEC PGM=IEFBR14                                              00420000
//MYJOB14  JOB (),'CAREFUL'                                             00430000
//  SCHEDULE STARTBY='14:00',AFTER=WRONG                                00440002
//NOTHING EXEC PGM=IEFBR14                                              00450000
//MYJOB15  JOB (),'CAREFUL'                                             00460000
//  SCHEDULE STARTBY='15:00',AFTER=WRONG                                00470002
//NOTHING EXEC PGM=IEFBR14                                              00480000
//MYJOB16  JOB (),'CAREFUL'                                             00490000
//  SCHEDULE STARTBY='16:00',AFTER=WRONG                                00500002
//NOTHING EXEC PGM=IEFBR14                                              00510000
//MYJOB17  JOB (),'CAREFUL'                                             00520000
//  SCHEDULE STARTBY='17:00',AFTER=WRONG                                00530002
//NOTHING EXEC PGM=IEFBR14                                              00540000
//MYJOB18  JOB (),'CAREFUL'                                             00550000
//  SCHEDULE STARTBY='18:00',AFTER=WRONG                                00560002
//NOTHING EXEC PGM=IEFBR14                                              00570000
//MYJOB19  JOB (),'CAREFUL'                                             00580000
//  SCHEDULE STARTBY='19:00',AFTER=WRONG                                00590002
//NOTHING EXEC PGM=IEFBR14                                              00600000
//MYJOB20  JOB (),'CAREFUL'                                             00610000
//  SCHEDULE STARTBY='20:00',AFTER=WRONG                                00620002
//NOTHING EXEC PGM=IEFBR14                                              00630000
//MYJOB21  JOB (),'CAREFUL'                                             00640000
//  SCHEDULE STARTBY='21:00',AFTER=WRONG                                00650002
//NOTHING EXEC PGM=IEFBR14                                              00660000
//MYJOB22  JOB (),'CAREFUL'                                             00670000
//  SCHEDULE STARTBY='22:00',AFTER=WRONG                                00680002
//NOTHING EXEC PGM=IEFBR14                                              00690000
//MYJOB23  JOB (),'CAREFUL'                                             00700000
//  SCHEDULE STARTBY='23:00',AFTER=WRONG                                00710002
//NOTHING EXEC PGM=IEFBR14                                              00720000

I have been able to reliably reproduce this crash on my Windows system using the latest bleeding edge development version of Hercules, version 4.4.9999.10344-SDL-gf4a8b54b.

HOWEVER... it should be noted that the bug that appears to be causing this crash was introduced into the Hercules code base over 12(!) years ago (by Roger Bowler no less! *`()`**), so the crash should be reproducible by anyone using pretty much any version of Hercules (and any version of z/OS 2.2 or greater of course).

WinDBG !analyze -v analysis of the crash dump produces the following:

Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS\Hercules.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available

Comment: 'HHC01413I Hercules version 4.4.9999.10344-SDL-gf4a8b54b (4.4.9999.10344)'
Comment: 'HHC01414I (C) Copyright 1999-2020 by Roger Bowler, Jan Jaeger, and others'
Comment: 'HHC01417I ** The SoftDevLabs version of Hercules **'
Comment: 'HHC01415I Build date: Dec 20 2020 at 18:23:36'
Comment: 'HHC01417I Built with: Microsoft Visual Studio 2008 (MSVC 150030729 1)'
Comment: 'HHC01417I Build type: Windows MSVC AMD64 host architecture build'
Comment: 'HHC01417I Modes: S/370 ESA/390 z/Arch'
Comment: 'HHC01417I Max CPU Engines: 64'
Comment: 'HHC01417I Using   shared libraries'
Comment: 'HHC01417I Using   Fish threads Threading Model'
Comment: 'HHC01417I Using   Error-Checking Mutex Locking Model'
Comment: 'HHC01417I With    Shared Devices support'
Comment: 'HHC01417I With    Dynamic loading support'
Comment: 'HHC01417I With    External GUI support'
Comment: 'HHC01417I With    Partial TCP keepalive support'
Comment: 'HHC01417I With    IPV6 support'
Comment: 'HHC01417I With    HTTP Server support'
Comment: 'HHC01417I With    sqrtl support'
Comment: 'HHC01417I With    Signal handling'
Comment: 'HHC01417I With    Watchdog monitoring'
Comment: 'HHC01417I With    CCKD BZIP2 support'
Comment: 'HHC01417I With    HET BZIP2 support'
Comment: 'HHC01417I With    ZLIB support'
Comment: 'HHC01417I With    Regular Expressions support'
Comment: 'HHC01417I With    Object REXX support'
Comment: 'HHC01417I Without Regina REXX support'
Comment: 'HHC01417I With    Automatic Operator support'
Comment: 'HHC01417I Without National Language Support'
Comment: 'HHC01417I With    CCKD64 Support'
Comment: 'HHC01417I With    Transactional-Execution Facility support'
Comment: 'HHC01417I With    "Optimized" instructions'
Comment: 'HHC01417I Machine dependent assists: cmpxchg1 cmpxchg4 cmpxchg8 cmpxchg16 hatomics=msvcIntrinsics'
Comment: 'HHC01417I Running on: WIN7 (Windows-6.1.7601 Intel(R) x64) LP=8, Cores=8, CPUs=2'
Comment: 'HHC01417I Built with crypto external package version 1.0.0.31-gbf46d03'
Comment: 'HHC01417I Built with decNumber external package version 3.68.0.84-g94144c9'
Comment: 'HHC01417I Built with SoftFloat external package version 3.5.0.87-gb706677'
Comment: 'HHC01417I Built with telnet external package version 1.0.0.48-gb90a708'
Comment: 'HHC00018I Hercules is running in elevated mode'
Comment: '18:45:47.056169 0:0A82 cckd_read_track(859): read  trk   78570 (asynchronous)'
Comment: '18:45:47.056175 0:0A82 cckd64_read_trk(947): 0 rdtrk     78570'
Comment: '18:45:47.056187 0:0A82 cckd64_read_trk(995): 0 rdtrk[85] 78570 cache hit buf 000000000A39E180:0014760000'
Comment: '18:45:47.056223 0:0A82 cckd_update_track(939): updt  trk   78570 offset 797 length 15596'
Comment: '18:45:47.056242 0:0A82 cckd_read_track(859): read  trk   78578 (asynchronous)'
Comment: '18:45:47.056245 0:0A82 cckd64_read_trk(947): 0 rdtrk     78578'
Comment: '18:45:47.056249 0:0A82 cckd64_read_trk(995): 0 rdtrk[82] 78578 cache hit buf 000000000A0E8590:0014760008'
Comment: '18:45:47.056270 0:0A82 cckd_update_track(939): updt  trk   78578 offset 4133 length 4096'
Comment: '18:45:47.056289 0:0A82 cckd_read_track(859): read  trk   78608 (asynchronous)'
Comment: '18:45:47.056291 0:0A82 cckd64_read_trk(947): 0 rdtrk     78608'
Comment: '18:45:47.056295 0:0A82 cckd64_read_trk(995): 0 rdtrk[112] 78608 cache hit buf 0000000009C9BAC0:0014780008'
Comment: '18:45:47.056316 0:0A82 cckd_update_track(939): updt  trk   78608 offset 32861 length 4096'
Comment: '18:45:47.056334 0:0A82 cckd_read_track(859): read  trk   78700 (asynchronous)'
Comment: '18:45:47.056336 0:0A82 cckd64_read_trk(947): 0 rdtrk     78700'
Comment: '18:45:47.056340 0:0A82 cckd64_read_trk(995): 0 rdtrk[204] 78700 cache hit buf 000000000A3D59C0:00147e000a'
Comment: '18:45:47.056360 0:0A82 cckd_update_track(939): updt  trk   78700 offset 12341 length 4096'
Comment: '18:45:47.056379 0:0A82 cckd_read_track(859): read  trk   78570 (asynchronous)'
Comment: '18:45:47.056382 0:0A82 cckd64_read_trk(947): 0 rdtrk     78570'
Comment: '18:45:47.056386 0:0A82 cckd64_read_trk(995): 0 rdtrk[85] 78570 cache hit buf 000000000A39E180:0014760000'
Comment: '18:45:47.056405 0:0A82 cckd_update_track(939): updt  trk   78570 offset 29 length 380'
Comment: '18:45:47.056439 0:0A82 cckd_update_track(939): updt  trk   78570 offset 417 length 372'
Comment: '18:45:47.056443 0:0A82 cckd64_dasd_end(344): end i/o bufcur 78570 cache[85] waiters 0'
Comment: '18:45:47.057415 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 78570 cache[85]'
Comment: '18:45:47.057459 0:0A82 cckd_read_track(859): read  trk   2433 (asynchronous)'
Comment: '18:45:47.057465 0:0A82 cckd64_read_trk(947): 0 rdtrk     2433'
Comment: '18:45:47.057480 0:0A82 cckd64_read_trk(995): 0 rdtrk[194] 2433 cache hit buf 000000000A33CF10:0000a20003'
Comment: '18:45:47.057490 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2433 cache[194] waiters 0'
Comment: '18:45:47.062747 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 2433 cache[194]'
Comment: '18:45:47.062804 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2433 cache[194] waiters 0'
Comment: '18:45:47.066350 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 2433 cache[194]'
Comment: '18:45:47.066404 0:0A82 cckd_update_track(939): updt  trk   2433 offset 27077 length 3856'
Comment: '18:45:47.066408 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2433 cache[194] waiters 0'
Comment: '18:45:47.069771 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 2433 cache[194]'
Comment: '18:45:47.069810 0:0A82 cckd_read_track(859): read  trk   2439 (asynchronous)'
Comment: '18:45:47.069813 0:0A82 cckd64_read_trk(947): 0 rdtrk     2439'
Comment: '18:45:47.069817 0:0A82 cckd64_read_trk(1026): 0 rdtrk[169] 2439 cache miss'
Comment: '18:45:47.069820 0:0A82 cckd64_read_trk(1055): 0 rdtrk[169] 2439 dropping 0A80:18397 from cache'
Comment: '18:45:47.069824 0:0A82 cckd64_read_trk(1078): 0 rdtrk[169] 2439 buf 000000000A0CC970 len 56832'
Comment: '18:45:47.069826 0:0A82 cckd64_read_trkimg(2658): trk[2439] read_trkimg'
Comment: '18:45:47.069830 0:0A82 cckd64_read_l2ent(2575): file[4] l2[9,135] trk[2439] read_l2ent 0x36751'
Comment: '18:45:47.069833 0:0A82 cckd64_read_l2(2342): file[4] read_l2 9 active 3 9 849'
Comment: '18:45:47.069836 0:0A82 cckd64_read_l2(2361): l2[4,9] cache[1029] hit'
Comment: '18:45:47.069839 0:0A82 cckd64_read_l2ent(2575): file[3] l2[9,135] trk[2439] read_l2ent 0x7534'
Comment: '18:45:47.069842 0:0A82 cckd64_read_l2(2342): file[3] read_l2 9 active 4 9 1029'
Comment: '18:45:47.069845 0:0A82 cckd64_read_l2(2361): l2[3,9] cache[849] hit'
Comment: '18:45:47.069848 0:0A82 cckd64_read_l2ent(2593): file[3] l2[9,135] trk[2439] read_l2ent 0x15a1abf 3601 3601'
Comment: '18:45:47.069852 0:0A82 cckd64_read(456): file[3] fd[18] read, off 0x00000000015a1abf len 3601'
Comment: '18:45:47.069871 0:0A82 cckd64_read_trk(1119): 0 rdtrk[169] 2439 complete buf 000000000A0CC970:0100a20009'
Comment: '18:45:47.069874 0:0A82 cckd_uncompress(5624): uncompress comp 1 len 3601 maxlen 56832 trk 2439'
Comment: '18:45:47.070211 0:0A82 cckd_uncompress_zlib(5741): uncompress zlib newlen 46397 rc 0'
Comment: '18:45:47.070215 0:0A82 cckd_validate(3768): validating trk 2439 len 46397 0000a20009 00a2000900000008'
Comment: '18:45:47.070218 0:0A82 cckd_read_track(850): read  trk   2439 uncompressed len 46397'
Comment: '18:45:47.070223 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2439 cache[169] waiters 0'
Comment: '18:45:47.071012 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 2439 cache[169]'
Comment: '18:45:47.071067 0:0A82 cckd_update_track(939): updt  trk   2439 offset 13 length 8'
Comment: '18:45:47.071071 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2439 cache[169] waiters 0'
Comment: '18:45:47.071909 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 2439 cache[169]'
Comment: '18:45:47.071967 0:0A82 cckd_update_track(939): updt  trk   2439 offset 3893 length 3856'
Comment: '18:45:47.071971 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2439 cache[169] waiters 0'
Comment: '18:45:47.072233 0:0A82 cckd64_dasd_start(266): start i/o file[4] bufcur 2439 cache[169]'
Comment: '18:45:47.072302 0:0A82 cckd_update_track(939): updt  trk   2439 offset 29 length 3856'
Comment: '18:45:47.072308 0:0A82 cckd64_dasd_end(344): end i/o bufcur 2439 cache[169] waiters 0'
Comment: '18:45:47.072397 0:0A83 cckd64_dasd_start(266): start i/o file[4] bufcur 274 cache[36]'
Comment: '18:45:47.072407 0:0A83 cckd64_dasd_end(344): end i/o bufcur 274 cache[36] waiters 0'
Comment: ''
Symbol search path is: C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS;srv*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: C:\Users\Fish\Documents\Visual Studio 2008\Projects\Hercules\_DUMPS
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: WinNt, suite: SingleUserTS
Machine Name:
Debug session time: Sun Dec 20 18:45:53.000 2020 (UTC - 8:00)
System Uptime: not available
Process Uptime: 0 days 0:08:48.000
................................................................
...................................
Loading unloaded module list
...............
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(3a10.2a40): Access violation - code c0000005 (first/second chance not available)
ntdll!NtGetContextThread+0xa:
00000000`775b99ea c3              ret
0:010> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

Failed calling InternetOpenUrl, GLE=12029

FAULTING_IP: 
hengine!z900_maddr_l+5a [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\dat.h @ 81]
00000001`0044d5fa 493984c900040000 cmp     qword ptr [r9+rcx*8+400h],rax

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000000010044d5fa (hengine!z900_maddr_l+0x000000000000005a)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000008f36c80
Attempt to read from address 0000000008f36c80

DEFAULT_BUCKET_ID:  INVALID_POINTER_READ

PROCESS_NAME:  hercules.exe

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000008f36c80

READ_ADDRESS:  0000000008f36c80 

FOLLOWUP_IP: 
hengine!z900_maddr_l+5a [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\dat.h @ 81]
00000001`0044d5fa 493984c900040000 cmp     qword ptr [r9+rcx*8+400h],rax

MOD_LIST: <ANALYSIS/>

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

FAULTING_THREAD:  0000000000002a40

PRIMARY_PROBLEM_CLASS:  INVALID_POINTER_READ

BUGCHECK_STR:  APPLICATION_FAULT_INVALID_POINTER_READ

LAST_CONTROL_TRANSFER:  from 000000010044e16f to 000000010044d5fa

STACK_TEXT:  
00000000`06c0f5e0 00000001`0044e16f : 00000000`06c11000 00000000`13d29469 00000000`0002da80 00000000`00000005 : hengine!z900_maddr_l+0x5a [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\dat.h @ 81]
00000000`06c0f620 00000001`004520e9 : 00000000`06c11000 00000000`00000012 00000000`00000000 00000000`00000010 : hengine!z900_move_charx+0xdf [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\vstore.h @ 1038]
00000000`06c0f680 00000001`004680de : 00000000`06c1104c 00000000`00000008 00000001`00932aa0 00000000`06c11000 : hengine!z900_move_with_optional_specifications+0x289 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\control.c @ 2824]
00000000`06c0f6e0 00000001`00468857 : 00000000`00002a40 00000000`00000000 00000001`00727b50 00000000`00000004 : hengine!z900_run_cpu+0xe7e [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\cpu.c @ 2048]
00000000`06c0f840 00000000`00290450 : 00000000`0055a740 00000000`0055a740 00000000`0055a720 00000000`7113334a : hengine!cpu_thread+0x297 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\cpu.c @ 2329]
00000000`06c0f8d0 00000000`00283c04 : 00000000`00571ee0 00000000`00000000 00000000`00000000 00000000`00000000 : hutil!hthread_func+0x60 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\hthreads.c @ 1051]
00000000`06c0f900 00000000`71132fdf : 00000000`005a0770 00000000`00000000 00000000`00000000 00000000`00000000 : hutil!FTWin32ThreadFunc+0x44 [c:\users\fish\documents\visual studio 2008\projects\hercules\_git\_fish\hyperion-1\fthreads.c @ 809]
00000000`06c0f950 00000000`71133080 : 00000000`711bcad0 00000000`005a0770 00000000`00000000 00000000`00000000 : msvcr90!endthreadex+0x47
00000000`06c0f980 00000000`7744558d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : msvcr90!endthreadex+0xe8
00000000`06c0f9b0 00000000`775a2b1d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`06c0f9e0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

FAULTING_SOURCE_CODE:  
    77:         /* Or if not, is address in a common segment? */
    78:         if (0
    79:             || (regs->CR( aea_arn ) == regs->tlb.TLB_ASD( tlbix ))
    80:             || (regs->AEA_COMMON( aea_arn ) & regs->tlb.common[ tlbix ])
>   81:         )
    82:         {
    83:             /* Storage Key zero? */
    84:             /* Or if not, same Storage Key as before? */
    85:             if (0
    86:                 || akey == 0

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  hengine!z900_maddr_l+5a

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: hengine

IMAGE_NAME:  hengine.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  5fe00748

STACK_COMMAND:  ~10s; .ecxr ; kb

FAILURE_BUCKET_ID:  INVALID_POINTER_READ_c0000005_hengine.dll!z900_maddr_l

BUCKET_ID:  X64_APPLICATION_FAULT_INVALID_POINTER_READ_hengine!z900_maddr_l+5a

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/hercules_exe/4_4_9999_10344/5fe00763/hengine_dll/4_4_9999_10344/5fe00748/c0000005/0004d5fa.htm?Retriage=1

Followup: MachineOwner
---------

As you can see, Hercules is crashing in the dat.h function ARCH_DEP( maddr_l ) (commonly invoked by the MADDR and MADDRL macros) in its very first test for a TLB Hit.

The reason it is crashing is because the value of arn passed into the function has the value USE_ARMODE | b1 (i.e. the value 30 in this particular case: USE_ARMODE (which is #defined to 16) "OR"ed with the operand access register number (which in this particular case was 14)).

This causes the ARCH_DEP( maddr_l ) statement:

    int  aea_arn  = regs->AEA_AR( arn );

to load a garbage aea_arn value due to the aea_ar_struct array subscript (5 + arn) used by the AEA_AR macro being way out of bounds, which of course ends up causing the crash the moment the first TLB Hit condition is reached and regs->CR( aea_arn ) and/or regs->AEA_COMMON( aea_arn ) are evaluated.


*`()`** The following two commits introduced the bug:

Revision: ed4d97402623265b30d9f15db1b44fcb5e6e1d91: Author: Roger Bowler Date: 3/15/2008 5:04:37 PM Message: Replace ACC_ARMODE by USE_ARMODE for LPTEA Modified: dat.h Modified: esame.c Modified: hconsts.h

Revision: f440c1bcf72faafca18d82055f3c77e6950abfca Author: Roger Bowler Date: 3/15/2008 5:09:57 PM Message: Add MVCOS instruction (part 2) Modified: control.c Modified: vstore.h


This issue is closely related to Issue #350.

Fish-Git commented 3 years ago

Inserting the following at the beginning of the function after the variable declarations seems to fix the problem:

    if (arn >= USE_ARMODE)
    {
        arn &= 0xF;
        aea_arn = regs->AEA_AR( arn );
    }

BUT... I'm not sure yet whether that's the correct thing to do or not.

More research is needed.

If anyone else out there has any additional thoughts to share on the subject, I'd be very eager to hear them!

Thanks.

Fish-Git commented 3 years ago

Fixed by commit 7c686d5c59b671b2542a5280f24248b3e7c891b7.

Closing.

wably commented 3 years ago

Hey Fish (@Fish-Git ),

Sorry I had to reopen this issue. The fix that you made to correct the DAT bug in MVCOS in AR mode still isn't right.

I've been working on my test cases for issue #350 and I started running into problems with the AR mode use of MVCOS. Now, you would be right to first question my test case programming. But I have an ace in the hole. I have access to real iron (a z114). My test case works perfectly on the z114 (in a z/VM 6.4 virtual machine), but fails with the latest SDL Hyperion, just cloned and compiled in the last hour. I'm using SATK to assemble the same exact source that I assemble on the z114.

The failing situation is that the PSW address space control is set to Home, and the MVCOS R0 control bits are set to X'00000041' which means the 1st operand uses the Home mode from the PSW, and the 2nd operand uses AR mode. The access register used in the 2nd operand is AR6, and the ALET=0 (use primary space). On Hercules, the MVCOS moves a literal from the Home space into the operand1 location in the home space (bad). On the z114, MVCOS moves a literal from the Primary space (as per AR6) to the operand1 location in the home space (good).

So this is just a heads up. I will be providing full documentation here shortly. I need to distill it down to the essential failing parts so you can duplicate it.

Regards, Bob

Fish-Git commented 3 years ago

EXCELLENT!!

Not that Hercules contains a bug of course, but rather that your test has uncovered it! (which is precisely why I wanted a test case written)

Please keep me informed!

(and THANK YOU, Bob, for your efforts!)

wably commented 3 years ago

Ok here is the documentation. Start with the 'readme.pdf' inside the zip file. I've also enclosed the test run from the z114 so you can compare results, as well as two runs on Hercules (one with tracing active). I'm happy to answer any questions, of course.

mvcos.zip

Regards, Bob

Fish-Git commented 3 years ago

Quick note to let you know I'm not ignoring you, Bob. I just haven't had time yet to properly dig into this issue. Your patience and understanding is appreciated. (and again, I really do appreciate your effort thus far!!)

wably commented 3 years ago

Fish, not a problem. Take your time. Because of the z114, I have been able to continue working on the MVCOS tests.

Bob

Fish-Git commented 3 years ago

Bob (@wably),

Please try the attached patch:

(inline):

From: "Fish" <fish@softdevlabs.com>
Date: Sun, 24 Jan 2021 16:01:28 -0800
Subject: [PATCH] Fix forced AR mode crash in DAT TLB lookup (try #2)

Hopefully this closes #349 for good this time, and helps #350 effort too!

diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/dat.c hyperion-0/dat.c
--- hyperion-1/dat.c    2020-12-24 14:37:41.457392400 -0800
+++ hyperion-0/dat.c    2021-01-24 13:20:23.023483300 -0800
@@ -541,10 +541,11 @@
 /* Input:                                                            */
 /*      arn     Access register number (0-15) to be used if the      */
 /*              address-space control (PSW bits 16-17) indicates     */
-/*              that ARMODE is the current translation mode.         */
+/*              that AR-mode is the current translation mode.        */
 /*              An access register number ORed with the special      */
-/*              value USE_ARMODE forces this routine to use ARMODE   */
-/*              regardless of the PSW address-space control setting. */
+/*              value USE_ARMODE forces this routine to use AR-mode  */
+/*              address translation regardless of the PSW address-   */
+/*              space control setting.                               */
 /*              Access register 0 is treated as if it contained 0    */
 /*              and its actual contents are not examined.            */
 /*              Alternatively the arn parameter may contain one      */
diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/dat.h hyperion-0/dat.h
--- hyperion-1/dat.h    2020-12-24 14:38:22.000000000 -0800
+++ hyperion-0/dat.h    2021-01-24 14:40:27.535965500 -0800
@@ -48,8 +48,9 @@
 /*                 USE_HOME_SPACE                                    */
 /*                 USE_ARMODE + access register number               */
 /*              An access register number ORed with the special      */
-/*              value USE_ARMODE forces this routine to use ARMODE   */
-/*              regardless of the PSW address-space control setting. */
+/*              value USE_ARMODE forces this routine to use AR-mode  */
+/*              address translation regardless of the PSW address-   */
+/*              space control setting.                               */
 /*      regs    Pointer to the CPU register context                  */
 /*      acctype Type of access requested: READ, WRITE, INSTFETCH,    */
 /*              LRA, IVSK, TPROT, STACK, PTE, LPTEA                  */
@@ -74,18 +75,47 @@
        (which is many, many instructions)
     */

-    int  aea_arn  = regs->AEA_AR( arn >= USE_ARMODE ? arn & 0xF : arn );
+    int  aea_crn;
     U16  tlbix    = TLBIX( addr );
-    BYTE *maddr = NULL;
+    BYTE *maddr   = NULL;

-    /* Non-zero AEA Access Register number? */
-    if (aea_arn)
+    /* Forced AR-mode address translation? */
+    if (arn >= USE_ARMODE)
+    {
+        /* Retrieve ALET from AR */
+        U32 alet = (ARN( arn ) == 0) ? 0 :
+                   /* Guest ALET if XC guest in AR mode */
+                   (SIE_ACTIVE( regs ) && MULTIPLE_CONTROLLED_DATA_SPACE( GUESTREGS ))
+                   ? GUESTREGS->AR( ARN( arn )) :
+                   /* If SIE host but not XC guest in AR mode then alet is 0 */
+                   SIE_ACTIVE( regs ) ? 0 :
+                   /* Otherwise alet is in the access register */
+                   regs->AR( ARN( arn ));
+
+LOGMSG("    +++ USE_ARMODE! +++\n");
+        /* Convert ALET to corresponding CR number, if possible */
+        switch (alet)
+        {
+            case ALET_PRIMARY:   aea_crn =  1; break;
+            case ALET_SECONDARY: aea_crn =  7; break;
+            case ALET_HOME:      aea_crn = 13; break;
+            default:             aea_crn =  0; break;
+        }
+    }
+    else
+    {
+        /* Normal PSW-based address translation */
+        aea_crn = regs->AEA_AR( arn );
+    }
+
+    /* Non-zero AEA Control Register number? */
+    if (aea_crn)
     {
         /* Same Addess Space Designator as before? */
         /* Or if not, is address in a common segment? */
         if (0
-            || (regs->CR( aea_arn ) == regs->tlb.TLB_ASD( tlbix ))
-            || (regs->AEA_COMMON( aea_arn ) & regs->tlb.common[ tlbix ])
+            || (regs->CR( aea_crn ) == regs->tlb.TLB_ASD( tlbix ))
+            || (regs->AEA_COMMON( aea_crn ) & regs->tlb.common[ tlbix ])
         )
         {
             /* Storage Key zero? */
@@ -113,6 +143,9 @@
                         if (acctype & ACC_CHECK)
                             regs->dat.storkey = regs->tlb.storkey[ tlbix ];

+if (arn >= USE_ARMODE)
+LOGMSG("    +++ USE_ARMODE TLB HIT! +++\n");
+
                         maddr = MAINADDR(regs->tlb.main[tlbix], addr);
                     }
                 }
diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/esame.c hyperion-0/esame.c
--- hyperion-1/esame.c  2021-01-20 19:25:06.000000000 -0800
+++ hyperion-0/esame.c  2021-01-24 13:20:18.093874700 -0800
@@ -1214,7 +1214,7 @@
         n = USE_HOME_SPACE;
         break;
     case 4: /* Use current addressing mode (PSW bits 16-17) */
-        n = r2; /* r2 is the access register number if ARMODE */
+        n = r2; /* r2 is the access register number if AR-mode */
         break;
     default: /* Specification exception if invalid value for m4 */
         n = -1; /* makes compiler happy */

  It fixes (hopefully correctly!(*)) your provided failing test case.

Thanks.


*`()`** Please take a look at the patch and tell me whether or not you feel I am doing things correctly. I think I am, but am not sure. In any case, you provided test case now passes (completes successfully).

Note: I have not looked into the incorrect instruction tracing operand storage display issue yet. That's next!

p.s. Ignore the two debugging LOGMSGs. I added them to try and see how often forced AR mode translation was being requested/used by e.g. z/OS in normal circumstances (i.e. during an IPL and z/OSMF startup), and interestingly neither message appeared even a single time. But then I did not try the test case documented in this GitHib Issue yet either, which probably would cause both messages to be issued. How many times, I'm not sure. But suffice to say the specific condition (the specific bug) that my patch fixes is apparently extremely rare.

Fish-Git commented 3 years ago

But then I did not try the test case documented in this GitHib Issue yet either, which probably would cause both messages to be issued. How many times, I'm not sure.

Okay, I just got around to trying it with z/OS 2.3 and the above "//MYJOB ... SCHEDULE STARTBY..." test case, and the "USE_ARMODE!" message appeared 44 times, and the "USEARMODE TLB HIT!" message appeared 22 times (and the originally reported crash did not occur either of course!)_, so I think my patch is probably good.

I'm tempted to commit it as-is (but without the LOGMSGs of course!), but I'd feel better waiting for your approval/blessing/okay before doing so.

Thanks.

wably commented 3 years ago

Fish,

I applied your patch this morning on a fresh clone of the repo, and it passes all my tests (196 of them, at present). The results now match that of the z114.

Nice job, thanks! Please feel free to commit your patch at your convenience.

Regards, Bob

Fish-Git commented 3 years ago

Fixed (AGAIN! Hopefully for good this time!) by commit e74bec604ee3a46dffe317a4543a49fb216dcc73.