SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

z/VM 7.2 IPL'ing as guest of itself CCW Command Rejects Aaron says "quick fix" #572

Closed zVMJedi closed 1 year ago

zVMJedi commented 1 year ago

Hello, Charles here.

Aaron said to open an Issue, so here it is.

He lays it out pretty well in the message thread about this issue: one last lousy little bit in byte 6 of the z/VM RDCBK (Real Device Characteristics control block) isn't being set correctly to handle PFX (CCW opcode X'E7'); byte 6 needs to be set with 'D2' instead of the 'D0' it contains otherwise.

And I'm running Hyperion 4.5 on a Windows Server 2008 R2 Host, so no tricky Linux builds.

Thank you, and there is no urgency about this; I can stuff those Byte 6's with 'D2' from an EXEC for all my CP OWNED DASD. Maybe just slipstream it into 4.6 along with other fixes. I'm about to pull 4.6 and get it going.

Regards,

Charles Perkins

Fish-Git commented 1 year ago

byte 6 needs to be set with 'D2' instead of the 'D0' it contains otherwise.

I doubt it.

I suspect he just worded things sloppily, but what he actually meant to say was: "the X'02' bit needs to be turned on in byte 6."

Byte 6 of the z/VM RDCBK is defined as:

    *** Bytes defined for CKD/ECKD DASD RDC Features field
0006    6 Bitstring    1 RDCVFAC        Program-visible facilities
         .... 1...      RDCSSCSP       X'08' RDCSSCSP Set System
                                       Characteristics is supported
         .... .1..      RDCSSCRC       X'04' RDCSSCRC Set System
                                       Characteristics has been received
                                       for this Path Group
         .... ..1.      RDCPRFX        X'02' RDCPRFX Prefix CCW
                                       supported & enabled

If you just blindly change byte 6 to the hard coded value X'D2', you might end up turning some of its bits on that were previously off and vice-versa.

For example: if the value in byte 6 happens to be X'DC', then you need to change it to X'DE', not X'D2'!

You need to be very careful how you write your EXEC, Charles.
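The point above is that the fix is a bitwise OR, not an overwrite. A minimal sketch of that arithmetic follows. Python is used purely for illustration (it is not the EXEC itself), and the names `RDCPRFX` and `enable_prefix` are chosen here for the example; only the X'02' mask comes from the RDCBK layout quoted above.

```python
# Illustrative only: RDCPRFX mirrors the X'02' flag in the quoted RDCBK layout.
RDCPRFX = 0x02


def enable_prefix(byte6: int) -> int:
    """Return byte 6 with the X'02' bit on and every other bit unchanged."""
    return (byte6 | RDCPRFX) & 0xFF


# The two values discussed in this thread.
for before in (0xD0, 0xDC):
    print(f"X'{before:02X}' -> X'{enable_prefix(before):02X}'")
```

ORing in the mask leaves the remaining seven bits exactly as they were, which is why X'DC' becomes X'DE' rather than X'D2'.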

Fish-Git commented 1 year ago
  1. What cu=xxxx control unit option value are you using on your Hercules device statements? I always use cu=3990-6 on all of mine.

  2. What does your Directory entry for your second level z/VM guest look like? I'd like to try and recreate this issue for myself, and could really use a jump-start! Thanks.

zVMJedi commented 1 year ago

I've always used the generic cu=3990 . I'll add the -6 and see what happens.

Here's my 2nd Level UserID:

Kind regards, Charles

Fish-Git commented 1 year ago

General FYI regarding email replies:

I would very much appreciate it if you would not respond/reply to GitHub Issues via email.

I would much prefer that you instead respond/reply directly via the GitHub Issues web page itself:

When you reply directly via their web page, I can make minor edits to your reply so it is more readable (prettier) by editing the fonts being used, formatting of log messages, etc.

When you reply via email however, I cannot edit your reply (GitHub does not allow it), so oftentimes it is much harder (more difficult) to read.

GitHub also does not allow attachments in email replies, making it impossible to receive a file that may have been requested from you.

It is up to you whether or not you want to take the time to reply via their web page or continue to reply via email, but it is generally preferable that you reply directly via their web page instead. Especially if you need to attach a file that was requested from you.

Thanks for understanding.

zVMJedi commented 1 year ago

I'll gladly comply by responding only here. I'd never used GitHub except as an observer and couldn't figure out how to specify a fixed font, I think I've found that now. I used a fixed font in my e-mail reply and it appears Git lost it somehow.

arfineman commented 1 year ago

Charles,

Fish is correct. D2 was based on your specific display results. As I indicated in my final comment, when the PFX command was added, byte 6, bit 6 should have been turned on, just as indicated in the RDCBK that Fish posted.

If you write a Rexx exec to turn the bit on, use the BITOR function in Rexx.

There are other ways around it, but it's not worth pursuing. Turning this bit on will add support for the PFX command for guests running under z/VM, both for minidisks and dedicated devices.

Best regards,

Fish-Git commented 1 year ago

I'd never used GitHub except as an observer and couldn't figure out how to specify a fixed font, I think I've found that now.

(I personally prefer the PDF)

I used a fixed font in my e-mail reply and it appears Git lost it somehow.

As I explained, GitHub does not allow formatting of emails. That's why I prefer that people reply to GitHub Issues directly via their web interface instead. When you post your comment/reply via their web page, you can use markdown to format your comment/reply however you like. Depending on what you are posting, this can make a HUGE difference in readability.

Fish-Git commented 1 year ago

Fish is correct.

If you write a Rexx exec to turn the bit on, use the BITOR function in Rexx.

Thanks, Aaron.

Fish-Git commented 1 year ago

Charles,

Since you're the zVM Jedi (cool nickname btw!), maybe you can help me.

I'm trying to recreate your problem by IPLing a second level z/VM 7.2 under my existing z/VM 7.2, but things aren't working out too well.   :(

(It's been YEARS since I've messed with such things!)

Each time I do my IPL, it displays the ipl/startup messages on my CMS userid's console, and invariably gets to a point where it says I need to reply to something. Only problem is, I don't know how to reply!

PLEASE NOTE: I know what I want to reply, but I can't seem to figure out how to reply! Just entering the reply and pressing enter accomplishes nothing.

I don't know if my terminal conmode is wrong, or if I should dial into my userid before IPLing, or something else.

I seem to recall I might need a different LINEND setting or something? (I forget! It's been too long since I've done this shit!)

HELP!   :(

p.s. I'm not seeing any type of I/O errors so far, but I'm not convinced I've gotten far enough into the second level IPL to reach that point yet.

Fish-Git commented 1 year ago

p.p.s. Here's the directory entry I've defined, and the exec I'm using to IPL my second level system with:

******************************************
*         z/VM 7.2 -- second level       *
******************************************

USER ZVM72 ZVM72 6G 6G BCEFGH

IPL ZCMS
MACHINE Z 4
OPTION DEVINFO DEVMAINT MAINTCCW LNKS LNKE LNKNOPAS

CONSOLE 009 3215

SPOOL 00C 2540 READER *
SPOOL 00E 1403 A

MDISK 191 3390 6351 10 M01RES MR
LINK MAINT 190 190 RR
LINK MAINT 19D 19D RR
LINK MAINT 19E 19E RR

DEDICATE 0223 0123
DEDICATE 0224 0124
DEDICATE 0225 0125
DEDICATE 0226 0126
DEDICATE 0227 0127
DEDICATE 0228 0128

Here's the exec I use to IPL with:

/*-------------------------------------------------------------------*/
/*                         z/VM 7.2                                  */
/*-------------------------------------------------------------------*/

Call InitDasd

'CP TERMINAL CONMODE 3270'
"PIPE CP IPL 123 | STEM cpmsg."

/* If we reach here then the IPL command obviously failed! */
'CP TERMINAL CONMODE 3215'
say 'ERROR: IPL failed! rc='rc': "'cpmsg.1'"'

Return

/*-------------------------------------------------------------------*/
/*                         z/VM 7.2                                  */
/*-------------------------------------------------------------------*/

InitDasd:

'CP DETach Virtual 0123-0128'

'CP ATTach 0223 * 0123'
'CP ATTach 0224 * 0124'
'CP ATTach 0225 * 0125'
'CP ATTach 0226 * 0126'
'CP ATTach 0227 * 0127'
'CP ATTach 0228 * 0128'

Return

arfineman commented 1 year ago

Hi Fish, Try your reply with #CP VI VMSG xxxxxx where xxxxxx is your reply text. Best regards,

zVMJedi commented 1 year ago

You seem to have it all correct. Once you're logged on to your 2nd-level ID and invoke that startup EXEC, the next thing you should see is the z/VM "startup" messages, then the prompt asking what sort of start you want (WARM, FORCE, COLD) plus the other PARMs. You'd probably want to say FORCE, since I don't know what shape your 2nd level would be in.

You should be able to enter something on the command line such as FORCE DRAIN NOAUTOLOG and hit "Enter". It will then ask about the TOD clock; just hit "Enter" to that, and it will proceed from there.

If it isn't responding when you type a reply and hit "Enter", I'm not sure what your problem might be. At that point your virtual console is "talking" to your 2nd-level system; the different LINEND is only needed if you have to pass something to 1st-level CP. Since you're DEDICATE'ing the DASD in the CP Dir you don't need the ATTACH'es. 3270 is the correct CONMODE setting.

I use Tom Brennan's Vista3270 emulator product in Windows. If you're using some X-Terminal 3270 emulator, I've never stepped into that world. I can't think of any other show-stopper but I've just awakened. You are getting to here, right? :

07:14:11 z/VM  V7 R2.0  SERVICE LEVEL 2001 (64-BIT)
07:14:11 SYSTEM NUCLEUS CREATED ON 2020-07-29 AT 16:50:40, LOADED FROM M01RES
07:14:11
07:14:11 ****************************************************************
07:14:11 * LICENSED MATERIALS - PROPERTY OF IBM*                        *
07:14:11 *                                                              *
07:14:11 * 5741-A09 (C) COPYRIGHT IBM CORP. 1983, 2020. ALL RIGHTS      *
07:14:11 * RESERVED. US GOVERNMENT USERS RESTRICTED RIGHTS - USE,       *
07:14:11 * DUPLICATION OR DISCLOSURE RESTRICTED BY GSA ADP SCHEDULE     *
07:14:11 * CONTRACT WITH IBM CORP.                                      *
07:14:11 *                                                              *
07:14:11 * * TRADEMARK OF INTERNATIONAL BUSINESS MACHINES.              *
07:14:11 ****************************************************************
07:14:11
07:14:11 HCPZCO6718I Using parm disk 1 on volume VMCOM1 (device 0700).
07:14:11 HCPZCO6718I Parm disk resides on cylinders 1 through 120.
07:14:12
07:14:12 HCPIIS954I DASD 1725 VOLID M01P01 IS A DUPLICATE OF DASD 0705
07:14:12 HCPIIS954I DASD 1723 VOLID M01RES IS A DUPLICATE OF DASD 0703
07:14:12 HCPIIS954I DASD 1724 VOLID M01S01 IS A DUPLICATE OF DASD 0704
07:14:12 HCPIIS954I DASD 0721 VOLID 720RL1 IS A DUPLICATE OF DASD 0701
07:14:12 HCPIIS954I DASD 0722 VOLID 720RL2 IS A DUPLICATE OF DASD 0702
07:14:12 HCPIIS954I DASD 0720 VOLID VMCOM1 IS A DUPLICATE OF DASD 0700
07:14:12 Start ((Warm|Force|COLD|CLEAN) (DRain) (DIsable)  (NODIRect)
07:14:12       (NOAUTOlog)) or (SHUTDOWN)          <---------------------- what 'cha want ?
07:14:19 WARM DRAIN NOAUTOLOG                      <---------------------- plus hit 'Enter'
07:14:19 NOW 07:14:19 CDT WEDNESDAY 2023-06-14
07:14:19 Change TOD clock (Yes|No)                 <---------------------- just 'Enter' to this
07:14:21
07:14:21 The directory on volume M01RES at address 0703 has been brought online.
07:14:28 HCPWRS2513I
07:14:28 HCPWRS2513I Spool files available     1614
07:14:30 HCPWRS2512I Spooling initialization is complete.
07:14:30 DASD 0704 dump unit CP IPL pages 25157 PGMBKs DEFAULT FRMTBL DEFAULT
07:14:30 HCPAAU2700I System gateway ZVMJEDI identified.
07:14:30 z/VM Version 7 Release 2.0, Service Level 2001 (64-bit),
07:14:30 built on IBM Virtualization Technology
07:14:30 There is no logmsg data
07:14:30 FILES: 0056 RDR, 0014 PRT,   NO PUN
07:14:30 LOGON AT 07:14:30 CDT WEDNESDAY 06/14/23
07:14:30 GRAF  0020 LOGON  AS  OPERATOR USERS = 1

                                                            HOLDING   ZVMJEDI

zVMJedi commented 1 year ago

Actually, you should be getting the first Command Reject even before the "What kind of start do you want??" message appears; it will occur the moment it tries to bring the Object Directory online, like this:

20:50:54 z/VM  V7 R2.0  SERVICE LEVEL 0000 (64-BIT)
20:50:55 SYSTEM NUCLEUS CREATED ON 2020-06-26 AT 09:02:09, LOADED FROM M01RES
20:50:55
20:50:55 ****************************************************************
20:50:55 * LICENSED MATERIALS - PROPERTY OF IBM*                        *
20:50:55 *                                                              *
20:50:55 * 5741-A09 (C) COPYRIGHT IBM CORP. 1983, 2020. ALL RIGHTS      *
20:50:55 * RESERVED. US GOVERNMENT USERS RESTRICTED RIGHTS - USE,       *
20:50:55 * DUPLICATION OR DISCLOSURE RESTRICTED BY GSA ADP SCHEDULE     *
20:50:55 * CONTRACT WITH IBM CORP.                                      *
20:50:55 *                                                              *
20:50:55 * * TRADEMARK OF INTERNATIONAL BUSINESS MACHINES.              *
20:50:55 ****************************************************************
20:50:55
20:50:55 ****************************************************************
20:50:55 * IBM z/VM Single System Image Function is active.
20:50:55 ****************************************************************
20:50:55
20:50:55 HCPZCO6718I Using parm disk 1 on volume VMCOM1 (device 0720).
20:50:55 HCPZCO6718I Parm disk resides on cylinders 1 through 120.
20:50:55 HCPERP500I  DASD  1723 AN OPERATION WAS TERMINATED BECAUSE A
20:50:55 HCPERP500I  COMMAND REJECT ERROR OCCURRED
20:50:55 HCPERP6300I SENSE DATA FORMAT = 00       MSG CODE = 01
20:50:55 HCPERP6301I CHANNEL COMMAND WORD COMMAND CODE = N/A HPF
20:50:55 HCPERP6302I SEEK ADDRESS =   000000000000
20:50:55 HCPERP6303I SENSE = 80000000 00FFFF01 00000000 00000000 00000000
20:50:55 HCPERP6303I 00000000 00000080 00000000
20:50:55 HCPERP6304I IRB = 00C24017 1F6AE008 02000040 00800000
20:50:55 HCPERP6305I USERID = SYSTEM
20:50:55 HCPERP2216I CHANNEL PATH ID = 17
20:50:55 HCPUDX1777E The directory on volume M01RES could not be initialized.
20:50:55 HCPUDS1752E No directory could be initialized.
20:50:55 HCPPLM1663E SSI function WORTHINESS CHECK has failed for service DIRECTORY.
20:50:55 HCPPLM1691E SSI service initialization failed
20:50:55 HCPPLM1697I The state of SSI system ZVMJEDI1 has changed from DOWN to ISOLATED
20:50:55 HCPPLM1698I The mode of the SSI cluster is SAFE
20:50:55 HCPERP500I  DASD  1723 AN OPERATION WAS TERMINATED BECAUSE A
20:50:55 HCPERP500I  COMMAND REJECT ERROR OCCURRED
20:50:55 HCPERP6300I SENSE DATA FORMAT = 00       MSG CODE = 01

                                                          HOLDING   ZVMJEDI1

zVMJedi commented 1 year ago

Eureka! But don't start cheering yet. Well, ok... cheer a little, but not too loudly.

I noticed you DEDICATE'd your CP VOLS, instead of using DEVNO statements as I did. I changed my 2nd-level ID to use DEDICATEs and now mine is coming up cleanly, NO Command Rejects anywhere and AUTOLOG1 brought up all The Usual Suspects. So far, so good.

This is a great clue, though, because what it tells us is: when a volume is DEDICATE'd, that changes completely how the I/O is handled between Host and Guest. So "something" is going wrong between Host 7.2 and Guest 7.2 as they pass the I/O back and forth when DEVNO is used. If I remember right, and I'm not sure I do, when a volume is DEDICATED, the Guest gets to do its own I/O and Host CP just watches. Or maybe it's the other way around. Aaron???? Geez, I hate getting old!

Unfortunately, this will preclude this 2nd level z/VM from ever being part of a true 2nd-level SSI cluster, because each Member has to see everyone else's CP OWNED volumes, you'll notice I had LINK statements to 3 other eventual Members. Thus, why full-pack DEVNO was necessary. See Page 79 of the z/VM 7.2 Installation Guide:

Anyway, that's what I've discovered from here. I hope it will be helpful. This much success WILL allow the creation of what back in the day was called CSE ( Cross-System Extension, the predecessor of SSI ) but without being able to share SPOOL.

Thanks. Charles

arfineman commented 1 year ago

The logic in zVM for CCW translation is different between Dedicated/Attached and Devno/Minidisk. The bottom line is it should work either way. But just an FYI: the first thing it checks is Byte 6, Bit 6 of RDC. If it is not on, then it will go thru a series of checks with control unit models, starting with 2107 then 2105, then 1750, 3390-6.. etc. Best regards,
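"Byte 6, bit 6" uses the mainframe convention of numbering bits from the left, with bit 0 as the most significant. A small sketch of how that maps onto the X'02' mask mentioned earlier in the thread follows; the function name `ibm_bit_mask` is made up for this example, and nothing here is taken from z/VM source.

```python
def ibm_bit_mask(bit: int) -> int:
    """Mask for one bit of a byte, numbering bits 0 (leftmost) to 7 (rightmost)."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    return 0x80 >> bit


# Bit 6 in this numbering is the second bit from the right.
print(f"bit 6 -> X'{ibm_bit_mask(6):02X}'")
```

This is why "byte 6, bit 6" and "the X'02' bit in byte 6" refer to the same flag.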

Peter-J-Jansen commented 1 year ago

Charles / zVMJedi,

Have you tried with MDISK statements including the MWV option? You could place all of these as needed for z/VM SSI members in a VM, e.g. called VMDUMMY, that never gets IPL'd. It's the technique to run z/OS Parallel Sysplex ("PS") members under z/VM, and they too all need access to all 2nd level z/OS PS DASD's. (This uses a non-IPL'd VM named MVSDUMMY; each actual z/OS PS member then LINKs MW to MVSDUMMY's full pack MDISKS.)

Cheers,

Peter

zVMJedi commented 1 year ago

Hi Peter, I believe I did try MDISK statements when I first started this endeavour and they didn't work. They gave the same Command Rejects as DEVNO. I'll try again so MDISK can be included in The Usual Suspects round-up. I used that same trick back in the day when my Guests were DOS/VSE and VSE/ESA. Thanks!

Fish-Git commented 1 year ago

Hi Fish, Try your reply with #CP VI VMSG xxxxxx where xxxxxx is your reply text. Best regards,

THANK YOU AARON!!   :-D

It worked! My second level system is up and running!

Unfortunately(?) however, I'm not seeing any of the I/O errors that Charles is seeing.   :(

arfineman commented 1 year ago

You should not see the errors for dedicated/attached devices on the control unit types I previously posted. But it will still fail on minidisk/devno devices and that should not be. Best regards,

zVMJedi commented 1 year ago

Fish! See my Comment from 3 hours ago.

Fish-Git commented 1 year ago

Since you're DEDICATE'ing the DASD in the CP Dir you don't need the ATTACH'es.

Actually I do.

When the first level system is IPL'ed, it complains(?) about duplicate volume ids (volsers) or something (I forget what the exact message was; I didn't bother to write it down), more than likely because my second level system's dasds are exact copies of my first level system's dasds:

#  Disk Drives

0123    3390    "$(ZVM72DIR)/M01RES.cckd64" sf="$(ZVM72DIR)/M01RES_Shadow_*.cckd64" cu=$(CU)
0124    3390    "$(ZVM72DIR)/VMCOM1.cckd64" sf="$(ZVM72DIR)/VMCOM1_Shadow_*.cckd64" cu=$(CU)
0125    3390    "$(ZVM72DIR)/720RL1.cckd64" sf="$(ZVM72DIR)/720RL1_Shadow_*.cckd64" cu=$(CU)
0126    3390    "$(ZVM72DIR)/720RL2.cckd64" sf="$(ZVM72DIR)/720RL2_Shadow_*.cckd64" cu=$(CU)
0127    3390    "$(ZVM72DIR)/M01S01.cckd64" sf="$(ZVM72DIR)/M01S01_Shadow_*.cckd64" cu=$(CU)
0128    3390    "$(ZVM72DIR)/M01P01.cckd64" sf="$(ZVM72DIR)/M01P01_Shadow_*.cckd64" cu=$(CU)

0223    3390    "$(ZVM72DIR)/M01RES.cckd64" sf="$(ZVM72DIR)/M01RES_LEVEL2_Shadow_*.cckd64" cu=$(CU)
0224    3390    "$(ZVM72DIR)/VMCOM1.cckd64" sf="$(ZVM72DIR)/VMCOM1_LEVEL2_Shadow_*.cckd64" cu=$(CU)
0225    3390    "$(ZVM72DIR)/720RL1.cckd64" sf="$(ZVM72DIR)/720RL1_LEVEL2_Shadow_*.cckd64" cu=$(CU)
0226    3390    "$(ZVM72DIR)/720RL2.cckd64" sf="$(ZVM72DIR)/720RL2_LEVEL2_Shadow_*.cckd64" cu=$(CU)
0227    3390    "$(ZVM72DIR)/M01S01.cckd64" sf="$(ZVM72DIR)/M01S01_LEVEL2_Shadow_*.cckd64" cu=$(CU)
0228    3390    "$(ZVM72DIR)/M01P01.cckd64" sf="$(ZVM72DIR)/M01P01_LEVEL2_Shadow_*.cckd64" cu=$(CU)

As you can see, I've simply defined another set of dasd (with different device addresses/CUUs) using the exact same set of base images, but specifying a different set of shadow files for each. (All of my base dasd images are marked read-only; I'm running on Windows.)

I felt that was the fastest/easiest way to get a second level system created. (I didn't want to have to go through a full formal install.)

In any case, it certainly doesn't hurt anything to detach and re-attach my dasds, right?   :)

Fish-Git commented 1 year ago

Actually, you should be getting the first Command Reject even before the "What kind of start do you want??" message appears; it will occur the moment it tries to bring the Object Directory online.

Well, it's not happening to me.   :(

arfineman commented 1 year ago

Dedicating a device and attaching it are the same thing. In either case, the device cannot be shared and it follows the same logic path in CCW translation. DEVNO/MINIDISK allows for device sharing and has a stricter CCW translation. Best regards,

zVMJedi commented 1 year ago

It's not happening because you're using ATTACH'd volumes, which take a different I/O path. As soon as you try to use MDISK or DEVNO statements in your 2nd-level UserID definition, you'll get the pre-Fourth-of-July fireworks show.

Fish-Git commented 1 year ago

Eureka! But don't start cheering yet. Well, ok... cheer a little, but not too loudly.

I noticed you DEDICATE'd your CP VOLS, instead of using DEVNO statements as I did. I changed my 2nd-level ID to use DEDICATEs and now mine is coming up cleanly, NO Command Rejects anywhere and AUTOLOG1 brought up all The Usual Suspects. So far, so good.

Interesting!

Unfortunately, this will preclude this 2nd level z/VM from ever being part of a true 2nd-level SSI cluster, because each Member has to see everyone else's CP OWNED volumes, you'll notice I had LINK statements to 3 other eventual Members. Thus, why full-pack DEVNO was necessary.

Yes, I noticed that in your sample directory statements that you were using, but didn't think it was anything important. (I don't "do" SSI clusters. I like to keep things simple. SSI clusters are for z/VM Jedis like you, not for mere mortals like me.<G>)

Anyway, that's what I've discovered from here. I hope it will be helpful. This much success WILL allow the creation of what back in the day was called CSE ( Cross-System Extension, the predecessor of SSI ) but without being able to share SPOOL.

That's good to know.

Unfortunately, that's kind of bad news for me, because it means I am unable to recreate your problem in order to verify that my quick fix is good or not. (I am NOT interested in trying to set up an SSI cluster! I'm a Hercules developer. I don't have the time to learn how to operate all the many different operating systems that Hercules is able to run. I have my hands full with Hercules. I let you guys -- the Hercules users -- have all the fun with that!)

Give me a few minutes(?) and I'll commit what I believe is the fix (according to Aaron anyway): initializing the Control Unit features string with the X'02' bit turned on in byte 6 of the RDC.

I'll let you know once it's been committed. Once it is, you should then simply do a pull (from the 'develop' branch of course) and rebuild your Hercules.

Thank you to both you and Aaron with all the help you two have provided me. I really appreciate it you guys!

Fish-Git commented 1 year ago

You should not see the errors for dedicated/attached devices on the control unit types I previously posted. But it will still fail on minidisk/devno devices and that should not be.

Because Hercules is not (YET!) turning on the X'02' bit in the RDC byte you mentioned. That's why. (Hopefully!)

But I'm about to fix that, so everybody just hang loose for a little bit...

Fish-Git commented 1 year ago

Fish! See my Comment from 3 hours ago.

Saw it! And responded to it.

Fish-Git commented 1 year ago

It's not happening because you're using ATTACH'd volumes, which take a different I/O path. As soon as you try to use MDISK or DEVNO statements in your 2nd-level UserID definition, you'll get the pre-Fourth-of-July fireworks show.

I know that, but how do I turn them into MDISK or DEVNO statements? I don't really need to be sharing these dasds with any of the other VM users, so why use MDISK or DEVNO? Other than the fact that doing so will trigger the problem of course!

I mean, I'm willing to try (if you can explain to me how to do it), but generally speaking, in normal situations, DEDICATE is the proper way to go about it, yes?

zVMJedi commented 1 year ago

Well, it's not about setting up an SSI cluster, it's just getting z/VM to IPL 2nd-level.

Replace your DEDICATE statements with these:

    MDISK 0720 3390 DEVNO 0720 MW
    MDISK 0721 3390 DEVNO 0721 MW
    MDISK 1723 3390 DEVNO 1723 MW
    MDISK 1724 3390 DEVNO 1724 MW
    MDISK 1725 3390 DEVNO 1725 MW
    MDISK 1726 3390 DEVNO 1726 MW

adjusting the RDEV's and DEVNO's to the addresses of your second-level DASD, then in your 2nd-level ID, IPL your M01RES address and it should blow up beautifully.
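If the addresses need adjusting for several volumes, the statements can be generated rather than typed. A rough sketch under stated assumptions follows: the helper name `mdisk_devno_statements` is hypothetical, the virtual address is simply set equal to the real address (as in the six statements above), and 0223 through 0228 are used only because they are the second-level device addresses in Fish's Hercules configuration.

```python
def mdisk_devno_statements(real_addresses, devtype="3390", mode="MW"):
    """Build full-pack 'MDISK vdev devtype DEVNO rdev mode' lines.

    Assumption for this sketch: virtual address == real address.
    """
    return [
        f"MDISK {addr:04X} {devtype} DEVNO {addr:04X} {mode}"
        for addr in real_addresses
    ]


# Hypothetical input: Fish's second-level device addresses 0223..0228.
for line in mdisk_devno_statements(range(0x0223, 0x0229)):
    print(line)
```

The output is meant to be pasted into the directory entry in place of the DEDICATE statements; check it against your own device addresses before use.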

As for DEDICATE, "best practices" with DASD is "don't do it unless you have a really good reason". As Peter tossed in a few Comments ago, you want to define your "everybody uses these" DASD to a dummy NOLOG UserID, then LINK to that from whomever needs to see whatever.

arfineman commented 1 year ago

Since SSI requires most of the DASD volumes be shared, the only way to setup an SSI cluster under Hercules is to define full pack minidisks with MW. But the fact is you do not need to have a bunch of dedicated volumes to setup a second level zVM system. You can have a fully functional test second level zVM system with less than 1000 cylinders. Best regards,

Fish-Git commented 1 year ago

Fix committed -- 706b63a06758809842be4d4ba8790c0f5c17bcaf

Please git pull, rebuild and retest to confirm. Thanks.

zVMJedi commented 1 year ago

Thank you VERY much! I've never rebuilt for Windows but I know there's a How-To for it somewhere; please point me there and I'll get going. Never mind, I've FOUND it. YIKES!

Fish-Git commented 1 year ago

Well, it's not about setting up an SSI cluster, it's just getting z/VM to IPL 2nd-level.

I suspected as much, which is why I asked.

Replace your DEDICATE statements with these:

Thanks. I'll give it a try.

As for DEDICATE, "best practices" with DASD is "don't do it unless you have a really good reason". As Peter tossed in a few Comments ago, you want to define your "everybody uses these" DASD to a dummy NOLOG UserID, then LINK to that from whomever needs to see whatever.

Understood. But since I don't want these particular dasds shared with anyone (I'm not trying to set up an SSI cluster), there's really no need for me to use MDISK or DEVNO, yes? Besides, I was always taught that it's better (more efficient?) for z/VM to not get involved at all (or only get minimally involved) in a guest's I/O. Thus DEDICATE.

I'm going to try using MDISK solely to try and recreate the problem.

(But if I was setting up a 2nd level for real, I'd use DEDICATE!)

Fish-Git commented 1 year ago

But the fact is you do not need to have a bunch of dedicated volumes to setup a second level zVM system. You can have a fully functional test second level zVM system with less than 1000 cylinders.

I'm sure I could. But I was only trying to get from point A to point B as quickly as possible, and using an exact copy of my existing z/VM 7.2 dasd seemed the fastest and easiest route to take.

Fish-Git commented 1 year ago

Thank you VERY much! I've never rebuilt for Windows but I know there's a How-To for it somewhere; please point me there and I'll get going. Never mind, I've FOUND it. YIKES!

Use Bill's Hercules Helper script. It's fast, easy and foolproof. It does everything for you, from the cloning of the repository, installing needed packages, configuring, building, etc. You just enter a simple command and VOILA! Within minutes you have a working Hercules.

wrljet commented 1 year ago

For Windows, use Hercules Helper for Windows

Be sure to read the instructions (which aren't very good) carefully, and don't be bashful about any questions (before trying your own corrective actions).

zVMJedi commented 1 year ago

For Windows, use Hercules Helper for Windows

Be sure to read the instructions (which aren't very good) carefully, and don't be bashful about any questions (before trying your own corrective actions).

Thank you; I'm afraid this is going to take me a while, so be patient. I'm going to be doing this on a Windows Server 2008R2, if that's not an appropriate host, I'm stuck . I'm never going to say an unkind word about VMSES/E again.

wrljet commented 1 year ago

Thank you; I'm afraid this is going to take me a while, so be patient. I'm going to be doing this on a Windows Server 2008R2, if that's not an appropriate host, I'm stuck . I'm never going to say an unkind word about VMSES/E again.

We're not in a hurry.

I've never tried it on Windows Server 2008. But I suspect it will work fine. It does work on Windows 7 (with a little fiddling).

Also, I can just send you a pre-built, if you want.

Bill

zVMJedi commented 1 year ago

Thank you; I'm afraid this is going to take me a while, so be patient. I'm going to be doing this on a Windows Server 2008R2, if that's not an appropriate host, I'm stuck . I'm never going to say an unkind word about VMSES/E again.

We're not in a hurry.

I've never tried it on Windows Server 2008. But I suspect it will work fine. It does work on Windows 7 (with a little fiddling).

Also, I can just send you a pre-built, if you want.

Bill

I would appreciate that, Bill. I've got it unZIP'd, but I'm already confused about what options I should specify once in PowerShell and ready to start that buildall.ps1. I do need to learn this process, though, and I will.

wrljet commented 1 year ago

OK, let me get it built.

wrljet commented 1 year ago

Hercules build 706b63a

For what it's worth, the command I used in Windows 7 with PowerShell 5.1 was:

.\hercules-buildall.ps1 -BuildDir C:\xfer\hercules-develop -VS2017 -GitBranch develop

It had been run on that Windows 7 system before, so all the "fiddly" part was already in place for me.

zVMJedi commented 1 year ago

Alas, I've put the new build in place and now I'm getting this on every UserID starting on 1st level CP:

15:51:31 HCPERP500I  DASD  0703 AN OPERATION WAS TERMINATED BECAUSE A
15:51:31 HCPERP500I  COMMAND REJECT ERROR OCCURRED
15:51:31 HCPERP6300I SENSE DATA FORMAT = 00       MSG CODE = 04
15:51:31 HCPERP6301I CHANNEL COMMAND WORD COMMAND CODE = E7
15:51:31 HCPERP6302I SEEK ADDRESS =   000001180000
15:51:31 HCPERP6303I SENSE = 80000000 00FFFF04 00000000 00000000 00000000
15:51:31 HCPERP6303I 00000000 00000080 00011800
15:51:31 HCPERP6304I IRB = 00C24017 5554A568 0E000000 00800000
15:51:31 HCPERP6305I USERID = OPERSYMP
15:51:31 HCPERP2216I CHANNEL PATH ID = 07
15:51:31 HCPERP500I  DASD  0703 AN OPERATION WAS TERMINATED BECAUSE A
15:51:31 HCPERP500I  COMMAND REJECT ERROR OCCURRED
15:51:31 HCPERP6300I SENSE DATA FORMAT = 00       MSG CODE = 04
15:51:31 HCPERP6301I CHANNEL COMMAND WORD COMMAND CODE = E7
15:51:31 HCPERP6302I SEEK ADDRESS =   000001E70001
15:51:31 HCPERP6303I SENSE = 80000000 00FFFF04 00000000 00000000 00000000
15:51:31 HCPERP6303I 00000000 00000080 0001E701
15:51:31 HCPERP6304I IRB = 00C24017 5554A568 0E000000 00800000
15:51:31 HCPERP6305I USERID = OPERSYMP
15:51:31 HCPERP2216I CHANNEL PATH ID = 07

They're occurring on any of my DASD where a UserID has some MDISK it needs as it gets AUTOLOG'd.

I have cu=3990-6 on my .CNF definitions.

Here's one from OPERATOR as I IPL'd CMS:

15:54:41 I CMS
z/VM V7.2.0    2020-06-26 09:03
15:54:42 HCPERP500I  DASD  0703 AN OPERATION WAS TERMINATED BECAUSE A
15:54:42 HCPERP500I  COMMAND REJECT ERROR OCCURRED
15:54:42 HCPERP6300I SENSE DATA FORMAT = 00       MSG CODE = 04
15:54:42 HCPERP6301I CHANNEL COMMAND WORD COMMAND CODE = E7
15:54:42 HCPERP6302I SEEK ADDRESS =   00000A480000
15:54:42 HCPERP6303I SENSE = 80000000 00FFFF04 00000000 00000000 00000000
15:54:42 HCPERP6303I 00000000 00000080 000A4800
15:54:42 HCPERP6304I IRB = 00C24017 2AAA1568 0E000000 00800000
15:54:42 HCPERP6305I USERID = OPERATOR
15:54:42 HCPERP2216I CHANNEL PATH ID = 07
DMSACP723I D (192) R/O
Ready; T=0.14/0.19 15:54:42

They're all the 'E7' CCW.
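For anyone reading the HCPERP6303I lines, here is a small sketch of how the first eight sense bytes line up with the "COMMAND REJECT", "SENSE DATA FORMAT" and "MSG CODE" text in the messages. It assumes the usual ECKD layout as I understand it: byte 0 carries the error flags and byte 7 holds the format in its high nibble and the message code in its low nibble. The table and the function name `decode_sense` are written for this example only.

```python
# Assumed ECKD sense byte 0 flag meanings (illustrative table, not from z/VM source).
SENSE_BYTE0_FLAGS = {
    0x80: "command reject",
    0x40: "intervention required",
    0x20: "bus out check",
    0x10: "equipment check",
    0x08: "data check",
    0x04: "overrun",
}


def decode_sense(sense_words: str) -> dict:
    """Decode sense byte 0 flags plus the format/message nibbles of byte 7."""
    sense = bytes.fromhex(sense_words.replace(" ", ""))
    flags = [name for mask, name in SENSE_BYTE0_FLAGS.items() if sense[0] & mask]
    return {"flags": flags, "format": sense[7] >> 4, "message": sense[7] & 0x0F}


# First two words of the SENSE line from the OPERATOR log above.
print(decode_sense("80000000 00FFFF04"))
```

Under those assumptions the result agrees with the log: command reject, format 0, message code 4.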

This is what the RDCBK for 703 looks like:

loc 703
16:01:26 RDEV        CPVOL       VEXBK
16:01:26 00EC5CE8    0115F000    00E39000
Ready; T=0.01/0.01 16:01:26
* COMMENT EC5CE8 + X'138' = EC5E20
CP D HL00EC5E20
16:02:44 HL00EC5E20  00E5A1C0                            06 RFFF62E20
Ready; T=0.01/0.01 16:02:44
CP D HL00E5A1C0.64
16:03:07 HL00E5A1C0  3990E933 900C5200 10102032 2721000F 06 RFFFCD1C0
16:03:07 HL00E5A1D0  E000E5A2 05940222 13090674 00000000
16:03:07 HL00E5A1E0  00000000 00000000 32321502 DFEE0001
16:03:07 HL00E5A1F0  0677080F 007F4800 15FF0000 00002721
16:03:07 HL00E5A200  00E54088 F3F3F9F0 A1DC0482 40000000
16:03:07 HL00E5A210  002C4838 002ABD90 0020A170 001FED20
16:03:07 HL00E5A220  001F9338
Ready; T=0.01/0.01 16:03:07
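
The address arithmetic and the byte 6 check from the display above can be redone mechanically. This is a minimal sketch, not anything CP or Hercules provides: it assumes the X'138' offset from the comment line and the X'02' RDCPRFX definition quoted earlier in this thread, and the function names are made up for illustration.

```python
# Values copied from the CP display output above.
RDEV_ADDR = 0x00EC5CE8
RDCBK_PTR_OFFSET = 0x138      # offset taken from the "* COMMENT" line
DUMP_FIRST_16 = "3990E933 900C5200 10102032 2721000F"

RDCPRFX = 0x02                # "Prefix CCW supported & enabled" (byte 6)


def pointer_field_address(rdev, offset=RDCBK_PTR_OFFSET):
    """Address of the field in the RDEV that points to the RDCBK."""
    return rdev + offset


def byte_at(dump_words, index):
    """Return one byte from a space-separated hex dump line."""
    raw = bytes.fromhex(dump_words.replace(" ", ""))
    return raw[index]


def prefix_enabled(dump_words):
    """Test only the X'02' bit of byte 6, leaving the other bits alone."""
    return bool(byte_at(dump_words, 6) & RDCPRFX)


if __name__ == "__main__":
    print(f"pointer field at {pointer_field_address(RDEV_ADDR):08X}")
    vfac = byte_at(DUMP_FIRST_16, 6)
    state = "on" if prefix_enabled(DUMP_FIRST_16) else "off"
    print(f"byte 6 = X'{vfac:02X}', RDCPRFX bit is {state}")
```

For the dump shown, byte 6 is X'52', so the X'02' bit tests as on. Testing the single bit rather than comparing the whole byte against a fixed value keeps the check independent of the other flags in that byte.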
wrljet commented 1 year ago

In general, it would be good to show the top of the Hercules output just so we can be sure it's the intended build/commit.

Beyond that, I'll leave it to the experts on this issue.

zVMJedi commented 1 year ago

In general, it would be good to show the top of the Hercules output just so we can be sure it's the intended build/commit. Beyond that, I'll leave it to the experts on this issue.

Here you go, the Version sort of took me by surprise 4.7.0.10955?

HHC01603I version
HHC01413I Hercules version 4.7.0.10955-SDL-DEV-g706b63a0
HHC01414I (C) Copyright 1999-2023 by Roger Bowler, Jan Jaeger, and others
HHC01417I *** Hercules-Helper Test Build ***
HHC01415I Build date: Jun 14 2023 at 16:15:00
HHC01417I Built with: Microsoft Visual Studio 2017 (MSVC 191627048 0)
HHC01417I Build type: Windows MSVC AMD64 host architecture build
HHC01417I Modes: S/370 ESA/390 z/Arch

I thought we were barely at 4.6 as of a few days ago.

arfineman commented 1 year ago

Can you do a t+703 and post the traces for the failing I/O?

wrljet commented 1 year ago

Here you go, the Version sort of took me by surprise 4.7.0.10955?

That's just how we do the versioning. After an official release to the 'master' branch, the minor number is bumped in the 'develop' branch. Lots of projects work like that.

Fish-Git commented 1 year ago

Here you go, the Version sort of took me by surprise 4.7.0.10955?

That's just how we do the versioning. After an official release to the 'master' branch, the minor number is bumped in the 'develop' branch.

Correct. The current official release is indeed 4.6, but that's in the 'master' branch of our git repository. Once we release a new version, we then begin development on the next version in the 'develop' branch: 4.7 in this case, which is what you (Charles) are running. That's where the fix was committed: in the "still under development" 'develop' branch of our repository. That's what the -DEV part of the version string means: "This is the 4.7 version of Hercules that is STILL UNDER DEVELOPMENT".
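
To make the version string concrete, here is a small sketch that splits a string like the one in the HHC01413I message and reports whether it carries the -DEV tag. The field names `patch` and `build` and the function name are assumptions for illustration only; the thread itself only confirms the 4.7 part and the meaning of -DEV.

```python
import re

# Four dotted numbers followed by zero or more "-TAG" suffixes.
VERSION_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\.(?P<build>\d+)"
    r"(?P<tags>(?:-[A-Za-z0-9]+)*)$"
)


def parse_version(text):
    """Split a version string into numbers and tags (hypothetical helper)."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    tags = [t for t in match.group("tags").split("-") if t]
    return {
        "major": int(match.group("major")),
        "minor": int(match.group("minor")),
        "patch": int(match.group("patch")),   # assumed meaning
        "build": int(match.group("build")),   # assumed meaning
        "tags": tags,
        "is_dev": "DEV" in tags,              # built from 'develop'
    }


if __name__ == "__main__":
    info = parse_version("4.7.0.10955-SDL-DEV-g706b63a0")
    print(info["major"], info["minor"], info["is_dev"])
```

For the string posted above this yields major 4, minor 7 and `is_dev` true, which matches the explanation that the build came from the 'develop' branch.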

zVMJedi commented 1 year ago

Can you do a t+703 and post the traces for the failing I/O?

I turned on T+703 and then logged on to an ID that triggers the Reject on that address. It looks like this:

HHC01315I 0:0703 CHAN: ccw E7400041 55544510=>01800000 00000000 00000000 40C00000 ............ {..
HHC01312I 0:0703 CHAN: stat 0E00, count 0000
HHC01313I 0:0703 CHAN: sense 80000000 00FFFF04 00000000 00000000 00000000 00000000 00000080 0001EE00
HHC01314I 0:0703 CHAN: sense CMDREJ
HHC01603I T+ 703
HHC02229I Instruction tracing on range 703-703
HHC00801I Processor CP00: SIE: Special-operation exception interruption code 0013 ilc 4
HHC02324I SIE: CP00: PSW=000830008002416A INST=B2190000     SAC   0(0)                   set_address_space_control
HHC02326I SIE: CP00: R:00000000:A:000000002A8AC000:K:06=00080000 80024038 06000068 70FF0200  ...... .........
HHC02269I CP00: GR00=00000000 GR01=00080000 GR02=00000000 GR03=00000000
HHC02269I CP00: GR04=00000000 GR05=00000000 GR06=00000000 GR07=00000000
HHC02269I CP00: GR08=00000000 GR09=00000000 GR10=00000000 GR11=00000000
HHC02269I CP00: GR12=00024000 GR13=00000000 GR14=80024088 GR15=00000000
HHC02271I CP00: CR00=000000E0 CR01=00000000 CR02=00000000 CR03=00000000
HHC02271I CP00: CR04=00000000 CR05=00000000 CR06=00000000 CR07=00000000
HHC02271I CP00: CR08=00000000 CR09=00000000 CR10=00000000 CR11=00000000
HHC02271I CP00: CR12=00000000 CR13=00000000 CR14=C2000000 CR15=00000000
HHC00801I Processor CP04: SIE: Specification exception interruption code 0006 ilc 4
HHC02324I SIE: CP04: PSW=000810008002338A INST=B2630011     CMPSC 1,1                    cmpsc_2012
HHC02326I SIE: CP04: R:0FE7EA50:A:000000002924FA50:K:F6=FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF  ................
HHC02326I SIE: CP04: R:0FE7EA50:A:000000002924FA50:K:F6=FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF  ................
HHC02269I CP04: GR00=00000061 GR01=0FE7EA50 GR02=0002338E GR03=C1D4C5D5
HHC02269I CP04: GR04=00000006 GR05=00000000 GR06=013F1000 GR07=00022000
HHC02269I CP04: GR08=80F3B480 GR09=00021000 GR10=00000000 GR11=000047A8
HHC02269I CP04: GR12=000232E0 GR13=00021DF0 GR14=80023346 GR15=00000000
HHC02271I CP04: CR00=000010E0 CR01=00000000 CR02=00000000 CR03=00000000
HHC02271I CP04: CR04=00000000 CR05=00000000 CR06=00000000 CR07=00000000
HHC02271I CP04: CR08=00000000 CR09=00000000 CR10=00000000 CR11=00000000
HHC02271I CP04: CR12=00000000 CR13=00000000 CR14=C2000000 CR15=00000000
HHC01315I 0:0703 CHAN: ccw E7400041 2AAA1510=>01800000 00000000 00000000 40C00000 ............ {..
HHC01312I 0:0703 CHAN: stat 0E00, count 0000
HHC01313I 0:0703 CHAN: sense 80000000 00FFFF04 00000000 00000000 00000000 00000000 00000080 0001EE00
HHC01314I 0:0703 CHAN: sense CMDREJ
herc =====>

I've never needed to turn on tracing in Hercules before, so if the trace output has gone somewhere else, where would that be?

  (EDIT by Fish, after the fact: your device trace failed because you entered the command incorrectly. The command is supposed to be "t+703", all one word with no spaces. What you entered was "t+ 703", and "t+" by itself is the command that turns on instruction tracing, not device tracing! (Oops!) Because of the errant space, the "703" was interpreted as the storage range where the instruction trace should take place. And since 703 is an odd address, no instruction will ever be executed at that address, so no tracing occurred!

So the lesson to be learned here is: to start a DEVICE I/O trace, you need to enter the command "t+xxx" (with no spaces!), where 'xxx' is the device number of the device you want to trace I/O for. Understand? -- Fish)
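
Even without a full device trace, the HHC01315I, HHC01312I and HHC01313I lines in the log above can be decoded by hand. The sketch below assumes the CCW is shown in format-1 layout (command, flags, 2-byte count, 4-byte address), that the first byte after "stat" is the unit status, and the usual byte 0 sense bit assignments. The short bit names are informal labels chosen for this example, not Hercules identifiers.

```python
# (mask, label) tables, highest bit first. Labels are informal.
CCW_FLAGS = [(0x80, "CD"), (0x40, "CC"), (0x20, "SLI"), (0x10, "SKIP"),
             (0x08, "PCI"), (0x04, "IDA"), (0x02, "SUSPEND"), (0x01, "MIDA")]
UNIT_STATUS = [(0x80, "ATTN"), (0x40, "SM"), (0x20, "CUE"), (0x10, "BUSY"),
               (0x08, "CE"), (0x04, "DE"), (0x02, "UC"), (0x01, "UX")]
SENSE_BYTE0 = [(0x80, "CMDREJ"), (0x40, "INTREQ"), (0x20, "BUSOUT"),
               (0x10, "EQPCHK"), (0x08, "DATACHK"), (0x04, "OVERRUN")]


def bit_names(value, table):
    """List the labels of all bits that are set in value."""
    return [name for mask, name in table if value & mask]


def decode_ccw(text):
    """Decode 8 CCW bytes, assuming format-1 layout."""
    raw = bytes.fromhex(text.replace(" ", ""))
    if len(raw) != 8:
        raise ValueError("a CCW is exactly 8 bytes")
    return {
        "opcode": raw[0],
        "flags": bit_names(raw[1], CCW_FLAGS),
        "count": int.from_bytes(raw[2:4], "big"),
        "address": int.from_bytes(raw[4:8], "big"),
    }


def decode_stat(text):
    """Decode the 'stat' field: unit status byte, then channel status."""
    raw = bytes.fromhex(text)
    return {"unit": bit_names(raw[0], UNIT_STATUS), "channel": raw[1]}


def decode_sense0(text):
    """Decode only sense byte 0 of a sense dump."""
    raw = bytes.fromhex(text.replace(" ", ""))
    return bit_names(raw[0], SENSE_BYTE0)


if __name__ == "__main__":
    print(decode_ccw("E7400041 55544510"))
    print(decode_stat("0E00"))
    print(decode_sense0("80000000 00FFFF04"))
```

For the first failing CCW in the log this gives opcode X'E7' with command chaining, a count of 65 (X'41') bytes, unit status channel end plus device end plus unit check, and command reject in sense byte 0, which agrees with the "sense CMDREJ" line.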

wrljet commented 1 year ago

Here you go, the Version sort of took me by surprise 4.7.0.10955?

That's just how we do the versioning. After an official release to the 'master' branch, the minor number is bumped in the 'develop' branch.

Correct. The current official release is indeed 4.6, but that's in the 'master' branch of our git repository. Once we release a new version, we then begin development on the next version in the 'develop' branch: 4.7 in this case, which is what you (Charles) are running. That's where the fix was committed: in the "still under development" 'develop' branch of our repository. That's what the -DEV part of the version string means: "This is the 4.7 version of Hercules that is STILL UNDER DEVELOPMENT".

And, the 'develop' branch isn't guaranteed to even build correctly on all systems, at any random point in the commit timeline. Although we all try to do our best!

Fish-Git commented 1 year ago

Alas, I've put the new build in place and now I'm getting this on every UserID starting on 1st level CP:

15:51:31 HCPERP500I  DASD  0703 AN OPERATION WAS TERMINATED BECAUSE A
15:51:31 HCPERP500I  COMMAND REJECT ERROR OCCURRED
15:51:31 HCPERP6300I SENSE DATA FORMAT = 00       MSG CODE = 04
15:51:31 HCPERP6301I CHANNEL COMMAND WORD COMMAND CODE = E7
15:51:31 HCPERP6302I SEEK ADDRESS =   000001180000
15:51:31 HCPERP6303I SENSE = 80000000 00FFFF04 00000000 00000000 00000000
15:51:31 HCPERP6303I 00000000 00000080 00011800

Yes, I'm seeing the same thing. Apparently the fix for this issue wasn't as "quick" as Aaron made it out to be!   ;-)

I'm looking into it and hope to have it fixed "soon" (as in possibly later tonight or tomorrow).