MISSING Facilities support

SDL-Hercules-390 / hyperion

The SDL Hercules 4.x Hyperion version of the System/370, ESA/390, and z/Architecture Emulator

MISSING Facilities support #77

Open Fish-Git opened 6 years ago

Fish-Git commented 6 years ago

Hercules is currently missing support for the following z/Architecture facilities:

Note: Those that are checked have been recently implemented and thus are now supported. The ones that are unchecked are those which are currently unsupported and still need to be implemented.

For a complete list of supported/unsupported facilities, please see Question 5.01 of our "Frequently Asked Questions" document.

[x] BEAR-Enhancement Facility
[ ] Configuration-z/Architecture-Architectural-Mode (CZAM) Facility
[x] Constrained-Transactional-Execution Facility
[ ] CPU-Measurement Counter Facility
[ ] CPU-Measurement Sampling Facility
[x] Decimal-Floating-Point Packed-Conversion Facility
[x] Decimal-Floating-Point Zoned-Conversion Facility
[ ] DEFLATE-Conversion Facility
[ ] Enhanced-DAT Facility 2
[ ] Enhanced-Sort Facility
[ ] Enhanced-Suppression-on-Protection Facility 2
[ ] Entropy Encoding Compression Facility
[ ] ESA/390-Compatibility-Mode Facility
[x] Execution-Hint Facility
[ ] Extended-I/O-Measurement-Block Facility
[ ] Extended-I/O-Measurement-Word Facility
[ ] FCX-Bidirectional-Data-Transfer Facility
[ ] Fibre-Channel Extensions (FCX) Facility
[ ] Guarded-Storage Facility
[x] Insert-Reference-Bits-Multiple Facility
[ ] Instruction-Execution-Protection Facility
[x] Load-and-Trap Facility
[x] Load-and-Zero-Rightmost-Byte Facility
[x] Load-Program-Parameter Facility
[x] Load/Store-on-Condition Facility 2
[x] Local-TLB-Clearing Facility
[x] Message-Security-Assist Extension 5
[ ] Message-Security-Assist Extension 6
[x] Message-Security-Assist Extension 7
[ ] Message-Security-Assist Extension 8
[ ] Message-Security-Assist Extension 9
[x] Miscellaneous-Instruction-Extensions Facility 1
[x] Miscellaneous-Instruction-Extensions Facility 2
[x] Miscellaneous-Instruction-Extensions Facility 3
[ ] Move-Page-and-Set-Key Facility
[ ] Multiple-Epoch Facility
[ ] Multithreading Facility
[ ] Neural-Network-Processing-Assist Facility
[x] Nonquiescing Key-Setting Facility
[ ] Order Preserving Compression Facility
[x] PER Storage-Key-Alteration Facility
[x] PER Zero-Address-Detection Facility
[x] PFPO (Perform Floating Point Operation) Facility
[x] PPA-in-order Facility
[ ] Processor-Activity-Instrumentation Facility
[ ] Processor-Activity-Instrumentation Extension 1 Facility
[x] Processor-Assist Facility
[ ] Reset-DAT-Protection Facility
[ ] Restore-Subchannel Facility
[ ] Secure-Execution-Unpack Facility
[ ] Server-Time-Protocol Facility
[ ] Side-Effect-Access Facility
[ ] Storage-Key-Removal Facility
[ ] Test-Pending-External-Interruption Facility
[x] Transactional-Execution Facility
[ ] Ultravisor-Call Facility
[x] Vector-Enhancements Facility 1
[x] Vector-Enhancements Facility 2
[x] Vector Packed-Decimal Facility
[x] Vector-Packed-Decimal-Enhancement Facility 1
[x] Vector-Packed-Decimal-Enhancement Facility 2
[ ] Warning-Track Interruption Facility
[x] z/Architecture Vector Facility

Fish-Git commented 6 years ago

Above list of supported facilities updated on 2018-04-04 to reflect changes made by commit 4ee084f9e958ff5ae88c3ccad8be7500be47ebc5

Fish-Git commented 5 years ago

One of my SoftDevLabs customers (a long time mainframer with 30 years experience as a Systems Programmer) has sent me his code which implements our currently missing PFPO Facility (Perform Floating Point Operation) instruction.

He also says he has a mainframe assembler program that he used to test PFPO with that he's going to send me (which, once I get it, I will then turn into a runtest test case).

This post is just a heads up to let you know what I'm currently working on.

Fish-Git commented 5 years ago

Bob (Wood) must be busy. He hasn't responded to any of my emails yet. I'm reluctant to commit his PFPO code until I receive his test program first. His code is complicated.

Unless... @srorso Steve? Are you willing to review it and sign off on it? I'd really prefer to have a test program that verifies Hercules's results against real z hardware, but I suppose the runtest test program can come later. As long as you're comfortable that his code appears to be coded correctly, then I guess I'd be more willing to commit it without a test case.

(Of course, there's also the argument that since we've lived without it for this long, waiting a bit longer for Bob's test program isn't going to cause us or anyone else any burden. There's no real rush to get it put into Herc! Right?)

srorso commented 5 years ago

Hi Fish,

I would be happy to take a look.

And the other side of the argument re waiting is: what we have now does not work at all, so if the candidate PFPO code sorta works, that's a step forward. Both sides of the argument make sense...

Best Regards, Steve Orso

Fish-Git commented 5 years ago

Committed: 58fc456f289e7c2c06226ccfe2ba116136b9a99b.

Let me know what you think. I personally had trouble following it, but you may not.

If there're any comments you feel should be added to explain what's going on, that would be appreciated too.

Thanks!

srorso commented 5 years ago

Re PFPO:

This is a good start and much better than what exists right now.

Kudos for naming the version of the PoOp used as a reference; this helped me understand the basis for the code.

Validation of correct results really requires test cases and as Fish mentioned, validation on hardware. Basic test cases are easy to come up with. Corner cases (non-finites, rounding tests, exception generation and detection) are much tougher and take a lot of time to create. The current state of pfpo.c will not produce correct results for corner cases.

The following represent opportunities for additional functionality. Do not be daunted by this list; PFPO is an extremely comprehensive instruction with lots of options. I have listed opportunities in my perceived order of added value... but any addition of functionality is a good thing regardless of order.

Add comments, especially block header comments. This would be consistent with coding elsewhere in Hyperion. The block comments would simplify navigation through the source, especially as it grows. Some description and/or PoOp references for the table content would be helpful to future maintainers of the code. Example of what I mean by a block header comment:

/* **************************************************************** */
/* dfl2hfl - Convert finite DFP value to HFP                        */
/*                                                                  */
/* Rounding is performed per bits 60-63 of GR0, with reference to   */
/* FPC-specified rounding for GR0-specified modes 0 and 1.          */
/* Non-finite DFP inputs are handled elsewhere.                     */
/* **************************************************************** */

Validate the Operation Type Code (OTC) to be 0x01. Other values are listed in the PoOp as "Reserved/Invalid" but there is little explanation of the action taken for an invalid OTC. A specification exception seems reasonable; hardware validation would be most helpful.

Add support for the test bit (gr0 bit 32). This should be easy; add code at line 1109 to see if the bit is set. If set, return cc=1 or cc=3 as appropriate instead of an operation exception.

Add support to set values for Condition Code, Data Exception Code (DXC), GR1 return code, and the Floating Point Control Register (FPC) flag bits and to test the FPC mask bits when setting these values.

Settings of these based on the input and operation are detailed in Figure 9-30 "Actions for Various PERFORM FLOATING POINT Conditions." This figure is on p 9-46 of the PoOp -11. This table is not included in the PoOp -05, and the table is extremely helpful to a complete emulation of PFPO.

Add support for rounding modes, inexact suppression, scaled results, and the target radix control byte.

Rounding is likely for all conversions and, I suspect, is the reason an inexact suppression bit is included in GR0 (bit 56). Rounding is a black art; scrounging code from the SoftFloat-3e and decNumber libraries is highly recommended. Scaled results are highly likely when converting from longer formats to shorter. The inclusion of decNumber headers suggests this is already being considered.

Add support for DFP non-finite inputs (xNaN or infinity). DFP non-finites are easy to test for and there are only three possible instruction completions: Program Interruption with suppression or completion with a result of Hmax (infinity input) or +Hmax (NaN input). Macros at the beginning of pfpo.c suggest this is already being contemplated.

Add support for BFP as source or target of conversion. The BFP<->HFP conversions are "relatively" easy and should be much easier than DFP<->anything.

When one adds all of this stuff to pfpo.c, it may be worth considering making PFPO a separate shared library, perhaps loaded dynamically. (In support of this, I have been planning to convert SoftFloat-3 from a static to a shared library. That change would also allow CMake builds using Ninja on BSD-based systems including macOS. Right now Ninja barfs on the static library on BSD-based systems.)

It also appears that the current code is little-endian host dependent. Line 1026 stores GR0 in an S64, and then line 1039 references mainframe bits 40-47, the Operand Format Control (OFC) for operand 1, as byte 2 of the S64. On big-endian hosts, bits 40-47 are stored in byte 5. The same issue exists for the OFC for Operand 2 and will affect any code added to validate the OTC and/or extract the bits from bits 56-63 of GR0.
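One way to sidestep the host byte-order dependency described above is to pull the fields out of GR0 with shifts and masks on the 64-bit value instead of indexing its bytes. The sketch below is illustrative only and is not taken from pfpo.c: `gr_bits`, `pfpo_ctl`, `pfpo_decode_gr0` and `pfpo_otc_valid` are hypothetical names, and the bit positions used for the OTC (33-39) and the operand 2 OFC (48-55) are assumptions that should be checked against the PoOp. The remaining positions (test bit 32, operand 1 OFC 40-47, inexact suppression 56, rounding 60-63) are the ones mentioned in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract an inclusive bit field from a 64-bit register value using
   z/Architecture numbering, where bit 0 is the most significant bit.
   Shifts act on the value, not on its in-memory bytes, so the result
   is the same on little-endian and big-endian hosts. */
static uint64_t gr_bits(uint64_t gr, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint64_t mask  = (width == 64) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1);
    return (gr >> (63 - last)) & mask;
}

/* Hypothetical container for the PFPO controls held in GR0. */
typedef struct {
    bool    test;          /* bit 32: test bit                         */
    uint8_t otc;           /* bits 33-39: OTC (position is assumed)    */
    uint8_t ofc1;          /* bits 40-47: OFC for operand 1            */
    uint8_t ofc2;          /* bits 48-55: OFC for operand 2 (assumed)  */
    bool    inexact_supp;  /* bit 56: inexact suppression              */
    uint8_t rounding;      /* bits 60-63: rounding method              */
} pfpo_ctl;

static pfpo_ctl pfpo_decode_gr0(uint64_t gr0)
{
    pfpo_ctl c;
    c.test         = gr_bits(gr0, 32, 32) != 0;
    c.otc          = (uint8_t)gr_bits(gr0, 33, 39);
    c.ofc1         = (uint8_t)gr_bits(gr0, 40, 47);
    c.ofc2         = (uint8_t)gr_bits(gr0, 48, 55);
    c.inexact_supp = gr_bits(gr0, 56, 56) != 0;
    c.rounding     = (uint8_t)gr_bits(gr0, 60, 63);
    return c;
}

/* Per the review above, 0x01 is the only OTC that is not reserved. */
static bool pfpo_otc_valid(const pfpo_ctl *c)
{
    return c->otc == 0x01;
}
```

With GR0 = 0x0000000081090A89 this yields test = 1, otc = 0x01, ofc1 = 0x09, ofc2 = 0x0A, inexact_supp = 1 and rounding = 9 on either kind of host. What should happen when `pfpo_otc_valid` returns false (a condition code in test mode, a program interruption otherwise) is deliberately left out, since that needs to be settled from the PoOp and hardware validation.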

With respect to test case data, Harold Grovesteen's excellent SATK package includes a command-line utility (dfp.py) for converting between Densely-Packed Decimal DFP and human-readable representations. I found this utility extremely helpful when developing (as yet uncommitted) DFP arithmetic test cases. Human-readable representations are very useful for DFP-input corner cases and for creating/analyzing tests involving preferred/non-preferred quanta.

I used https://babbage.cs.qc.cuny.edu/IEEE-754/ for conversion between human-readable and BFP representations when doing BFP. Keep in mind that when doing BFP-input corner cases, the human-readable representation of a BFP value is worse than useless; it's a distraction. Rounding and exceptions are defined in terms of the BFP representation, not the converted decimal representation. The bfp-* test programs and cases illustrate this.

Ditto for HFP-input corner cases.
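To illustrate the point about working from the BFP representation rather than its decimal rendering, here is a minimal sketch that exposes the raw bits of a long BFP value and splits them into sign, biased exponent and fraction. It assumes the host `double` is IEEE 754 binary64, which is the same layout as long BFP; `bfp_long_bits` and `bfp_long_split` are hypothetical helper names, not anything from the Hercules test programs.

```c
#include <stdint.h>
#include <string.h>

/* Fields of a long (64-bit) BFP value. */
typedef struct {
    unsigned sign;        /* 1 bit                      */
    unsigned biased_exp;  /* 11 bits, bias 1023         */
    uint64_t fraction;    /* 52 bits, implicit leading 1
                             for normal numbers         */
} bfp_long_fields;

/* Return the raw bit pattern of a host double.
   Assumption: the host double is IEEE 754 binary64. */
static uint64_t bfp_long_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* avoids aliasing problems */
    return u;
}

/* Split a raw long BFP bit pattern into its three fields. */
static bfp_long_fields bfp_long_split(uint64_t bits)
{
    bfp_long_fields f;
    f.sign       = (unsigned)(bits >> 63);
    f.biased_exp = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction   = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    return f;
}
```

For example, the decimal literal 0.1 is stored as 0x3FB999999999999A: the trailing A is already the product of a rounding step, which is exactly the kind of detail the human-readable form hides when building rounding or exception test cases.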

I remain happy to create test cases and any test case program(s)... or provide assistance to others regarding same.

This is an excellent beginning and much better than what exists now.

Best Regards, Steve Orso

Fish-Git commented 5 years ago

First of all, thank you for taking a look at this, Steve. Much appreciated!

The current state of pfpo.c will not produce correct results for corner cases.

Is it fixable? How hard would it be to correct the corner cases? Is that something you could do?

Add comments, especially block header comments.

Which sounds like something you could easily do. Yes?

Some description and/or PoOp references for the table content would be helpful to future maintainers of the code.

Yes! This was one of the things I was (and still am!) having trouble understanding: the table usage. Block comments explaining the tables would be very helpful I should think (which again is something that sounds like would be easy for you to do but not me).

Validate the Operation Type Code (OTC) to be 0x01.

Good catch!

Add support for the test bit (gr0 bit 32).

Another piece of cake.

<skip ahead a bit...>

When one adds all of this stuff to pfpo.c, it may be worth considering making PFPO a separate shared library, perhaps loaded dynamically.

Or made into a new External Package? In any case, whether PFPO is a static library or a dynamically loaded shared library is beyond the scope of the PFPO instruction code itself IMO, and thus should best be discussed in another thread/issue. How the PFPO instruction's code gets built is of only secondary importance IMO. The thing I'm mostly interested in right now is making whatever coding changes are needed to ensure correct results according to the published architecture (i.e. PoOp).

It also appears that the current code is little-endian host dependent.

That is definitely a problem that needs to be fixed!

With respect to test case data, Harold Grovesteen's excellent SATK package includes a command-line utility (dfp.py) for converting between Densely-Packed Decimal DFP and human-readable representations.

Is this documented anywhere? I wasn't even aware of its existence until just now! It seems to be lacking any type of --help option too, so it's a complete mystery to me as to what it's supposed to do or how one is supposed to use it. I hate that.

I remain happy to create test cases and any test case program(s)... or provide assistance to others regarding same.

How much trouble would it be for you to take care of some of the needed fixes you mentioned? (e.g. comments, OTC validation, test bit support and endianness) If you could do that for us it would be a BIG help! Thanks!

Testing for and fixing the edge cases would obviously take much more time and effort, and is something that could come later I think. I don't want to rush things.

But we're so very close to finally having a working PFPO implementation that I think it would be a shame to not implement what we've already got (with the mentioned quick fixes of course).

As long as the fixing of those all-important edge cases was being actively worked on by someone (i.e. by you, Steve? Yes? Are you willing to do that for us? Pretty please? With sugar on top?) then I think as soon as the mentioned quick fixes are applied (comments, otc, test bit, endianness) we should keep what we've already got rather than rip it out until it's 100% complete.

How do the rest of you guys feel about that?

Fish-Git commented 5 years ago

FYI: Something else I just noticed that you didn't mention, and that Bob's code doesn't appear to be checking for either: the CR0 AFP bit:

The PERFORM FLOATING-POINT OPERATION (PFPO) instruction is subject to the AFP-register-control bit, bit 45 of control register 0. For PFPO to be executed successfully, the AFP-register-control bit must be one; otherwise, an AFP-register data exception, DXC 1, is recognized.

Another simple one line fix I should think.
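For illustration, the check described in the quoted PoOp text boils down to testing one bit of control register 0. The sketch below is standalone and does not use the actual Hercules register macros or exception routines; `cr_bit`, `pfpo_afp_check` and the two constants are hypothetical names, and the function simply reports which DXC (if any) the caller would need to raise.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AFP_BIT       45     /* AFP-register-control bit in CR0  */
#define DXC_AFP_REGISTER  0x01   /* DXC 1: AFP-register data exception */

/* Test one bit of a 64-bit control register using z/Architecture
   numbering (bit 0 is the most significant bit). */
static bool cr_bit(uint64_t cr, unsigned bit)
{
    return ((cr >> (63 - bit)) & 1) != 0;
}

/* Return 0 if PFPO may proceed, otherwise the DXC of the data
   exception that should be recognized. */
static int pfpo_afp_check(uint64_t cr0)
{
    return cr_bit(cr0, CR0_AFP_BIT) ? 0 : DXC_AFP_REGISTER;
}
```

Bit 45 corresponds to the mask 0x0000000000040000, so `pfpo_afp_check(0x40000)` returns 0 and `pfpo_afp_check(0)` returns 1.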

srorso commented 5 years ago

The current state of pfpo.c will not produce correct results for corner cases.

Is it fixable? ...

Apart from the dependency on little endian at one point in the code, there is little to "fix" and much to develop. I consider the current state of pfpo.c to be incomplete, not broken, and I will wager Bob W. feels the same way.

How hard would it be to correct the corner cases?

Hard. The effort needed can be divided between additional conversion paths/rounding and exception processing.

There are 144 radix conversion paths inclusive of rounding modes defined in the PoOp for PFPO (18 currently implemented). Rounding mode support for the existing supported conversions (DFP<->HFP) would need to be added. HFP rounding will be time-consuming because there is no previous implementation of IEEE rounding for HFP. DFP, and later, BFP, rounding can borrow from code written in/using decNumber (resp. SoftFloat).

There are somewhere between 59 and 177 exception processing paths to support DFP and HFP targets. The large range accounts for the fact that some (most?) but not all of the exception paths are dependent on the target precision. No way to tell the real number until it's coded, but either way it's a lot.

Is that something you could do?

Sure...I can do it. But I wish to respect Bob W.'s current contribution and any future contribution he may wish to make. I would also be happy to collaborate with him if and however he wishes. That would require some coordination with Bob, best done via PM.

Add comments, especially block header comments.

Which sounds like something you could easily do. Yes?

Probably...

Some description and/or PoOp references for the table content [...]

Yes! [...] that sounds like would be easy for you to do but not me).

Again, probably...

When one adds all of this stuff to pfpo.c, it may be worth considering making PFPO a separate shared library, perhaps loaded dynamically.

Or made into a new External Package?

I do not feel an external package is the best way to go for two reasons:

PFPO needs an intimate connection to the rest of Hercules because it uses and messes with the general purpose registers, the floating point registers, the FPC, and the condition code, and generates interrupts.
It is not a general-purpose library in the manner of decNumber, SoftFloat, PCRE, BZip2, or Zlib. There seems to be little benefit to updating the build scripts for a new external package that is not an external package.

Thinking your suggestion through a bit, though: if there were value to a general purpose thread-safe radix conversion library that PFPO could call, external package-dom might make sense. The call interface would be similar to the other IEEE libraries used by Hercules, and PFPO code would translate IEEE 754 2008 speak into cc, FPC flags, DXC, scaled results, and interruptions.

I believe the structure (shared library) should be established early. It's hard to change these things after the fact, and the change creates a requirement for a re-validation of the code. The change from implicitly loaded to dynamically loaded shared library is more reasonably done as a separate step.

[...] Harold Grovesteen's excellent SATK package includes a command-line utility

Is this documented anywhere?

Dunno...Harold demonstrated that he is as kind and generous as he is wise when he wrote it for me. And I am sure he is disappointed that his effort did not contribute to a (still-) planned validation of the Hercules DFP instruction set. This would be a good place to use it.

Besides, he was looking for someone to take on PFPO when I became a Hyperion developer.

How much trouble would it be for you to take care of some of the needed [...]

No trouble at all; I do this for fun. Bear in mind though, that it will not be fast because a) there is a lot to be done and b) I have some pretty significant competing activities.

Are you willing to do that for us?

Yes, provided that Bob W. remains the lead or a contributor to PFPO development if that is his wish, and with the understanding that whatever PFPO support gets developed will be ported into the "other" Hyperion, consistent with the terms of the Q license or any license I may apply to separate modules I may contribute.

I think as soon as the mentioned quick fixes are applied (comments, otc, test bit, endianness) we should keep what we've already got rather than rip it out until it's 100% complete.

I would keep it as is; it's better than what came before. As long as it does not prevent someone from building Hercules and has no impact unless one actually uses PFPO, keep it.

Fish-Git commented 5 years ago

When one adds all of this stuff to pfpo.c, it may be worth considering making PFPO a separate shared library, perhaps loaded dynamically.

Or made into a new External Package?

I do not feel an external package is the best way to go for two reasons:

PFPO needs an intimate connection to the rest of Hercules because it uses and messes with the general purpose registers, the floating point registers, the FPC, and the condition code, and generates interrupts.

I wasn't thinking of externalizing the entire instruction. Only the code that does the gnarly floating point shit.

Thinking your suggestion through a bit, though: if there were value to a general purpose thread-safe radix conversion library that PFPO could call, external package-dom might make sense. The call interface would be similar to the other IEEE libraries used by Hercules, and PFPO code would translate IEEE 754 2008 speak into cc, FPC flags, DXC, scaled results, and interruptions.

Which is more along the lines of what I was actually thinking.

How much trouble would it be for you to take care of some of the needed [...]

No trouble at all; I do this for fun. Bear in mind though, that it will not be fast because a) there is a lot to be done and b) I have some pretty significant competing activities.

I fully understand about that!

Are you willing to do that for us?

Yes, provided that Bob W. remains the lead or a contributor to PFPO development if that is his wish ...

Along those lines... I was thinking he might make a good addition to the team. What do you (and others) think? Should we invite him?

srorso commented 5 years ago

I remain willing to develop, contribute to, collaborate on, or assist others with the development of PFPO or its test cases. But I am reluctant to independently make changes to code another person is actively developing. Let's see how Bob W. wishes to proceed with this, or if he would rather move on to something else.

Fish-Git commented 5 years ago

I am reluctant to independently make changes to code another person is actively developing.

Understood.

Let's see how Bob W. wishes to proceed with this, or if he would rather move on to something else.

I'm unsure why, but he hasn't responded to my email yet. I'm thinking maybe he doesn't use email all that much. I might have to call him. I'll try to do that in the next few days.

Fish-Git commented 5 years ago

I'm unsure why, but he hasn't responded to my email yet. I'm thinking maybe he doesn't use email all that much. I might have to call him. I'll try to do that in the next few days.

I called Bob yesterday and he said he had a new version of his PFPO code that fixed some problems and includes support for BFP too. I explained that he needed to coordinate his efforts with yours, Steve (@srorso), as you are our floating point guy and were reluctant to make any changes if he was already doing so.

I also sent him an email with your email address too, so hopefully he will contact you directly regarding his current efforts so the two of you can coordinate your activities with each other. Please let me know if/when he does reach out to you.

I also asked him if he wanted to become a Hercules developer too, and he said yes, so I sent out the invite, but he hasn't accepted it yet. I'll ping him again in a few (days? weeks?) if he still hasn't accepted it by then.

Fish-Git commented 5 years ago

p.s. his github userid is @rwoodpd.

Fish-Git commented 4 years ago

FYI:

The following facilities should IMHO be fairly easy to do (relatively speaking):

DEFLATE-Conversion Facility
Insert-Reference-Bits-Multiple Facility
Miscellaneous-Instruction-Extensions Facility 2
Miscellaneous-Instruction-Extensions Facility 3

If someone wants to take a crack at getting their hands dirty with Hercules development before any of us gets a chance to implement any of the above, please, by all means, feel free to do so!

ivan-w commented 4 years ago

I'm putting myself in the loop! As you say, some of them are probably trivial.

ivan-w commented 4 years ago

I'll probably open issues for each of those as I go (Split to conquer)

ivan-w commented 4 years ago

I'll look at the low hanging fruits first! Because I'm lazy.

ivan-w commented 4 years ago

Insert Reference Bits Multiple Facility looks pretty trivial..

Fish-Git commented 4 years ago

Yeah, but the two Miscellaneous-Instruction-Extensions facilities look like they'd be more fun! :)

Most of their instructions seem fairly simple and straightforward, some even trivially so.

And the DEFLATE-Conversion Facility should be pretty simple too, given that the deflate algorithm is well defined and code that implements it is all over the internet.

I'm tempted to do them myself but right now I'm busy creating a new internal README to document how to add a new z/Architecture Facility to Hercules (i.e. which files need to be updated, especially the FT and FT2 tables and modxxx and instrxxx functions in source file facility.c, etc).

After that I want to create one for ARCH_DEP and one for using Harold's fantastic SATK/ASMA package to create runtest test cases, and finally, maybe one explaining how our (Bob's) currently in-progress Transactional-Execution Facility (TXF) implementation effort all hangs together.

Knowing you, you'll probably have them all done before I can even get started on any of them! ;-)

ivan-w commented 4 years ago

Of course, I now realize I don't know if there is a need for a SIE intercept code for IRBM (and what the flag is). I'd need a copy of the HCPSI2GB MACRO from HCPOM2 MACLIB from a z/VM 7.1 system I think (since IBM no longer provides a CP System Control Block and Logic manual). I could unconditionally intercept it but it wouldn't work on z/VM versions that do not understand the instruction.

salva-rczero commented 1 year ago

Hi @Fish-Git,

As an exercise, aimed more at getting to know Hercules better than at providing a new facility (a goal that I don't think is within my skills), I cloned the repository and, carefully following your clear instructions, tried to add facility 129 (zVector), even if only for a couple of instructions: vector load & vector store.

I added:

At facility.c: +FT( Z900, Z900, NONE, 129_ZVECTOR )
At feat900.h: +#define FEATURE_129_ZVECTOR_FACILITY
At instfmts.h: +#define VRX( _inst, _regs, _v1, _effective_addr2, _m3 ) ...
At opcode.c:

    static INSTR_FUNC gen_opcode_e7xx[256][NUM_INSTR_TAB_PTRS];

    static INSTR_FUNC gen_opcode_e7xx[256][NUM_INSTR_TAB_PTRS] = {
     /*E700*/ GENx___x___x___ ,
     /*E701*/ GENx___x___x___ ,
     /*E702*/ GENx___x___x___ ,
     /*E703*/ GENx___x___x___ ,
     /*E704*/ GENx___x___x___ ,
     /*E705*/ GENx___x___x___ ,
     /*E706*/ GENx___x___x900("VL" , VRX , ASMFMT_none , vector_load),...

At opcode.h:

    #if defined( FEATURE_129_ZVECTOR_FACILITY )
    DEF_INST(vector_load);
    DEF_INST(vector_store);
    #endif

The functions are coded in a new file, zvector.c.
And some minor changes to other files (makefile, MSVC files) so that everything compiles.

It compiles OK. I built a minimal test and started Hercules with facility 129 enabled, but I'm receiving a 0C1 because the opcode (E7xxxxxxxx06) is undefined.

What am I missing? Can you point me to a good place in cpu.c to place a breakpoint to debug?

Thanks in advance, salva.

srorso commented 1 year ago

Of course, I now realize I don't know if there is a need for a SIE intercept code for IRBM (and what the flag is). I'd need a copy of the HCPSI2GB MACRO from HCPOM2 MACLIB from a z/VM 7.1 system I think (since IBM no longer provides a CP System Control Block and Logic manual). I could unconditionally intercept it but it wouldn't work on z/VM versions that do not understand the instruction.

https://www.vm.ibm.com/pubs/cp710/SIEBK.HTML

The above is the v2 SIEBK, 390 & z.

-Steve O.

Fish-Git commented 1 year ago

At opcode.c:

    static INSTR_FUNC gen_opcode_e7xx[256][NUM_INSTR_TAB_PTRS];

    static INSTR_FUNC gen_opcode_e7xx[256][NUM_INSTR_TAB_PTRS] = {
     /*E700*/ GENx___x___x___ ,
     /*E701*/ GENx___x___x___ ,
     /*E702*/ GENx___x___x___ ,
     /*E703*/ GENx___x___x___ ,
     /*E704*/ GENx___x___x___ ,
     /*E705*/ GENx___x___x___ ,
     /*E706*/ GENx___x___x900("VL" , VRX , ASMFMT_none , vector_load),...

It compiles OK. I built a minimal test and started Hercules with facility 129 enabled, but I'm receiving a 0C1 because the opcode (E7xxxxxxxx06) is undefined.

What am I missing? Can you point me to a good place in cpu.c to place a breakpoint to debug?

Thanks in advance, salva.

Hi salva!

First, I'm very pleased to see someone try to take on this challenge! Providing z/Architecture Vector Facility support is going to take a lot of effort!

That said, I think I might see what you're missing. Maybe.

Because the Vector instructions are extended opcode instructions, you're going to need to also add the required instruction decoding and routing logic for them, which currently doesn't exist.

You've coded the e7xx table with an entry for the E7........06 Vector Load and E7........0E Vector Store instructions, but the e7xx table is not being used anywhere! (Oops!)

In the gen_opcode_table table in opcode.c, you need to update the entry for the E7 opcode in the same way we're currently doing it for the E3, E4, E5 and E6 opcodes:

 /*E7*/   GENx___x___x900 ( ""          , e7xx , ASMFMT_e7xx     , execute_opcode_e7xx                                 ),

Additionally, you're going to need to add a new IPRINT_ROUT2 entry for e7xx:

IPRINT_ROUT2( e7xx, [5] )

and a corresponding IPRINT_FUNC function for format ASMFMT_VRX (and fix your gen_opcode_e7xx table entry to specify that format instead of ASMFMT_none which you're using now, which is wrong).

And finally, and most importantly, you're going to need to add a new statement to the init_runtime_opcode_tables function to properly initialize the e7xx table entries so that the facility.c code can properly update them to actually enable the instructions (and so our instruction routing logic can find them too!):

      replace_opcode_xxxx(arch, gen_opcode_e5xx[i][arch], 0xe5, i);
      replace_opcode_xxxx(arch, gen_opcode_e6xx[i][arch], 0xe6, i);
      replace_opcode_xxxx(arch, gen_opcode_e7xx[i][arch], 0xe7, i);             <-----ADD THIS LINE!----<<<
      replace_opcode_xx________xx(arch, gen_opcode_ebxx[i][arch], 0xeb, i);
      replace_opcode_xx________xx(arch, gen_opcode_ecxx[i][arch], 0xec, i);
      replace_opcode_xx________xx(arch, gen_opcode_edxx[i][arch], 0xed, i);

Do all that and I think things might work better.

Basically, you need to use the same technique as is currently being used for the E3, E4, E5 and E6 series of opcodes. So you can use one of them as a template, as guidance, for what you need to do for the new E7 opcodes you're adding.
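The routing idea can be shown in isolation with a toy two-level dispatch. This is only a sketch of the general technique, not how opcode.c actually builds or replaces its tables: every name below other than `vector_load` and `vector_store` is made up for the example, and handlers just return an identifier instead of executing anything. The point it illustrates is that filling in the E7 sub-table has no effect until the primary table's E7 slot is pointed at a router that indexes the sub-table by the last instruction byte.

```c
#include <stdint.h>

/* Identifiers returned by the toy handlers so the routing is visible. */
typedef enum {
    H_OPERATION_EXCEPTION,
    H_VECTOR_LOAD,
    H_VECTOR_STORE
} handler_id;

typedef handler_id (*instr_fn)(const uint8_t *inst);

static handler_id operation_exception(const uint8_t *inst)
{
    (void)inst;
    return H_OPERATION_EXCEPTION;
}

static handler_id vector_load(const uint8_t *inst)
{
    (void)inst;
    return H_VECTOR_LOAD;
}

static handler_id vector_store(const uint8_t *inst)
{
    (void)inst;
    return H_VECTOR_STORE;
}

static instr_fn primary_table[256];  /* indexed by the first opcode byte */
static instr_fn e7_table[256];       /* indexed by instruction byte 5    */

/* Second-level router: E7 instructions are 6 bytes long and carry the
   rest of their opcode in the last byte. */
static handler_id route_e7(const uint8_t *inst)
{
    return e7_table[inst[5]](inst);
}

static void init_tables(void)
{
    for (int i = 0; i < 256; i++) {
        primary_table[i] = operation_exception;
        e7_table[i]      = operation_exception;
    }
    e7_table[0x06] = vector_load;
    e7_table[0x0E] = vector_store;

    /* Without this line the E7 sub-table is never consulted and every
       E7 instruction ends in an operation exception. */
    primary_table[0xE7] = route_e7;
}

static handler_id dispatch(const uint8_t *inst)
{
    return primary_table[inst[0]](inst);
}
```

After `init_tables()`, dispatching the bytes E7 10 C0 38 00 06 reaches `vector_load`, while an E7 instruction with an unassigned last byte still falls through to `operation_exception`.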

But to be honest, I'm not 100% sure. I've never tried adding a brand new series of extended opcode instructions before, so take all of this with a grain of salt.

Hope that helps!

salva-rczero commented 1 year ago

Thanks @Fish-Git, I made some progress:

At opcode.c "e7xx" looks similar to "e6xx":

    Line 1841: static INSTR_FUNC gen_opcode_e7xx[256][NUM_INSTR_TAB_PTRS];
    Line 1897: #define execute_opcode_e7xx     operation_exception
    Line 2818:  /*E7*/   GENx370x390x900 ( ""          , e7xx , ASMFMT_e7xx     , execute_opcode_e7xx          ),
    Line 4593: static INSTR_FUNC gen_opcode_e7xx[256][NUM_INSTR_TAB_PTRS] =
    Line 7316:       replace_opcode_xxxx(arch, gen_opcode_e7xx[i][arch], 0xe7, i);

    Line 1840: static INSTR_FUNC gen_opcode_e6xx[256][NUM_INSTR_TAB_PTRS];
    Line 1896: #define execute_opcode_e6xx     operation_exception
    Line 2817:  /*E6*/   GENx370x390x900 ( ""          , e6xx , ASMFMT_e6xx     , execute_opcode_e6xx           ),
    Line 4333: static INSTR_FUNC gen_opcode_e6xx[256][NUM_INSTR_TAB_PTRS] =
    Line 7315:       replace_opcode_xxxx(arch, gen_opcode_e6xx[i][arch], 0xe6, i);

ASMFMT_VRX is working fine:

19:38:31 HHC00801I Processor CP00: Operation exception interruption code 0001 ilc 6
19:38:31 HHC02324I PSW=0000000000000000 000000000000000E INST=E710C0380006 VL    1,56(0,12),0           vector_load
19:38:31 HHC02326I R:0000000000000038:K:06=00000000 00000000 00000000 00000000  ................
19:38:31 HHC02326I R:0000000000000006:K:06=41D0 C80041D0 D800E710 C0380006 47F0 .}H..}Q.X.{....0
19:38:31 HHC02269I R0=0000000000000000 R1=0000000000000000 R2=0000000000000000 R3=0000000000000000
19:38:31 HHC02269I R4=0000000000000000 R5=0000000000000000 R6=0000000000000000 R7=0000000000000000
19:38:31 HHC02269I R8=0000000000000000 R9=0000000000000000 RA=0000000000000000 RB=0000000000000000
19:38:31 HHC02269I RC=0000000040000000 RD=0000000000001000 RE=0000000000000000 RF=0000000000000000

but still abending with 01C. I'll keep going, but there are too many new things for me.
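The disassembly in the trace above (`VL 1,56(0,12),0` for `E710C0380006`) can be reproduced by hand from the VRX instruction layout: opcode in bits 0-7, V1 in bits 8-11, X2 in bits 12-15, B2 in bits 16-19, D2 in bits 20-31, M3 in bits 32-35, RXB in bits 36-39, and the rest of the opcode in bits 40-47. The sketch below is a standalone decoder written for illustration; it is not the `VRX` macro from instfmts.h, and `vrx_fields` and `vrx_decode` are hypothetical names.

```c
#include <stdint.h>

/* Decoded operand fields of a VRX-format instruction. */
typedef struct {
    unsigned v1;  /* 5-bit vector register number (RXB supplies the top bit) */
    unsigned x2;  /* index register   */
    unsigned b2;  /* base register    */
    unsigned d2;  /* 12-bit displacement */
    unsigned m3;  /* mask field       */
} vrx_fields;

static vrx_fields vrx_decode(const uint8_t inst[6])
{
    vrx_fields f;
    unsigned rxb = inst[4] & 0x0F;               /* bits 36-39 */

    /* The leftmost RXB bit extends the 4-bit V1 field to 5 bits,
       which is how registers 16-31 are addressed. */
    f.v1 = (unsigned)(inst[1] >> 4) | ((rxb & 0x8) << 1);
    f.x2 = inst[1] & 0x0F;
    f.b2 = (unsigned)(inst[2] >> 4);
    f.d2 = ((unsigned)(inst[2] & 0x0F) << 8) | inst[3];
    f.m3 = (unsigned)(inst[4] >> 4);
    return f;
}
```

For the bytes E7 10 C0 38 00 06 this gives v1 = 1, x2 = 0, b2 = 12, d2 = 56 and m3 = 0, matching the operands printed in the HHC02324I message.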

Regards, salva.

Fish-Git commented 1 year ago

but still abending with 01C.

Can you show us your code? Do you have a repository we could clone? I'm sure it's something simple.

I'll keep going, but there is too much new things for me.

Please don't give up now! You're almost there!

If you could allow us to see all of your existing code, we could help you better. Do you have a repository we could clone?

Fish-Git commented 1 year ago

but still abending with 01C.

Are you doing your testing under z/OS? Or are you using a stand-alone test program? I'm not familiar with z/OS. What does "01C" mean?

You should be using a stand-alone test program to initially test with, not z/OS. Only once your test program works correctly, should you then try to use the instruction(s) under z/OS. Then if they still don't work, then you know it is something about z/OS, not Hercules.

I noticed that according to page 21-2 of the SA22-7832-13 "Principles of Operation" manual, in order to be able to use Vector Facility instructions under z/OS (i.e. the "control program"), you need to have certain Control Register bits set:

Vector Enablement Control

The vector instructions should only be used if both the vector enablement control (bit 46) and the AFP-register-control (bit 45) in control register zero are set to one. If the vector facility for z/Architecture is installed and a vector instruction is executed without the vector enablement control set, a data exception with DXC FE hex is recognized. If bit 45 of control register zero is not also set to one, it is unpredictable if a data exception is recognized. If the vector facility for z/Architecture is not installed, an operation exception is recognized.

Additionally, it appears z/OS (i.e. the "control program") may also need to perform some special handling too before Vector instructions can be used:

Programming Note: When a control program initially enables the vector facility for z/Architecture for a task, which may occur in response to a data exception with DXC FE, it should ensure that newly enabled full vector registers as well as the rightmost portions of vector registers that overlap with any enabled floating-point registers are zeroed.

Try setting a breakpoint on your test instruction ('b' command), and when it's hit, display the control registers ('cr' command) and verify whether bits 45 and 46 of Control Register 0 are set or not. If they're not set, then it might be z/OS that's causing your problem? (I don't know z/OS!)
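
The same check can be written out in code. This is a minimal sketch, assuming CR0 is held as a 64-bit value and using the z/Architecture convention that bit 0 is the leftmost bit. The macro and function names are made up for the example and are not Hercules identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* z/Architecture numbers bits from the left: bit 0 is the most significant. */
#define IBM_BIT64(n)    (UINT64_C(1) << (63 - (n)))

#define CR0_AFP_CTL     IBM_BIT64(45)   /* AFP-register control      */
#define CR0_VECTOR_CTL  IBM_BIT64(46)   /* vector enablement control */

static_assert(CR0_AFP_CTL    == UINT64_C(0x0000000000040000), "bit 45 mask");
static_assert(CR0_VECTOR_CTL == UINT64_C(0x0000000000020000), "bit 46 mask");

/* Returns 1 only when both controls are one, as the quoted text requires. */
int vector_enabled(uint64_t cr0)
{
    const uint64_t need = CR0_AFP_CTL | CR0_VECTOR_CTL;
    return (cr0 & need) == need;
}
```

So when reading the 'cr' display, the two bits of interest in CR0 are 0x00040000 and 0x00020000 in the low-order word.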

p.s. I'd still like to see your code.

salva-rczero commented 1 year ago

Thank you, @Fish-Git

0C1 = operation exception (PIC 001). I'm sorry, I still think in JCL & Assembler.
I'm trying a STAK standalone core sample, following your instructions.

I really appreciate your offer to look at my code. But I am learning a lot about Hercules in the debugging process. If I don't make progress today, I will upload my code to a repository to share it with you and the community.

Regards, salva.

salva-rczero commented 1 year ago

@Fish-Git

Finally, I get the zVector facility enabled.

Please check my updates at https://github.com/salva-rczero/hyperion-zvector.

Thanks for your guidance and encouragement. Now I'll try to reach some vector instruction working as POP dictates.

p.s. Please be patient, I'm an old mainframer and Visual Studio, x64, Git... are all new things for me.

Fish-Git commented 1 year ago

Finally, I get the zVector facility enabled.

Fantastic! I'm very proud of you! :)

Please check my updates at https://github.com/salva-rczero/hyperion-zvector.

I'll definitely do that the moment I get the opportunity. (I'm still busy --AS USUAL! -- with many other things right now)

p.s. Please be patient, I'm an old mainframer and Visual Studio, x64, Git... are all new things for me.

Understood. Both Visual Studio and Git (either one of them, let alone both!) can each be quite intimidating and confusing when you're not familiar with them. But try not to let either one distract you, and concentrate instead on the Hercules (instruction) code and your ASMA test code. Both of those should hopefully be more familiar to you, and are the most important part of your effort.

You're doing GREAT, salva! :-D

Please keep up the good work! I'm confident that much of your effort will eventually make it into Hercules itself. You should be proud of yourself!

I know I am. :)

salva-rczero commented 1 year ago

Thank you @Fish-Git.

Now I have four zvector instructions "working": VL, VLM, VST and VSTM:

https://github.com/salva-rczero/hyperion-zvector

And of course I have a zillion questions.

First, can I still ask you my questions, or should I spend more time reading the code and documentation?

If yes:

1. Is this thread the right place? Or should I do it somewhere else (e.g. groups.io/Hercules, email...)?
2. The compilation seems slow to me. Even modifying a small line, it recompiles almost the whole project. Is there something I'm doing wrong?
3. On the other hand, there are times when it is necessary to clean and do a complete re-compilation (with so many macros, VS sometimes seems to become disoriented). Again, am I doing something wrong?
4. How do I get compiler output mapping of structs? (layout including type, size & offset), as Mainframe do for Assembler, Cobol, PL/1...
5. [x] I can build the Debug config, but can't run it. I receive a 0xC0150002 Windows error, which points to a DLL error. Does it work for you?
   Solved: download & rebuild extpkgs following "Building "External Packages""
6. Some vector opcodes admit an "alignment hint" to speed up load/store. Does Hercules need to consider it or it is only for real iron?
7. As you know VR0-15 overlaps (partially) with FPR. In addition, Hercules considers FPRs as 32x32b instead of 16x64b. Pointing all these data to same contiguous storage location sounds impossible. What is the best approach?

Fish-Git commented 1 year ago

Now I have four zvector instructions "working": VL, VLM, VST, VSTM. Check at https://github.com/salva-rczero/hyperion-zvector.

Fantastic! I'm cloning and looking at your repository now.

And of course I have a zillion questions.

First, can I still ask you my questions, or should I spend more time reading the code and documentation?

If yes:

Is this thread the right place? Or should I do it somewhere else (e.g. groups.io/Hercules, email...)?

Of course you can still ask questions! It doesn't have to be to me either. I'm sure any of the other Hercules developers or Hercules users/enthusiasts would be happy to help you too!

Reading the Hercules source code is always a good idea. The more you understand how Hercules works, the more it will help you with your effort.

Is this thread the right place? It is not the best place, no. We should probably move it somewhere else, but I'm unsure where the best place should be. The main Hercules groups.io/Hercules group might be a good place, yes. But for now, here is fine.

2. The compilation seems slow to me. Even modifying a small line, it recompiles almost the whole project. Is there something I'm doing wrong?

Probably, yes. Precisely what you are doing wrong I don't know.

Be aware however, that Release (optimized) builds do normally take much longer than Debug (unoptimized) builds. This is expected and completely normal since Visual Studio is trying very hard to create the best, most efficient optimized code possible, and Hercules is quite large and complicated. So Visual Studio may take several minutes to complete a fully optimized Release build.

3. On the other hand, there are times when it is necessary to clean and do a complete re-compilation (with so many macros, VS sometimes seems to become disoriented). Again, am I doing something wrong?

Probably not, no.

Hercules is not designed like most other fairly simple (but large) Windows programs. Hercules is written very differently from the way most other programs are written. Many of its source code members are designed to be compiled multiple times, each time with a different set of #defines active (different set of constants and different set of macros). As a result, yes, Visual Studio does frequently "get confused".

4. How do I get compiler output mapping of structs? (layout including type, size & offset), as Mainframe do for Assembler, Cobol, PL/1...

I don't understand this question. :(
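
If question 4 is asking for something like the DSECT-style map a mainframe assembler or compiler listing provides, one portable approximation is to let a small C program report each member's offset and size with the standard `offsetof` and `sizeof` operators. The struct below is a made-up stand-in, not a Hercules structure, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in structure purely for demonstration. */
typedef struct {
    uint8_t  flag;
    uint32_t count;
    uint64_t value;
} sample_regs;

static_assert(offsetof(sample_regs, flag) == 0, "first member starts at 0");

/* Print one member's name, offset and size. */
#define SHOW_MEMBER(type, member)                              \
    printf("%-8s offset %3zu  size %3zu\n", #member,           \
           offsetof(type, member), sizeof(((type *)0)->member))

/* Print the layout the compiler actually chose, padding included. */
void show_sample_layout(void)
{
    SHOW_MEMBER(sample_regs, flag);
    SHOW_MEMBER(sample_regs, count);
    SHOW_MEMBER(sample_regs, value);
    printf("total size %zu\n", sizeof(sample_regs));
}
```

The offsets printed depend on the compiler and target, which is the point: the output shows where padding was inserted on that particular build.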

5. I can build the Debug config, but can't run it. I receive a 0xC0150002 Windows error, which points to a DLL error. Does it work for you?

I have not tried yet. But error 0xC0150002 usually means you're missing the required Visual Studio runtime DLLs. I'm guessing you probably did not install Visual Studio correctly.

6. Some vector opcodes emit an "alignment hint" to speed up load/store. Does Hercules need to consider it or it is only for real iron?

Another question I don't quite understand. :(

Generally speaking however, yes, Hercules definitely needs to take host alignment requirements into consideration. Each host platform is different though. Some host platforms require certain operands to be aligned. Other host platforms simply run less efficiently if operands are not aligned. Such alignment requirements are handled by various fetch/store macros defined in our machdep.h header.
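
To illustrate the kind of problem such fetch/store helpers solve, the sketch below reads and writes a big-endian 32-bit guest value at an arbitrary, possibly unaligned, host address. The function names are hypothetical and this is not the code in machdep.h; it only shows the byte-wise technique, which works regardless of host alignment rules or host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Fetch a big-endian 32-bit value one byte at a time, so that no
   aligned (or even native-endian) host load is ever required. */
uint32_t fetch_be32(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;

    assert(p != 0);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
}

/* Store a 32-bit value in big-endian order, again byte by byte. */
void store_be32(void *p, uint32_t v)
{
    uint8_t *b = (uint8_t *)p;

    assert(p != 0);
    b[0] = (uint8_t)(v >> 24);
    b[1] = (uint8_t)(v >> 16);
    b[2] = (uint8_t)(v >>  8);
    b[3] = (uint8_t)v;
}
```

The trade-off is speed: a host that tolerates unaligned access can usually do better with a native load plus a byte swap, which is why per-platform variants are worth having.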

7. As you know VR0-15 overlaps (partially) with FPR. In addition, Hercules considers FPRs as 32x32b instead of 16x64b. Pointing all these data to same contiguous storage location sounds impossible. What is the best approach?

This is a question which I do fully understand, but unfortunately do not have an answer for. :(

This is precisely one of the items that makes providing support for the z/Architecture Vector Facility so challenging. It's not just the overwhelming number of vector instructions introduced by the facility, but the complete redesign of our floating point register handling that will be required as a result that presents the greatest challenge. :(

I don't have an answer for this one. :(
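
One conceivable way to picture the overlap, offered purely as a sketch and not as a design Hercules uses, is to keep every vector register as 16 bytes in guest (big-endian) order and to derive both floating-point views from the leftmost 8 bytes of the first 16 registers. All names below are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file: 32 vector registers of 16 bytes each,
   always kept in guest (big-endian) byte order. */
typedef struct {
    uint8_t vr[32][16];
} vec_regs;

void clear_vec_regs(vec_regs *r)
{
    assert(r != 0);
    memset(r, 0, sizeof *r);
}

/* 32x32-bit view: word idx (0..31) is one half of FPR idx/2, which is
   the leftmost 8 bytes of vector register idx/2. */
uint32_t get_fpr_word(const vec_regs *r, unsigned idx)
{
    const uint8_t *b;

    assert(r != 0 && idx < 32);
    b = &r->vr[idx / 2][(idx % 2) * 4];
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] <<  8) |  (uint32_t)b[3];
}

void set_fpr_word(vec_regs *r, unsigned idx, uint32_t v)
{
    uint8_t *b;

    assert(r != 0 && idx < 32);
    b = &r->vr[idx / 2][(idx % 2) * 4];
    b[0] = (uint8_t)(v >> 24);
    b[1] = (uint8_t)(v >> 16);
    b[2] = (uint8_t)(v >>  8);
    b[3] = (uint8_t)v;
}

/* 16x64-bit view of the same storage: FPR n (0..15). */
uint64_t get_fpr64(const vec_regs *r, unsigned n)
{
    assert(n < 16);
    return ((uint64_t)get_fpr_word(r, 2 * n) << 32)
         | get_fpr_word(r, 2 * n + 1);
}
```

With this layout both views address the same bytes, at the price of a byte-order conversion on every floating-point register access on little-endian hosts. Whether that cost is acceptable is exactly the open design question.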

s390guy commented 1 year ago

The file "CZAM_390-CM" has been uploaded to the groups.io hercules-390 file section. This document discusses in detail what is involved in both the CZAM and 390-CM facilities.

Fish-Git commented 1 year ago

SWAP128 Test Program

I would appreciate feedback on the attached. Refer to the README file for details:

swap128.zip

It's a little test program I recently wrote that tests a possible(?) Hercules implementation of z/Vector support.

It does not test any vector instructions. It only tests the proposed possible implementation of Hercules internal support for vector registers, i.e. how the registers in REGS should be defined, and how a new 128-bit variable type might be handled (i.e. how all the swapping/storing/fetching might be handled).

It builds and runs on both Windows AND Linux too, and comes with sample test run output.

Please read the README. It explains everything.

Any/all feedback greatly appreciated. Thanks.

Fish-Git commented 1 year ago

SWAP128 version 2

Here's the updated .zip file:

swap128.zip

Changes: BIG-ENDIAN support.

Built and tested on LinuxONE Ubuntu 22.04. Works fine.

I'm still quite concerned about my overall proposed internal support however. My program does prove we can indeed provide support for it with a minimal number of changes, but it only does so for Intel "x86" CPUs right now (and now, s390x systems too, obviously).

Providing the same support for the many other host CPU architectures out there that we currently support that don't support Intel SIMD instructions however, is the hard part. :(

THAT I have not figured out how to do yet, and is what I very much need help with. :(
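
For hosts without SIMD support, one conceivable fallback is to treat a 128-bit value as two 64-bit halves and byte-swap it in plain C. The sketch below is only an illustration with invented names. It is not taken from swap128.zip and has not been checked against it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t hi;   /* leftmost 64 bits  */
    uint64_t lo;   /* rightmost 64 bits */
} u128_pair;

static_assert(sizeof(u128_pair) == 16, "two 64-bit halves, no padding");

/* Reverse the eight bytes of a 64-bit value using shifts and masks only. */
uint64_t swap64_portable(uint64_t x)
{
    x = ((x & UINT64_C(0x00FF00FF00FF00FF)) <<  8)
      | ((x >>  8) & UINT64_C(0x00FF00FF00FF00FF));
    x = ((x & UINT64_C(0x0000FFFF0000FFFF)) << 16)
      | ((x >> 16) & UINT64_C(0x0000FFFF0000FFFF));
    return (x << 32) | (x >> 32);
}

/* Reverse all sixteen bytes: swap each half, then exchange the halves. */
u128_pair swap128_portable(u128_pair v)
{
    u128_pair r;

    r.hi = swap64_portable(v.lo);
    r.lo = swap64_portable(v.hi);
    return r;
}
```

Nothing here depends on the host architecture, so it could serve as a baseline that SIMD-capable hosts override with something faster.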

I'm somewhat anxious to integrate my changes into Hercules as soon as possible if others feel it's okay to do so at this time (i.e. that doing so won't break existing Hercules), so that people like @salva-rczero and others can have a version of Hercules they can work with (i.e. one that provides the needed internal framework).

That way others can get started on coding the actual vector instructions themselves without worrying about "how it's all going to hang together". That is to say, once we have the framework in place, then the actual coding of the actual vector instructions themselves can proceed normally. But that can't really be done until we have the internal support framework in place first. THAT's what I'm trying to do here with my test program.

So I'd really appreciate any type of feedback from my fellow developers out there regarding whether I'm on the right track or not. I don't want to screw things up! So I need your help to prevent that from happening.

Thanks.

salva-rczero commented 1 year ago

Thanks @Fish-Git.

I have not been able to dedicate time to zVector, because I have a terrible toothache. I hope to be back soon.

From what I have been able to understand of your work, it would remain to adapt any reference to FPR to the new macros.

On my side, I already had about 40 vector instructions written. My approach was to keep VR always in BIGENDIAN, but I guess it won't be too difficult to change it.

Regards, salva.

Dorpstraat commented 1 year ago

Hi @salva-rczero, I tried to build your version of Hyperion on Linux, but I get the following messages. Do they tell you anything? This is the output of the make:

/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_logical_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_galois_field_multiply_sum_and_accumulate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_subtract_compute_borrow_indication'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_subtract_with_borrow_indication'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_left'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_select'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_test_data_class_immediate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_logical_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_compare_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_fp_integer'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_add'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_unpack_low'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_gr_from_vr_element'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_positive'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_permute_doubleword_immediate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_unpack_logical_low'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_compare_scalar'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_find_element_equal'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_subtract'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_minimum'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_find_element_not_equal'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_scatter_element_32'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store_element_16'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_maximum_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_64'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_shift_right_arithmetic_vector'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_exclusive_or'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_left_by_byte'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_load_lengthened'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_pack_saturate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_load_count_to_block_boundary'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_compare_equal'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_add_with_carry'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_gather_element_64'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_convert_to_fixed_64_bit'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_load_rounded'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_and_with_complement'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_isolate_string'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_string_range_compare'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_compare_equal'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_16'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_average'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_minimum_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_and'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_gather_element_32'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_maximum'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_shift_right_arithmetic'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store_element_64'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_pack'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_add_compute_carry'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_vector'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_pack_logical_saturate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_multiply_and_add'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_galois_field_multiply_sum'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_even'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_with_length'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_8'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_right_arithmetic_by_byte'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_immediate_64'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_convert_from_fixed_64_bit'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_logical_element_and_zero'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_subtract_with_borrow_compute_borrow_indication'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_compare_high_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_sum_across_word'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_even'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_checksum'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_odd'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_right_arithmetic'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_vr_from_grs_disjoint'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_replicate_immediate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_complement'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_sign_extend_to_doubleword'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_convert_from_logical_64_bit'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_logical_odd'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_low'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_sum_across_doubleword'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_right_logical_by_byte'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_unpack_logical_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_add_with_carry_compute_carry'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_multiple'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_odd'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_logical_even'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_compare_and_signal_scalar'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store_with_length'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_subtract'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_immediate_16'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_low'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_vr_element_from_gr'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_count_trailing_zeros'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_divide'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_perform_sign_operation'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_rotate_left_logical_vector'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_and_replicate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_logical_odd'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_merge_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_population_count'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_generate_mask'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_shift_left'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store_element_8'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_count_leading_zeros'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_and_add_logical_even'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_rotate_and_insert_under_mask'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_convert_to_logical_64_bit'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_scatter_element_64'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_immediate_8'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_shift_right_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_average_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_compare'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_left_double_by_byte'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_unpack_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_sum_across_quadword'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_immediate_32'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_square_root'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_nor'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store_element_32'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_compare_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_or'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_test_under_mask'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_shift_left_vector'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_permute'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_add'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_compare_high_or_equal'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_generate_byte_mask'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_shift_right_logical_vector'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_shift_right_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_element_32'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_find_any_element_equal'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_multiply_and_subtract'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_store_multiple'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_element_rotate_left_logical'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_merge_low'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_load_to_block_boundary'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_fp_multiply'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_replicate'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_compare_high'
/usr/bin/ld: ./.libs/libherc.so: undefined reference to `z900_vector_multiply_high'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:2352: hercules] Error 1
make[2]: Leaving directory '/home/hercules/hyperion-zvector-develop'
make[1]: *** [Makefile:2661: all-recursive] Error 1
make[1]: Leaving directory '/home/hercules/hyperion-zvector-develop'
make: *** [Makefile:1987: all] Error 2

Regards, Loet

salva-rczero commented 1 year ago

Please try to add: zvector.c \ in Makefile.am at line 523.

Regards, salva.

Dorpstraat commented 1 year ago

Hi @salva-rczero

Thanks for the quick response. However, I get exactly the same errors from the make. I must confess I don't know much about developing on Linux. For example, how do I see whether zvector.c is actually included in the compilation?

Regards, Loet

salva-rczero commented 1 year ago

Hi Loet,

I am developing in Visual Studio, so I have only tested the MSVC build. I don't know much about building on Linux, but the only new file I added to Hyperion is zvector.c. Please try to clean your build tree (make clean, or make -B to force a full rebuild) and repeat the make.

Post the result if not ok.

salva.

Dorpstraat commented 1 year ago

Hi @salva,

I've read that the makefile.in is the real input to the installation process:

./configure
make
make install

I have therefore modified makefile.in and added a similar line with 'zvector' where 'vector' is mentioned. I may have broken some development protocols with this. The make process no longer gives any error messages. I will now continue testing the result.

Thanks again Loet

Vf58 commented 7 months ago

just for information, I try to Ipl Z/OS 3.1 and receive wait state 000000000023007B https://www.ibm.com/docs/en/zos/3.1.0?topic=wsc-07b

I presume those 2 facilities are not enabled :

The vector binary-coded decimal facility The vector enhancements facility 1

Vincent

Dorpstraat commented 7 months ago

Hi Vincent

Thanks for this update. I still run on z/OS 2.5, but I'm warned

Thanks, Loet


Peter-J-Jansen commented 7 months ago

Hi Vincent,

As you wrote :

just for information, I try to Ipl Z/OS 3.1 and receive wait state 000000000023007B https://www.ibm.com/docs/en/zos/3.1.0?topic=wsc-07b

I presume those 2 facilities are not enabled :

The vector binary-coded decimal facility The vector enhancements facility 1

Have you already tried to bypass the WAIT 07B REASON 23 by using the Hercules facility command, e.g.:

facility enable 129_ZVECTOR
facility enable 134_ZVECTOR_PACK_DEC
facility enable 135_ZVECTOR_ENH_1

The first one of these is already needed for z/OS 2.5 to bypass the WAIT state, but is only partially used, like for (the default) z/OSMF. That default can be circumvented, although not with the MACHMIG VEF statement in the appropriate 'SYS1.IPLPARM(LOADxx)' member.

Fish-Git commented 7 months ago

facility enable 129_ZVECTOR
facility enable 134_ZVECTOR_PACK_DEC
facility enable 135_ZVECTOR_ENH_1

Wrong.

If a facility is not supported, you can't enable it. You'll get an error if you try to:

HHC00896E Facility( 054_EE_CMPSC ) not supported for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4554)
HHC01441E Config file[67] C:/Users/Fish/HercGUI/Configuration Files/CCKD64 zOS-3.1-ADCD-WARM-QETH.txt: error processing statement: FACILITY ENABLE 054_EE_CMPSC
HHC00007I Previous message from function 'process_config' at script.c(431)
HHC00896E Facility( 129_ZVECTOR ) not supported for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4554)
HHC01441E Config file[68] C:/Users/Fish/HercGUI/Configuration Files/CCKD64 zOS-3.1-ADCD-WARM-QETH.txt: error processing statement: FACILITY ENABLE 129_ZVECTOR
HHC00007I Previous message from function 'process_config' at script.c(431)
HHC00896E Facility( 130_INSTR_EXEC_PROT ) not supported for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4554)
HHC01441E Config file[69] C:/Users/Fish/HercGUI/Configuration Files/CCKD64 zOS-3.1-ADCD-WARM-QETH.txt: error processing statement: FACILITY ENABLE 130_INSTR_EXEC_PROT
HHC00007I Previous message from function 'process_config' at script.c(431)
HHC00896E Facility( 134_ZVECTOR_PACK_DEC ) not supported for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4554)
HHC01441E Config file[70] C:/Users/Fish/HercGUI/Configuration Files/CCKD64 zOS-3.1-ADCD-WARM-QETH.txt: error processing statement: FACILITY ENABLE 134_ZVECTOR_PACK_DEC
HHC00007I Previous message from function 'process_config' at script.c(431)
HHC00896E Facility( 135_ZVECTOR_ENH_1 ) not supported for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4554)
HHC01441E Config file[71] C:/Users/Fish/HercGUI/Configuration Files/CCKD64 zOS-3.1-ADCD-WARM-QETH.txt: error processing statement: FACILITY ENABLE 135_ZVECTOR_ENH_1
HHC00007I Previous message from function 'process_config' at script.c(431)

Instead, you need to forcibly set (forcibly enable) the corresponding facility bit number:

FACILITY  ENABLE  054       # 054_EE_CMPSC
FACILITY  ENABLE  129       # 129_ZVECTOR
FACILITY  ENABLE  130       # 130_INSTR_EXEC_PROT
FACILITY  ENABLE  134       # 134_ZVECTOR_PACK_DEC
FACILITY  ENABLE  135       # 135_ZVECTOR_ENH_1

HHC00898W Facility( 054_EE_CMPSC ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4611)
HHC00898W Facility( 129_ZVECTOR ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4611)
HHC00898W Facility( 130_INSTR_EXEC_PROT ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4611)
HHC00898W Facility( 134_ZVECTOR_PACK_DEC ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4611)
HHC00898W Facility( 135_ZVECTOR_ENH_1 ) *Enabled for z/Arch
HHC00007I Previous message from function 'facility_enable_disable' at facility.c(4611)

When you do that, you can get past the disabled wait, but then *MASTER* crashes during startup with:

 19.48.43   IEF773I TIOT SIZE = 0032K, MAXIMUM SINGLE UNIT DD ENTRIES = 00001635
 19.48.44   IEFJ200I MASTER SCHEDULER JCL FOR THIS IPL TAKEN FROM MEMBER MSTJCLSA OF PARMLIB
 19.48.44   IEF403I MSTJCLSA - STARTED - TIME=19.48.44
 19.48.44   IEA045I AN SVC DUMP HAS STARTED AT TIME=19.48.44 DATE=03/27/2024 FOR ASIDS(0001,0012) ERROR ID = SEQ00001 CPU00 ASID0001 TIME 19.48.44.7 QUIESCE = YES
*19.48.47  *IEE479W MASTER SCHEDULER ABEND 0A0, DUMPED, REIPL - CODE 1Z
 19.48.47   IEA794I SVC DUMP HAS CAPTURED: DUMPID=001 REQUESTED BY JOB (*MASTER*) DUMP TITLE=ERROR IN INITIATOR,ABEND=0A0,COMPON=INIT,COMPID=SC1B6,ISSUER=IEFIB620 INSUFFICIENT RESOURCES FOR OPTIMIZE=YES PROCESSING
*19.48.47  *01 IEA793A NO DUMP DATA SETS AVAILABLE FOR DUMPID=001 BY JOB (*MASTER*). USE THE DUMPDS COMMAND OR REPLY D TO DELETE THE DUMP

This is more than likely caused by missing support for one or more of the above facilities.

I'm currently working on trying to get the Instruction-Execution-Protection Facility coded, and after that, the Entropy Encoding and Order Preserving Compression Facilities as well (which will take a while!), but it's really the z/Architecture Vector Facility that's killing us. Support for the Vector Facility is likely going to take years to code, as it requires redesigning/rewriting our entire floating point support.

I'm sorry, but it looks like z/OS 3.1 might very well be the end-of-the-line for Hercules. :(

Peter-J-Jansen commented 7 months ago

Hi Fish,

Thanks for the feedback. I didn't know there was a difference for the facility command between using bit numbers and the longer format also specifying the facility in text format. And yes, indeed, to IPL z/OS 2.5 I did use just the bit number, 129 in this case, which works.

So you answered my question: in this case, such facility bypasses are not enough to IPL z/OS 3.1.

Cheers,

Peter

mcisho commented 7 months ago

Hi @salva-rczero,

I am about to start work on changes to use a shared area for zVector and Floating Point registers, so that an update to one register type also updates the other register type. There is a lot of FP usage to be changed, so this is going to be a slow process on my part, but my changes will impact the work you are doing on zVector instructions. Can we chat offline?
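As a rough illustration of the shared-area idea: in z/Architecture, floating-point registers 0-15 overlay bits 0-63 of vector registers 0-15. The sketch below is not Hercules code. The type `vfp_area_t` and the helpers `fpr_get` and `fpr_set` are hypothetical names, and it assumes the shared area keeps each vector register as 16 bytes in guest (big-endian) order.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_VR = 32, NUM_FPR = 16 };

/* Hypothetical shared register area (NOT the Hercules REGS layout):
   32 vector registers, each kept as 16 bytes in guest byte order. */
typedef struct {
    uint8_t vr[NUM_VR][16];
} vfp_area_t;

/* Read FPR n as a 64-bit value: it is bytes 0..7 of vector register n. */
static uint64_t fpr_get(const vfp_area_t *a, unsigned n)
{
    assert(n < NUM_FPR);
    uint64_t val = 0;
    for (unsigned i = 0; i < 8; i++)
        val = (val << 8) | a->vr[n][i];
    return val;
}

/* Write FPR n: stores into bytes 0..7 of vector register n, so the
   change is immediately visible through the vector view as well.
   This sketch leaves bytes 8..15 untouched. */
static void fpr_set(vfp_area_t *a, unsigned n, uint64_t val)
{
    assert(n < NUM_FPR);
    for (unsigned i = 8; i-- > 0; ) {
        a->vr[n][i] = (uint8_t)(val & 0xFF);
        val >>= 8;
    }
}
```

With a single backing store like this, no separate REFRESH/UPDATE step is needed between the two register views. The cost is that every FP access goes through a byte-order conversion on little-endian hosts.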

Cheers, Ian

salva-rczero commented 7 months ago

Hi @mcisho

My first approach was to use another area for VR and REFRESH/UPDATE from/to AFPR at every use. Then @Fish-Git proposed the shared area in POC SWAP128 (in this thread).

My development of the zVector instructions so far relies on big-endian storage and the REFRESH/UPDATE mechanism, so I will absolutely have to change it.

I have not had much time to dedicate to it in these months and I don't have access to a real mainframe to test some complex instructions.

But I think Fish's approach is the right one and I encourage you to start the AFPR changes while I adapt the VR ones.

Regards, salva.

mcisho commented 7 months ago

Hi Salva,

From my limited knowledge of Vector, and from reading your code, it appears to me that when the 128 bits are loaded from storage into a vector register, the vector register might contain one 128-bit vector, or two 64-bit vectors, or four 32-bit vectors, or eight 16-bit vectors, or sixteen 8-bit vectors, or maybe even a combination of different sized vectors. It's only when a zVector instruction subsequently manipulates the contents of the register that the size of a vector becomes apparent. Hence your vector code has to keep the register contents in the regs structure as a sequence of bytes (i.e. effectively big endian), and CSWAP the vector(s) to/from host endianness to suit the vector size required by the vector instruction. Is this your understanding too? Or am I missing something?

The FP register contents in the regs structure are currently kept in the endianness of the host. If vector registers must be kept as big endian, then FP registers will also have to be kept as big endian. This will have an impact on the design and usage of the shared area for vector/FP registers.
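To make the point about element sizes concrete, here is a minimal sketch, assuming the register is held as 16 bytes in guest order and converted only when an instruction picks an element width. The names `vreg_t`, `vreg_get` and `vreg_set` are hypothetical and are not taken from the Hercules sources. The sketch uses portable shifts rather than the CSWAP macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: one 128-bit vector register kept as a plain
   sequence of 16 bytes in guest (big-endian) order. */
typedef struct {
    uint8_t b[16];
} vreg_t;

/* Fetch element idx of the given width in bytes (1, 2, 4 or 8)
   as a host-order integer. The width is chosen by the caller,
   i.e. by the instruction being emulated, not by the register. */
static uint64_t vreg_get(const vreg_t *v, unsigned size, unsigned idx)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(idx < 16 / size);
    uint64_t val = 0;
    for (unsigned i = 0; i < size; i++)
        val = (val << 8) | v->b[idx * size + i];
    return val;
}

/* Store a host-order integer into element idx of the given width. */
static void vreg_set(vreg_t *v, unsigned size, unsigned idx, uint64_t val)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(idx < 16 / size);
    for (unsigned i = size; i-- > 0; ) {
        v->b[idx * size + i] = (uint8_t)(val & 0xFF);
        val >>= 8;
    }
}
```

The same 16 bytes can be read as four 32-bit elements by one instruction and as eight 16-bit elements by the next, which is why a single host-endian representation of the whole register does not work on little-endian hosts.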

Cheers, Ian

p.s. Is your name Salva? Am I addressing you correctly?

Fish-Git commented 7 months ago

* * * PLEASE NOTE: * * *

A new issue has been created for discussing development of the z/Architecture Vector Facility:

Issue #650: "Vector Facility for z/Architecture"

All discussion regarding development of the z/Architecture Vector Facility should take place in GitHub Issue #650, and NOT HERE.

GitHub Issue #77 (i.e. this issue) is a generic GitHub Issue regarding all yet-to-be-developed z/Architecture facilities, and not just the Vector Facility.

Please refrain from discussing z/Architecture Vector Facility development anywhere else, and discuss it instead in the newly created GitHub Issue #650.

Thank you.