Open bsdphk opened 1 year ago
Hey that's great! I'd love having updates. Also, if you have any questions about the generated code, feel free to ask.
The structure of the compiler was driven largely by the original need to fit in a PDP-11 with 56KB of memory. The sources as you see them didn't support the PDP-11 (we forked that one off so we could work on improved optimizations and the like), but did support other environments with somewhat limited memory resources. Hence all of the overlays, the various ways of accessing semantic trees and symbol tables, and the like. Looking it over brought back a lot of memories, mostly nightmares :). I'll take 56GB over 56KB any day.
On Thu, Mar 23, 2023 at 12:48 AM Poul-Henning Kamp @.***> wrote:
I just wanted to thank you a LOT for posting these sources!
I am working on a software emulation of the Rational R1000/s400 at datamuseum.dk, where all the programs on the 68K IO processor are compiled using this compiler.
Being able to study the internal logic of the compiler is a great aid to reverse-compiling those IO-programs.
Thanks a LOT!
-- Don Baccus
If you want to follow along in my disassembly effort, or just want some example binaries compiled with your compiler:
I did some consulting work for Rational in the early 1990s, oddly enough. But I didn't even know of the Rational 1000 as I was never a fan of Ada, and our company had no interest in writing a compiler for the language (we did front ends for Modula 2 and C/C++ instead). I actually was working for a competitor of Rational, Verdix, whose Ada compiler software was used to implement the F-16 software.
I poked a bit at a couple of disassembly files, but I think they were for the Ada machine, not the 68020 I/O machine, assuming the disassembler outputs reasonably standard M68K assembly.
On Mon, Mar 27, 2023 at 5:53 AM Poul-Henning Kamp @.***> wrote:
If you want to follow along in my disassembly effort, or just want some example binaries compiled with your compiler:
https://datamuseum.dk/aa//r1k_dfs/M200.html
-- Don Baccus
OK found some m68K disassembly ...
-- Don Baccus
It appears that the code was compiled with at least array bounds checking enabled. I can't tell whether range checks for variable assignments were enabled. If they were, the compiler could usually remove a good amount of array bounds checking when the variables used to index an array shared the array's range declaration, i.e. "i: 1..10" used to index an array declared "array [1..10]". The compiler also computed the ranges of variable values as best it could, which could to some degree suppress both range checks and array bounds checks.
It's cool that the code was compiled with any checking left enabled at all. Most customers weren't willing to pay the space and time penalty to do so, despite the fact that the overhead was quite a bit lower than that of competing compilers at the time.
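The check elimination described above can be sketched roughly as follows. This is a minimal illustration in Python, not the compiler's actual algorithm; the names `Range`, `needs_bounds_check`, and `add_const` are hypothetical and only model the idea that a check is redundant when the known value range of the index fits inside the array's declared bounds.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Inclusive value range, e.g. the Pascal subrange 1..10 (hypothetical model)."""
    lo: int
    hi: int


def needs_bounds_check(index: Range, array: Range) -> bool:
    """Return True if some value in the index range falls outside the array bounds."""
    return index.lo < array.lo or index.hi > array.hi


def add_const(r: Range, k: int) -> Range:
    """Value range of the expression i + k, given the range of i."""
    return Range(r.lo + k, r.hi + k)


i = Range(1, 10)        # models: var i: 1..10
a = Range(1, 10)        # models: array [1..10]

print(needs_bounds_check(i, a))                 # False: a[i] needs no check
print(needs_bounds_check(add_const(i, 1), a))   # True: a[i + 1] may reach 11
```

For a reverse-compiling tool the same reasoning runs backwards: a surviving check suggests the compiler could not prove the index range, and the constants in the check reveal the declared bounds.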
-- Don Baccus
Apologies for not giving more detailed directions.
All the files in the link above are 68K binaries, but the top part of each page is the output from my "un-pascal'er" code, and the unadulterated assembly follows below that.
The runtime "library", also known as "FS", is here: https://datamuseum.dk/aa//r1k_dfs/44/442c504ed.html - also written (mostly) in PASCAL.
That in turn runs on top of the "KERNEL", which is here: https://datamuseum.dk/aa//r1k_dfs/f1/f1bf0e801.html - hand-written assembler.
The "un-pascal'er" code does not exploit the information in the limit-checks yet, I'm still trying to model the stack-content correctly, but once I get to value tracking, they will provide valuable information.
Next step is to classify FP-relative data according to use, hoping to build first-order function prototypes from that, and then use those prototypes to identify the actual arguments to the calls. (All the prototypes you see now are created manually)
With that in place it will be time for type-propagation via calls to local variables to calls to other functions and so on.
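As a rough illustration of the classification step described above, the sketch below groups frame-pointer-relative accesses into argument slots. It assumes the conventional 68K `LINK A6` frame layout (saved frame pointer at offset 0, return address at offset 4, caller-pushed arguments from offset 8 upward, locals at negative offsets); whether this compiler's frames follow that layout is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed layout (conventional 68K LINK A6 frame, not confirmed for this compiler):
#   0(A6) saved frame pointer, 4(A6) return address,
#   8(A6) and above: arguments pushed by the caller, negative offsets: locals.
FIRST_ARG_OFFSET = 8


def classify(offset: int) -> str:
    """Classify one frame-pointer-relative offset as 'arg', 'local', or 'linkage'."""
    if offset >= FIRST_ARG_OFFSET:
        return "arg"
    if offset < 0:
        return "local"
    return "linkage"


def first_order_prototype(accesses):
    """Build a crude prototype from (fp_offset, size_in_bytes) accesses in one function.

    Returns a sorted list of (offset, widest access size) for each argument slot.
    """
    widest = defaultdict(int)
    for offset, size in accesses:
        if classify(offset) == "arg":
            widest[offset] = max(widest[offset], size)
    return sorted(widest.items())


# Hypothetical accesses observed in one function body.
seen = [(8, 4), (12, 2), (-4, 4), (12, 1), (4, 4)]
print(first_order_prototype(seen))  # [(8, 4), (12, 2)]
```

The widest access per slot is only a first guess at the argument's size; narrower accesses to the same slot may simply be partial reads.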
The thing which confuses me most about the compiled PASCAL code is string literals: first the compiler outputs instructions to copy the string literal from the code "segment" onto the stack, and then it calls a function (FS@0x10ddc) which copies it from there to dynamically allocated memory. Why the detour over the stack? Is that an artifact of the Rational adaptation of the runtime, or is there a deeper reason?
PS: R1000 assembly looks like this: https://datamuseum.dk/aa//r1k_dfs/SEG.html and it is really weird: The R1000 instructions are Ada primitives. Also: The machine is bit-oriented so types needing 13 bits only allocate 13 bits.
PPS: Based on the similarity of the generated code, and the use of A5 as "origin" pointer, I hypothesize that Hewlett-Packard used your C compiler for the HP3458A and HP3245A products?
I think Rational may've implemented their own string package but will look more closely. You can look at the runtime library sources here in the libsrc directory, and much is written in Pascal. The compiler supports standard Pascal array-of-char strings, and towards the end Turbo Pascal strings were added (stored as length+data, which you'll see in the data section).
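To make the length+data layout concrete, here is a small Python sketch for spotting such strings in a data section. It assumes the usual Turbo Pascal convention of a single length byte followed by the characters; the exact layout this compiler emits is not confirmed here, and the helper names are hypothetical.

```python
from typing import Tuple


def encode_pstring(text: str) -> bytes:
    """Encode text as one length byte followed by the characters (assumed layout)."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix holds at most 255 characters")
    return bytes([len(data)]) + data


def decode_pstring(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode a length-prefixed string at offset; return (text, offset after it)."""
    length = buf[offset]
    start = offset + 1
    end = start + length
    if end > len(buf):
        raise ValueError("length byte points past the end of the buffer")
    return buf[start:end].decode("ascii"), end


blob = encode_pstring("PASCAL") + encode_pstring("OK")
first, nxt = decode_pstring(blob)
second, _ = decode_pstring(blob, nxt)
print(first, second)  # PASCAL OK
```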
Some comments:
1. Compiler will allocate a small number of variables to D and A registers (and floating point variables to fp registers if you have the fp processor). They'll never be stored to memory.
2. Compiler is able to put for loop variables into registers in some cases if #1 hasn't done so.
3. Values computed in a register may be stored on the stack if the register is clobbered before all uses have been executed. This generally happens with common subexpressions. The value will be retrieved from the stack as needed by other expressions/vars using the value. Hoisting can lead to additional CSEs, and things like multidimensional array accesses may only be partially hoisted, so trying to reconstruct the source is not always going to be possible.
4. When a constant expression is folded and assigned to a var, in many cases the constant will be used rather than the variable.
5. After a procedure/function is compiled, the compiler does several peephole optimizations. If there are registers that haven't been used, the compiler will look for things to stuff into them, keeping in mind the cost of storing/restoring registers at proc entry/exit. This can include things like constants, variable addresses, etc. Variable addresses because the code generator in general doesn't know if there's a reference to the in-memory value (unlike those variables allocated by the optimizer to registers, which it knows aren't referenced outside the proc/function).
6. As you've seen, all parameters are passed on the stack.
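The remark about folded constants being used in place of the variable can be illustrated with a toy propagation pass over straight-line assignments. This is a minimal sketch under assumed data structures (expressions as ints, variable names, or `(op, left, right)` tuples), not a description of how the compiler represents code internally.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, known):
    """Fold expr using known constant variables; return an int when fully constant."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):              # a variable name
        return known.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, known), fold(right, known)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


def propagate(assignments):
    """Rewrite a straight-line list of (target, expr) pairs, replacing uses of
    variables whose values are known constants with the constants themselves."""
    known = {}
    out = []
    for target, expr in assignments:
        value = fold(expr, known)
        if isinstance(value, int):
            known[target] = value
        else:
            known.pop(target, None)        # target no longer holds a known constant
        out.append((target, value))
    return out


program = [
    ("n", ("*", 4, 8)),
    ("size", ("+", "n", 2)),
    ("p", ("+", "base", "n")),
]
print(propagate(program))
# [('n', 32), ('size', 34), ('p', ('+', 'base', 32))]
```

In the disassembly this shows up as an immediate operand where the source used a variable, so an immediate value does not by itself prove the source contained a literal.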
-- Don Baccus
The string is copied to the stack because it is being passed as a value parameter. Pascal allows value parameters to be modified, so the compiler can't pass these by reference. Pascal doesn't allow one to pass a constant as a var parameter either, of course. So the value is passed on the stack. Nowadays one would check whether a value parameter is actually modified and pass by reference if it can be determined that it is not. But our compiler series was designed to run on machines with much less memory than we see today, so the compiler operated on one procedure/function at a time and kept only some very, very limited information about it for use at its call sites, which did not extend to information on individual parameters. There's a lot of optimizing not done by our compilers due to memory constraints.
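The reason a copy is needed can be shown with a small model. In the Python sketch below, a `bytearray` stands in for the literal in the code segment and the callee is a hypothetical routine that modifies its parameter; none of these names come from the compiler or the Rational runtime.

```python
def callee_uppercase(param: bytearray) -> bytes:
    """Stand-in for a Pascal routine that modifies its value parameter in place."""
    for k in range(len(param)):
        if 97 <= param[k] <= 122:          # ASCII 'a'..'z'
            param[k] -= 32
    return bytes(param)


def call_by_value(routine, arg: bytearray) -> bytes:
    """Pass a private copy, as when the string is first copied onto the stack."""
    return routine(bytearray(arg))


def call_by_reference(routine, arg: bytearray) -> bytes:
    """Pass the original storage; the callee's writes hit the literal itself."""
    return routine(arg)


literal = bytearray(b"hello")              # models the literal in the code segment
print(call_by_value(callee_uppercase, literal), bytes(literal))
# b'HELLO' b'hello'  (literal untouched)
print(call_by_reference(callee_uppercase, literal), bytes(literal))
# b'HELLO' b'HELLO'  (literal corrupted)
```

Without knowing whether the callee writes to its parameter, the copy is the only safe choice, which matches the one-procedure-at-a-time constraint described above.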
The compiler passes all strings by reference to the built-in library.
There is also a string package in the utility directory that similarly implements string operations but in standard Pascal (so one can avoid the Turbo Pascal extension).
However in the disassembled code I see calls to things like "StringCat2" and "StringDup". These do not appear in either the standard runtime library (whose names begin with "p_" anyway), or the string package we provided. Therefore I conclude they wrote their own.
Regarding various optimizations making the code hard to reverse-engineer, so far as I've looked (not much) their code looks pretty simple, meaning limited opportunities for the compiler to optimize stuff away.
-- Don Baccus
Oh, and you probably have figured this out, but the compiler allows assigning variables to absolute memory locations, allowing memory-mapped I/O to be done directly in Pascal.
So on the PDP-11 something like:
    var
      device origin 1770b: record
        status: 0..65535;
        data: char
      end;

    begin
      { busy-wait until bit 200b of the status word is set }
      while (200b and device.status) = 0 do;
      device.data := 'a';
    end.
Of course they might've done this low-level stuff in assembly.
One can also declare a procedure an interrupt procedure to handle interrupts.
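The polling loop in the PDP-11 example above can be simulated without hardware. The Python sketch below assumes the `b` suffix marks an octal literal (so `200b` is `0o200`) and uses a hypothetical `FakeDevice` whose status word becomes ready after a fixed number of reads; it models the control flow only, not real memory-mapped access.

```python
READY_MASK = 0o200   # assumes 200b in the Pascal example is octal 200 (bit 7)


class FakeDevice:
    """Hypothetical stand-in for the memory-mapped record (status word + data byte)."""

    def __init__(self, polls_until_ready: int):
        self._remaining = polls_until_ready
        self.data = None

    @property
    def status(self) -> int:
        # Each read models one load from the status register.
        if self._remaining > 0:
            self._remaining -= 1
            return 0
        return READY_MASK


def write_when_ready(device: FakeDevice, ch: str) -> int:
    """Busy-wait on the ready bit, then store ch; return the number of failed polls."""
    polls = 0
    while (READY_MASK & device.status) == 0:
        polls += 1
    device.data = ch
    return polls


dev = FakeDevice(polls_until_ready=3)
print(write_when_ready(dev, "a"), dev.data)  # 3 a
```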
-- Don Baccus
Do not let the names I have used for various functions confuse you: They are my best guesses, many from before I found out that this was PASCAL code.
Ah, OK, that makes more sense. How did you find out it was Pascal code?
-- Don Baccus
I found error messages of the form "PASCAL error #" :-)
Recently I got hold of Wayne Meretsky and he told me they used "Oregon Software Pascal-1 or Pascal-2 (I don't recall) running on RSX-11 on a PDP-11 development system." and a bit of searching brought me here :-)
Excellent! These are the only sources remaining of what was once a large compiler system.
Front-ends: Pascal, Modula-2, C/C++
Back-ends: PDP-11, M68K, NS32K, i386, VAX, SPARC, and some obscure Honeywell mini that we did under contract
And a whole bunch of operating systems.
All designed and about 1/2 written by me.
An old customer had bought sources to the VAX/VMS->M68K cross compiler and library. Someone who had worked there had an old 9-track tape of them in VAX/VMS backup format, found me, and sent them to me.
For which I'm extremely grateful because I thought all of that work from my past had disappeared forever.
Glad you're finding the sources useful.
-- Don Baccus