Closed: oilcan-productions closed this issue 9 months ago
We may need to increase the timeout -- unless the compilation, for some reason, failed. The only way to tell would be to look in that lcf;comp log file. The most recent changes didn't involve any changes to the zork source, nor to the executables needed to compile it, so I'm not sure what caused this timeout. It may be that your machine is slow?
@eswenson1 I was playing around with this for the last 10 days or so and found that, no matter what timeout I set, the build fails with the timeout error, except when building with EMULATOR=KLH-10. The build hangs at line 164 of the Makefile, while booting from RP0.dsk; the same happens when I try to load that disk manually after the failed launch. I changed the timeout from 100 to 1000 in the setup_timeout function in build/build.tcl, which seems to be the central location controlling it. You mentioned the lcf;comp log file. Where would I find that once the build fails?
Looking at the lcf directory on the RP0 disk created before the build failed, I see two comp files:
0 COMP PREAMB 1 ! 11/30/2016 03:56:40
0 COMP XXFILE 1 ! 11/30/2016 03:56:40
*:print comp preamb
<SNAME "LCF">$
<FLOAD "prim nbin">$
<FLOAD "defs nbin">$
<FLOAD "util nbin">$
<FLOAD "tell nbin">$
<FLOAD "makstr nbin">$
<FLOAD "typhak nbin">$
<OVERFLOW <>>$
*:print comp xxfile
^^R
:pcomp
<SNAME "LCF">$
^^J
<FILE-COMPILE "prim >">$
$p
^^J
<FILE-COMPILE "defs >">$
$p
^^J
<FILE-COMPILE "util >">$
$p
^^J
<FILE-COMPILE "makstr >">$
$p
^^J
<FILE-COMPILE "typhak >">$
$p
^^J
<QUIT>$
:assem "lcf;tell >" "lcf;tell nbin"
n
^^J
:pcomp
<FLOAD "lcf;comp preamb">$
^^J
<FILE-COMPILE "rooms >">$
$p
^^J
<FILE-COMPILE "parser >">$
$p
^^J
<QUIT>$
:pcomp
<FLOAD "lcf;comp preamb">$
^^J
<FILE-COMPILE "act1 >">$
$p
^^J
<FILE-COMPILE "act2 >">$
$p
^^J
<QUIT>$
:pcomp
<FLOAD "lcf;comp preamb">$
^^J
<FILE-COMPILE "act3 >">$
$p
^^J
<FILE-COMPILE "act4 >">$
$p
^^J
<QUIT>$
:pcomp
<FLOAD "lcf;comp preamb">$
^^J
<FILE-COMPILE "melee >">$
$p
^^J
<FILE-COMPILE "sr >">$
$p
^^J
<QUIT>$
^^Q
In build/zork.tcl, we have these commands:
respond "*" ":xxfile lcf;comp log_lcf;comp xxfile\r"
expect -timeout 6000 "Job XXFILE interrupted: .VALUE;"
type "\033p"
expect ":KILL"
So lcf;comp log is where XXFILE puts its "console output". When the build fails, it shouldn't delete the disk. So simply boot ITS using that disk, and look in the LCF directory. You should find the COMP LOG file there.
To boot that ITS, just go to the
For me, the only reliable emulator has been KLH10, so I use it (almost) exclusively. So I'd do:
cd build/klh10
sudo ./kn10-ks-its dskdmp.ini
go
its
$g
^Z
:listf lcf;
I have done that and posted the content here: https://github.com/PDP-10/its/issues/2172#issuecomment-1470915823
Had you gotten to the xxfile lcf;comp log_lcf;comp xxfile line in the zork.tcl script? Because if you had, you should have found lcf;comp log created.
Yes, the file is there. I missed it last time. I extended the timeout to 12000 and it still times out. lcf;comp log is 0 bytes, though.
So what is in the file? Did one of the commands die or hang?
There is nothing in the comp log file. It is zero bytes :(
That is odd. Can you boot the system using that disk and run that XXFILE command manually? You can replace the file name before the “_” in the XXFILE command line with TTY: so that the output of the XXFILE run goes to the terminal rather than the LCF;COMP LOG file.
*:xxfile TTY:_lcf;xxcomp xxfile
STYOPN ZZ 11 XXFILE 09:41:45
DB ITS.1651. DDT.1548.
TTY 11
2. Lusers, Fair Share = 101%
ZZ$0U
LOGIN ZZ0 11 09:41:46
To see system messages, do ":MSGS<CR>"
:GAG 0
*
:pcomp
You have about 8 cpu minutes to do your thing.
MUDDLE COMPILER NOW READY.
T
ERROR: Job changed when not allowed.
And then it sits there; I'm not sure whether it continues doing anything. No more output for 10 minutes.
Ah, I had to set the time and date for it to go through. PDSET to the rescue.
Ok that ran through. Took about 25 minutes
Cool. Yes, a lot of stuff doesn't work terribly well when the system doesn't know the time.
I have now extended the timeout to 35000 in zork.tcl. We will see if that helps
Extending the timeout did not solve the problem. Two things seemed to help and got me past some of the issues:
sudo raspi-config
make check-dirs all EMULATOR=<emulator spec>
A third thing that helped: clone into a separate folder for each emulator instead of building several emulators from one folder.
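The one-clone-per-emulator approach can be sketched as a small helper that prints the commands for review instead of running them. This is only an illustration: the its-<emulator> directory naming and the function names are assumptions, and the repository URL is taken from the issue link in this thread.

```shell
#!/bin/sh
# Sketch: one working tree per emulator, so builds do not share state.
# The directory naming (its-<emulator>) and function names are hypothetical.
REPO_URL="https://github.com/PDP-10/its"

# Print the directory name used for a given emulator.
clone_dir() {
    printf 'its-%s\n' "$1"
}

# Print (do not run) the clone and build commands for one emulator.
build_plan() {
    dir=$(clone_dir "$1")
    printf 'git clone %s %s\n' "$REPO_URL" "$dir"
    printf 'make -C %s check-dirs all EMULATOR=%s\n' "$dir" "$1"
}

# Example use (commented out so sourcing this file has no side effects):
# for emu in pdp10-ka pdp10-kl pdp10-ks; do build_plan "$emu"; done
```

Printing the plan first makes it easy to check the directories before committing to several long builds; piping the output to sh would run it.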
Documenting progress:
Job ECOMP wants the TTY
$p
DTTY; 716064>>.CALL 716712 (IOT) $P
"So you re-owned me, eh? I'm done anyway."
<FILE-COMPILE "tcheck >" "tcheck nbin">$
Input from DSK:.BATCH;TCHECK UJHM26
Output to DSK:.batch;TCHECK nbin
Bounds checking on.
Default declaration is UNSPECIAL.
Output in fast mode.
Temporary output going to: _TEMPLT >
Running disowned, with record on DSK:.BATCH;TCHECK RECORD
Toodle-oo.
:PROCED
The last command timed out.
make: *** [Makefile:167: out/pdp10-ka/rp03.2] Error 1
Try this:
diff --git a/build/muddle.tcl b/build/muddle.tcl
index b41b1fee..65ddcf10 100644
--- a/build/muddle.tcl
+++ b/build/muddle.tcl
@@ -121,7 +121,8 @@ respond "T" "<SNAME \".batch\">\033"
respond "\".batch\"" "<FILE-COMPILE \"templt >\" \"templt nbin\">\033"
respond "Job ECOMP wants the TTY" "\033p"
respond "I'm done anyway." "<FILE-COMPILE \"tcheck >\" \"tcheck nbin\">\033"
-respond "Job ECOMP wants the TTY" "\033p"
+expect -timeout 600 "Job ECOMP wants the TTY"
+type "\033p"
respond "I'm done anyway." "<FILE-COMPILE \"taskm >\" \"taskm nbin\">\033"
expect -timeout 600 "Job ECOMP wants the TTY"
type "\033p"
Seems KLH10 is just faster than pdp10-ka. Compiles are taking longer on pdp10-ka.
@eswenson1 I believe that building with EMULATOR=pdp10-kl does not use KLH10 but the SIMH-based KL emulator, and that only EMULATOR=klh10 uses the KLH10 emulator. Is that correct?
If I want to get to a point where all the machines are networked and can run on the same network at the same time, I need to get all the supported machines built. Or am I reaching too high?
That's correct. My point is that I tested all this on KLH10 on my machine, and the CI builds passed. But on your machine, it would appear that pdp10-ka is timing out. So increase the timeout for that compile (you might have to do it for several, or change the default value that the respond Tcl macro uses).
And no, you aren't reaching too high. I have builds of KLH10, pdp10-ka, pdp10-kl, and pdp10-ks ITS all running on a machine. They are all networked via chaosnet.
EMULATOR=klh10 is Ken Harrenstien's KLH10. EMULATOR=simh is Bob Supnik's KS10 in SIMH. All the others, pdp10-ka, pdp10-kl, and pdp10-ks, are Richard Cornwell's SIMH-based emulators.
@eswenson1 the pattern seems to work. I was able to build all SIMH based emulators now. I will submit a PR later today
@oilcan-productions That’s good news. Thanks. I wonder if we should do a rudimentary build machine speed test as part of our build and then adjust the default timeout based on the findings.
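A rudimentary speed test of that kind might look like the sketch below. It times a fixed shell loop and turns the elapsed time into an integer multiplier for the default timeout. Everything here is an assumption for illustration: the function name, the iteration count, and the reference duration are hypothetical, and a shell loop is only a rough proxy for how fast an emulator will run on the same machine.

```shell
#!/bin/sh
# Sketch: derive an integer timeout multiplier from a crude CPU timing.
# timeout_scale [REF_SECONDS]
#   REF_SECONDS is the (assumed) time the loop takes on the machine the
#   default timeouts were tuned on. Default of 2 is a placeholder.
timeout_scale() {
    ref=${1:-2}
    [ "$ref" -ge 1 ] || ref=1

    start=$(date +%s)
    i=0
    while [ "$i" -lt 200000 ]; do
        i=$((i + 1))
    done
    end=$(date +%s)

    elapsed=$((end - start))
    # Round up, and never go below a multiplier of 1.
    scale=$(( (elapsed + ref - 1) / ref ))
    [ "$scale" -ge 1 ] || scale=1
    echo "$scale"
}
```

Because date +%s has one-second resolution, fast machines simply get a multiplier of 1; only clearly slower machines get a larger one.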
I am looking into at least checking the CPU architecture and disk type to see if we might run into issues. We could also make the timeout an environment variable and put it into user control.
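Putting the timeout under user control through the environment could be sketched as follows. ITS_TIMEOUT_SCALE and scaled_timeout are hypothetical names, not something the build currently reads; the Tcl build scripts would have to be changed to consult such a variable before it had any effect.

```shell
#!/bin/sh
# Sketch: scale a base timeout by a user-supplied environment variable.
# ITS_TIMEOUT_SCALE is a hypothetical variable; unset, empty, zero, or
# non-numeric values fall back to a multiplier of 1.
scaled_timeout() {
    base=$1
    scale=${ITS_TIMEOUT_SCALE:-1}
    case $scale in
        ''|*[!0-9]*) scale=1 ;;   # reject anything but plain digits
    esac
    [ "$scale" -ge 1 ] || scale=1
    echo $((base * scale))
}
```

With this in place, a slow machine could export ITS_TIMEOUT_SCALE=4 once instead of editing individual timeout values in each script.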
So I went and made the changes locally for all emulators. PDP10-KA and PDP10-KL are building consistently with the change to move ^^R down one line in 'comp xxfile' and 'zork xxfile'.
PDP10-KS fails. When I switch it to output to tty: instead of a log, I get this:
:KILL
*:link sys1;ts zork, sys; ts rbye
*:print cfs;..new. (udir)
DSK: CFS; ..NEW. (UDIR) - FILE NOT FOUND
:vk
*:xxfile tty:_lcf;comp xxfile
DB ITS.1651. DDT.1548.
TTY 11
2. Lusers, Fair Share = 133%
<SNAME ?U?
ERROR: ERROR string found.
I will keep tinkering with the format and see what I can find.
I agree this is frustrating. I've even seen failures on KA and KL when the ^^R is down a line. But I agree, it mostly works. I've also had pdp10-ks work for me too. I'm wondering if it is a timing issue, a race between the time XXFILE takes to handle the ^^R and the time it takes to handle the first line of input to DDT, and that this is why it is not completely consistent.
Clearly, above, the <SNAME "LCF"> that is supposed to go to MDL is going to DDT. That suggests that the DDT command line that invokes MDL is getting lost. It is clearly in the XXFILE, so what is happening?
HOWEVER, one thing I found is that XXFILE attempts to log in, and the success or failure of the XXFILE run depends on whether or not, post login, you get the prompt for reading mail or any other input that LOGIN might evoke. This shouldn't result in any difference during the builds, but when you are invoking XXFILE on your own account, I've noticed the LOGIN file can make it succeed or fail, depending on what it does.
Another tidbit: if I take zork.tcl out of the build for KS, it times out compiling tcheck.
Job ECOMP wants the TTY
$p
DTTY; 716064>>.CALL 716712 (IOT) $P
"So you re-owned me, eh? I'm done anyway."
<FILE-COMPILE "tcheck >" "tcheck nbin">$
Input from DSK:.BATCH;TCHECK UJHM26
Output to DSK:.batch;TCHECK nbin
Bounds checking on.
Default declaration is UNSPECIAL.
Output in fast mode.
Temporary output going to: _TEMPLT >
Running disowned, with record on DSK:.BATCH;TCHECK RECORD
Toodle-oo.
:PROCED
*
The last command timed out.
make: *** [Makefile:164: out/pdp10-ks/rp0.dsk] Error 1
Finally figured out why the simh build times out while executing xxfile zork xxfile:
<FLOAD "parser nbin">$
FILE-SYSTEM-ERROR
#FALSE ("FILE NOT FOUND" "DSK:LCF;PARSER nbin" 1048576)
FLOAD
The file is missing from the source, as far as I can tell.
That file should have been created by the COMP XXFILE.
No matter how much I tweaked timeouts, the build still kept failing. Then I tried moving the build to an SSD connected to the Raspberry Pi through USB 3.0: no dice. The only thing that made the build work was swapping the SD card for a brand-name SD card with more space and faster read/write speeds.
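Since storage speed turned out to matter here, a quick write-throughput check before a long build might save some time. The sketch below is only a rough estimate under stated assumptions: write_speed_mb is a hypothetical helper, the 64 MiB default size is arbitrary, and writing zeros with one-second timing resolution will not characterise a card precisely.

```shell
#!/bin/sh
# Sketch: estimate sequential write throughput of the build directory.
# write_speed_mb DIR [SIZE_MB] prints an approximate MiB/s figure.
write_speed_mb() {
    dir=$1
    size=${2:-64}
    tmp="$dir/.its-speedtest.$$"

    start=$(date +%s)
    # Write SIZE_MB blocks of 1 MiB; clean up and fail if DIR is not writable.
    dd if=/dev/zero of="$tmp" bs=1048576 count="$size" 2>/dev/null || {
        rm -f "$tmp"
        return 1
    }
    sync    # flush cached writes so the timing includes the device
    end=$(date +%s)
    rm -f "$tmp"

    elapsed=$((end - start))
    [ "$elapsed" -ge 1 ] || elapsed=1   # avoid dividing by zero
    echo $((size / elapsed))
}

# Example use (commented out): write_speed_mb . 256
```

A low figure from this check would be a hint to try different storage before spending more time on timeout values.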
I cloned the latest and ran
make EMULATOR=simh
on Raspbian Bullseye. The build fails with the error below.