PDP-10 / klh10

Community maintained version of Kenneth L. Harrenstien's PDP-10 emulator.

TOPS-20 hangs a short time after boot up #29

Closed: jguillaumes closed this issue 3 years ago

jguillaumes commented 6 years ago

Hello,

This report should be confirmed by someone else.

Apparently, commit ba556fbd9d4ffeff97d587c935c56a2187e771e0 has introduced instability into the KL10 emulator. TOPS-20 now freezes a few minutes after boot-up. It also freezes almost immediately if a CHECKD is requested at boot.

I've tested it with the PANDA distribution. Going back to 9d13c2b5e55916c0ec69d3435fcb63f01fb70bab seems to "fix" the problem.

Possible diagnosis: the increase in the number of devices causes a memory overwrite somewhere. I have not looked at the code yet.

jguillaumes commented 6 years ago

Additional info: I'm doing my testing on a Raspberry Pi 3 running the latest version of Raspbian.

larsbrinkhoff commented 6 years ago

Hello,

You write:

Possible diagnosis: the increase in the number of devices causes a memory overwrite somewhere.

However, the commit you refer to, ba556fb, doesn't change the number of devices. Could you please check 2242d4f22 too?

jguillaumes commented 6 years ago

Yes, you are right. I pasted the wrong link. Anyway, I'm tempted to close the issue until I get more information. Right now I'm not getting consistent results from my tests.

jguillaumes commented 6 years ago

New hypothesis: the -O3 optimization level does weird things and causes pseudo-random lockups. I have not observed any hang with -O2, but I'll do more testing.

Rhialto commented 6 years ago

It is certainly not impossible that higher optimization levels cause problems. Usually this is because there is some error in the code, which gets exposed due to more aggressive optimization. See commit f5ed23867fc4a4f90440883f156862f408f56cb0 for an example. As I recall, there was a patch doing the rounds blaming this on a bug in a newer gcc version, but more likely the original code violates the rule about multiple changes to one variable between sequence points (I think I traced the details at one point, but I cannot remember right now). So if you can pinpoint the problem to a single source file, or even more precisely than that, we might find something similar to fix.
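To illustrate the class of bug described here, the following is a minimal sketch in C. It is not the code from commit f5ed238, whose contents are not reproduced in this thread; the function name `sum_next_two` and the buffer layout are hypothetical. The commented-out expression modifies the same variable twice with no sequence point in between, which is undefined behavior, so a compiler is free to produce different results at -O2 and -O3. The function below it is the well-defined rewrite, with one modification per statement.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example, not taken from the klh10 sources.
 *
 * Undefined behavior: *idx is modified twice with no sequence point
 * between the two increments, so the evaluation order is unspecified
 * and the result may change with the optimization level:
 *
 *     return buf[(*idx)++] + buf[(*idx)++];
 *
 * It is kept as a comment so that this file compiles cleanly.
 */

/* Well-defined rewrite: each increment is its own full statement. */
int sum_next_two(const int *buf, size_t *idx)
{
    assert(buf != NULL && idx != NULL);

    int first = buf[*idx];   /* read the element at the current index */
    (*idx)++;                /* first increment, sequenced on its own */

    int second = buf[*idx];  /* read the following element */
    (*idx)++;                /* second increment, sequenced on its own */

    return first + second;
}
```

Called on a buffer holding 10, 20, 30 with the index at 0, this returns 30 and leaves the index at 2. GCC can flag many expressions of the commented-out kind with -Wsequence-point (enabled by -Wall), which may help narrow a search to a single source file.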

jguillaumes commented 6 years ago

This is going to be hard to do. The lock-up doesn't happen every time, and when it does, it doesn't always happen at the same point. It also doesn't happen on all the architectures I can test on (for instance, I have not observed it on a single-core ARMv6 device, namely a Raspberry Pi Model B). It usually happens if I ask the system to run CHECKD at boot-up, but not always. Unfortunately, I'm not knowledgeable enough about TOPS-20 to debug the guest OS itself and locate the choke point.

b4 commented 6 years ago

Oh, I have experienced this issue before.

I found a cause in like 2015-2016 and forgot to note it...

It seemed it was a race condition related to init of dpni20 and how the boot procedure overall occurred.

jguillaumes commented 3 years ago

Time to close this, I guess :)