KSP-KOS / KOS

Fully programmable autopilot mod for KSP. Originally By Nivekk
Other
693 stars 229 forks source link

[Dev build] Terminal sometimes unresponsive #1694

Open dewiniaid opened 8 years ago

dewiniaid commented 8 years ago

I'm running a self-compiled build of KOS (since I'm writing an addon that needs functionality not yet released), so I thought I'd mention a few things. I have not encountered any the issues here with 0.20.1:

In a stripped down KOS install with most stock parts removed; installed mods are EEX, KER, KRPC, MJ2, PreciseManeuver, Quickstart (all of which work with KOS 0.20.1 in my 'normal' save):

  1. While running bb403471044a1e84497f19ce45fb54299b4a9445 (current latest), I can connect over telnet, select a craft, and get to the "Proceed." line but no input is accepted (no visible affect). This also affected the in-game terminal. Electricity was normal and there is no RT2 in this install for it to be a loss of connection. This was awhile ago; it's possible the craft was launched while I was still on 0.20.1 (before I realized that the functionality I found from perusing your source code wasn't in 0.20.1)
  2. While running abb18c393464986c0e94ce3777048fee1bf74ea3 (earliest mainline code I can find with the new External API code), I've sporadically had the same issues. Sometimes they are resolved by launching a different craft (instead of the craft that was focused), sometimes not. They were never resolved by returning to the space center and switching back to the original craft.

In most if not all of these cases (but not in the case of #1), the terminal was displaying a nag message due to addon failures (usually but not always mine) at the time, but the nag message did not always stop input.

In my current invocation of KSP, everything seems to be working fine -- and I've switched to the original vessel in use during 1) and it is working fine as well.

A few notes:

  1. Input isn't completely ignored; Ctrl+C still outputs a message to the output_log.
  2. With DEBUGEACHOPCODE enabled, the log is flooded with with
kOS: Executing Opcode 0019/0020  EOF
Prior to exeucting, stack looks like this:
Stack dump: stackPointer = -1

regardless of whether I'm currently experiencing the issue, but

  1. Interestingly enough, after rebooting the processor (on a working run where I could send commands), it changes to:
kOS: Executing Opcode 0005/0006  EOF
Prior to exeucting, stack looks like this:
Stack dump: stackPointer = 0
000 SP-> 0 (type: Scalar)

(The opcode numbers are correct; they increment every Ctrl+C)

hvacengi commented 8 years ago

If you're getting a nag message, it is by design that this shuts down the processor. When I set up the "addon" code to add nag messages, I did not expect to have so many instance of mod dlls failing to load as occurred with the release of 1.1.3. I've debated about removing the nag message, but I was concerned that it would prevent addon authors from seeing conflicts.

I would very much be interested to know what other mods were getting these nag messages, because I either need to remove the nag messages or find a way to better filter outdated mods. You may try commenting out lines 134 and 150 in src/kOS.Safe/Utilities/AssemblyWalkAttribute.cs to just remove the nag, and see how that helps.

There has always been some elusive bug where locking out terminal input doesn't work right, and I've never had it consistent enough to test or prove. I suspect there is some random circumstance where kOS experiences a race condition and locks out input before unlocking it again. We would need copies of log files to troubleshoot on what you're seeing and maybe find the input bug.

When sitting on the terminal, kOS simply executes over and over again the EOF opcode until you type something for it to execute. It's possible that we could ignore this when logging each opcode, but honestly that's a debugging-only feature anyways, since it will heavily lag the game.

dewiniaid commented 8 years ago

Ahh. If the lockout is intended to be in effect, then it was the occasional failure to lockout the terminal that was the problem, not the lockout. For the times I was locked out with no nag message, I wonder if it was due to the ship being locked out previously (prior to a restart) or a bug making the nag not show.

The EOF spam is understandable; the difference in stack pointer might be noteworthy though.

The nag messages are sometimes caused by Mono.Cecil (used for a C# REPL interface, not a true mod that KSP can load) and the kIPC mod I'm developing so likely not your concern -- though the latter is heavily an extension of kOS (though worthy of a discussion at some point).

dewiniaid commented 8 years ago

I'd suggest keeping the nag messages, but either not locking out the terminal or explaining that the terminal is locked out as a result. The information is good to have, but without expecting the lockout you might get more issues like this one -- or people thinking that something is broken between RT and kOS.

dewiniaid commented 8 years ago

Okay, it doesn't appear to be related to the nag messages at all. On Monday, I finally got my own addon to serialize data from kOS (more or less, its my own JSON/BSON format using JsonFx as an underlying library). I accidentally left KSP running overnight and came back to it having crashed (which is typical when it runs overnight -- it doesn't like sleep much).

Yesterday I thought of an edge case involving the serializer and restarted KSP with no changes to my environment. Once again, the craft was uncontrollable and going to Space Center and back did not fix it, but I was able to control a freshly prepared vessel (click launchpad -> choose recover -> launch auto-saved ship).

Next time it happens I'll grab full logs and persistence files and see if I can work out an exact repro. Until then, here's some additional info and speculation:

dewiniaid commented 8 years ago

Finally got a proper repro, testing against 517510c62b5bc6433ed4ee4a3a4785c9cd4aaafd

One of the two "Test Probe"s in the attached save has an unresponsive terminal when loading from persistent.sfs. The other two vessels seem to work fine, as does a freshly launched vessel. Both the save and output_log.txt are attached.

The affected vessel was the active vessel during a load that had a nag message.

I just realized upon closer examination that the "activated" flag was false, which I missed earlier -- so it looks like toggling power resolved the issue. That said, I don't recall ever toggling it off... so I did some testing.

It seems that when kOS produces a nag message, it deactivates the processor -- which then gets saved to the persistence file. So, even after fixing the nag it remains deactivated, and it's not always obvious that's the case -- there's no indication other than the kOS applauncher list that the CPU has no power (i.e. neither the in-game terminal nor telnet indicate this case).

Maybe the power being toggled should print a message to the terminal?

kOSBrokenTerminal.zip

hvacengi commented 7 years ago

@dewiniaid I think I may have fixed this issue by shifting the order that things are done when booting. If you test using the latest develop build on 1.2, do you still see this issue?

Dunbaratu commented 7 years ago

How long should we leave an issue like this open before closing it for inaction?

(i.e. if the last message was from the devs to the issue submitter, asking for more information... and that conversation ended there and hasn't had any further replies for months... do we want a statute of limitations where to get action on the issue later they'd have to open a new one?)

Dunbaratu commented 7 years ago

I'm moving this out of v1.1.0 since it's pending info from the user.