dolphinsmalltalk / Dolphin

Dolphin Smalltalk Core Image
MIT License
301 stars 58 forks source link

Quit from filed-in script can cause silent nil DNU #setToEnd error #1184

Closed daniels220 closed 8 months ago

daniels220 commented 1 year ago

Describe the bug A filed-in script (either from a menu or starting the image with -f) that ends by calling SessionManager current quit[:] may result in a brief flash of a walkback dialog, and logging to the .errors file, of an UndefinedObject does not understand #setToEnd error due to SourceManager default changesFiler having closed and nil'ed its underlying stream.

Specifically, this only occurs if the script ends with a chunk marker followed by at least one more character (including—indeed most likely—whitespace), e.g. SessionManager current quit!\r\n (a trailing newline). The error occurs trying to log an empty chunk, after evaluating the chunk containing the quit.

Worth noting also that this can only occur in a GUI session, because in a console session the primQuit: that actually exits the image is issued synchronously rather than in a postToInputQueue block.

Several things contribute to this, suggesting several possible fixes:

  1. ChunkSourceFiler>>fileIn does a skipSeparators after deciding whether another chunk may be present. Simply changing the start of the method to [stream skipSeparators; atEnd] whileFalse: [(stream peekFor: $!) ...etc... would eliminate this particular problem, and this seems like a reasonable thing to do anyway to avoid extraneous empty entries in the changes file.
  2. There is some inconsistency as to whether the SourceManager's subsidiary filers are simply closed and left in place, or removed from the SourceFiles array as well.
    • SourceManager>>onExit just closes them, while #closeSources removes/nils them as well.
    • #hasSources checks that the streams are actually present, but most other methods (all the #log... and #store... methods that write to the changes file, as well as #getSourceFromDescriptor:) only checks that the filter is not nil.
    • Removing/niling the files in onExit would cause any attempted logging this late in the shutdown process to be silently skipped. This I'm not sure about, since it really ought not to be necessary—I do think this particular logging attempt should be stopped earlier as per (1), and in general, closing the sources/changes files is the very last thing that happens before SessionManager>>onExit returns and the final primQuit: is queued, well past the point of "second childishness and mere oblivion" (I love those comments by the way).
  3. It is weird, at a very basic level, that quit[:] returns normally to its sender. If I were to write something like:

    "Almost like a ^return, but for a multi-chunk script file, and setting the exit code"
    someErrorCondition ifTrue: [SessionManager current quit: 1].
    self doSomeOtherStuff.

    I would certainly expect doSomeOtherStuff not to execute when the condition is true, but actually it would. Indeed the entire rest of the script, multiple chunks and all, would attempt to continue executing, though it would be brought up short on the next chunk by the DNU #setToEnd error, and the resulting forkMain from the walkback dialog would process the primQuit: just as it does when the following chunk is empty.

    I assume the reason the primQuit: is deferred using postToInputQueue: is to avoid executing it within a callback? (Are there any other conditions where we'd want to defer the quit?) That would make sense, for the same reason that we clean up various resources nicely even though Windows would ultimately clean up after us once the VM terminates. We could take a more nuanced approach, though. If we're not in a callback, just issue the primQuit: immediately. If we are, do the postToInputQueue, but then terminate (or kill?) the active process. (Interesting point of similarity with SessionManager>>onStartup:, which also ends with a kill, for a different but related reason.) Or, rather than postToInputQueue, we could just fork a process to do the primQuit:.

To Reproduce

  1. File in a script ending with SessionManager current quit!, including a trailing space, newline, etc.
  2. You may see a brief flash of walkback dialog, and in any case, note that the .errors file contains a stack dump like:
    
    2023-03-24T13:31:35.0988149-04:00: Unhandled exception - a MessageNotUnderstood('UndefinedObject does not understand #setToEnd')

Object>>doesNotUnderstand: ChunkSourceFiler(SourceFiler)>>setToEnd SourceManager>>changesFiler SourceManager>>logEvaluate: Compiler class>>evaluate:for:evaluationPools:logged:ifFail: Compiler class>>evaluate:for:evaluationPools:logged: Compiler class>>evaluate:for:logged: [] in ChunkSourceFiler>>fileIn ExceptionHandler(ExceptionHandlerAbstract)>>markAndTry [] in ExceptionHandler(ExceptionHandlerAbstract)>>try: BlockClosure>>ifCurtailed: BlockClosure>>ensure: ExceptionHandler(ExceptionHandlerAbstract)>>try: BlockClosure>>on:do: ChunkSourceFiler>>fileIn SourceManager>>fileInFrom:normalizeLineEndings: [] in SourceManager>>fileIn:normalizeLineEndings: BlockClosure>>ifCurtailed: BlockClosure>>ensure: SourceManager>>fileIn:normalizeLineEndings: [] in RefactoringSmalltalkSystem(SmalltalkSystem)>>fileFileIn BlockClosure>>ifCurtailed: BlockClosure>>ensure: Cursor>>showWhile: RefactoringSmalltalkSystem(SmalltalkSystem)>>fileFileIn Symbol>>forwardTo: CommandDescription>>performAgainst: [] in Command>>value BlockClosure>>ifCurtailed: BlockClosure>>ensure: Command>>value ShellView>>performCommand: SmalltalkSystemShell(Shell)>>performCommand: CommandQuery>>perform DelegatingCommandPolicy(CommandPolicy)>>route: [] in ShellView(View)>>onCommand: BlockClosure>>ifCurtailed: BlockClosure>>ensure: Cursor>>showWhile: ShellView(View)>>onCommand: ShellView(View)>>wmCommand:wParam:lParam: ShellView(View)>>dispatchMessage:wParam:lParam: [] in InputState>>wndProc:message:wParam:lParam:cookie: BlockClosure>>ifCurtailed: GuiInputState(InputState)>>wndProc:message:wParam:lParam:cookie: GuiInputState(InputState)>>pumpMessage: GuiInputState(InputState)>>loopWhile: GuiInputState(InputState)>>mainLoop [] in GuiInputState(InputState)>>forkMain ExceptionHandler(ExceptionHandlerAbstract)>>markAndTry [] in ExceptionHandler(ExceptionHandlerAbstract)>>try: BlockClosure>>ifCurtailed: BlockClosure>>ensure: ExceptionHandler(ExceptionHandlerAbstract)>>try: BlockClosure>>on:do: [] in BlockClosure>>newProcess:



**Expected behavior**
Image exist cleanly.

**Please complete the following information):**
 - OS: Windows 10
blairmcg commented 1 year ago

The quit being deferred is, as you surmise, to avoid it being processed inside a callback, although it's only actually necessary where there is an underlying user->kernel->user mode transition such as when dispatching a windows message. See 2647cf7a8d3fb53a2647444325beb6793506d255, #355, and dolphinsmalltalk/dolphinvm#128 for more explanation. It is a workaround hack and really needs a proper solution in the VM. Not deferring when not inside a callback would probably improve your scenario, but at the expense of introducing another confusing variability.

Forking the quit into a separate process won't work reliably because the forked processes are green threads and so all run on only one real OS thread, and therefore one machine stack. That means the callback for one process might still be active when another calls the quit primitive.

A proper fix for the VM is not immediately obvious so needs more thought than I've ever given it. You are right that for the normal case of Dolphin running in its own process, just exiting the process directly from the primitive should be fine as the OS will clean up all the resources anyway. However, a solution to exit cleanly without killing the process is still required when running in-proc as a DLL hosted in another process, e.g. in some COM scenarios.

Closing the source files fully in the way expected by the source manager would seem the right thing to do anyway, and would avoid the problem in your scenario. Whether or not the file-in should skip the trailing whitespace seems like a separate matter and only avoids the failure as a side effect.

A temporary workaround is, ironically, to post the quit to the input queue. This is effectively what the "RegressionTestsRun.st" script is doing,

Lastly, I was thinking it would be a good idea if there was a command line option to specify to exit after processing any file-ins, but there is one already. If you are able to add -x to the end of your command line, then you won't need to quit from the file-in.

daniels220 commented 1 year ago

Oooh, I remember seeing #355 and being sympathetically frustrated with the Windows change...the u->k->u exception throw seems like a potentially useful thing for programs in general, not just interpreters. And it looks like you're already aware that there are situations—mainly if the quit: is issued from outside the UI process, or from what was once a UI process but a new one has taken over the job—where there might still be trouble. (Re: my suggestion of doing the quit: in a forked process—that was only in the scenario where we are in a callback and escape from it by terminating the active process—then once we know we aren't, a forked process can handle it without spinning up the event-loop machinery again. "Terminating the active process" is of course not totally reliable either, as the callback could be from another process...)

Overall, yeah, seems like a thorny problem, and I certainly don't know enough about Win32 or the Dolphin VM to have any idea. Good luck, if/when you ever decide to tackle it.

Re: when to skip the trailing whitespace—I agree it only avoids this problem as a side-effect, but it seems equally clearly the right thing to do anyway. If you look at the method, it skips that whitespace immediately after checking atEnd—if only whitespace is left, really it will be atEnd by the time it actually does anything, so probably should have skipped it before checking.

So, seems like both that and the change to fully close the source files would make sense to make? I can make a PR in the next couple days I think.

Re: workarounds—yes, explicitly posting the whole quit to the input queue does work—I realize that I have one script that does this already, because it's doing a save and quit and wants to do so without the tail end of the script execution on the stack, so it not only posts the quit but does so as a MessageSequence rather than a block, to completely divorce it from its origins (a little bit of inspiration from the ImageStripper, I think?).

Of course, there's an even simpler workaround—just leave off the last chunk marker in the file! Posting the quit is probably more robust though.

I had forgotten about -x—that would work sometimes, but doesn't allow setting the exit code.