airsdk / Adobe-Runtime-Support

Report, track and discuss issues in Adobe AIR. Monitored by Adobe - and HARMAN - and maintained by the AIR community.

[AIR 50.2.1.1] `Worker::terminate()` cause crash with ADL #2523

Open itlancer opened 1 year ago

itlancer commented 1 year ago

Problem Description

Worker::terminate() causes a crash when the application is launched with ADL on AIR 50.2.1.1.

Tested with the latest AIR 50.2.1.1 on multiple Windows devices with different applications, using VSCode and Animate. The problem is the same in all cases. There is no such issue with AIR 50.1.1.2 and below. There is also no such issue without an ADL launch (a bundled application works fine).

Related issue (not the same): https://github.com/airsdk/Adobe-Runtime-Support/issues/2385

Steps to Reproduce

Launch the application with the code below from an IDE and click anywhere on the stage. The launched Worker should be terminated.

Application example with sources attached. worker_terminate_crash.zip

package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.system.Worker;
    import flash.system.WorkerDomain;
    import flash.events.MouseEvent;

    public class WorkerTerminateCrash extends Sprite {
        // Background worker created in init() and terminated on click.
        private var worker:Worker;

        public function WorkerTerminateCrash() {
            if (Worker.current.isPrimordial) {
                addEventListener(Event.ADDED_TO_STAGE, init);
            }
        }

        private function init(e:Event):void {
            removeEventListener(Event.ADDED_TO_STAGE, init);
            worker = WorkerDomain.current.createWorker(this.loaderInfo.bytes, true);
            worker.start();

            stage.addEventListener(MouseEvent.CLICK, click);
        }

        private function click(e:MouseEvent):void {
            trace("click");
            worker.terminate(); // This line causes the crash
        }
    }
}

Actual Result: Application crash.

Expected Result: Worker will be terminated without crash.

Known Workarounds

Use AIR 50.1.1.2.

ajwfrost commented 1 year ago

Hmm.... so when I try this direct via ADL on a command line, it works fine. When I try it in Animate, it doesn't crash .. but it also doesn't seem to launch the worker. ... ah, but when I update FDB again, to fix that worker launch problem, then I can reproduce it from Animate. Looks like Animate updated itself and reset my earlier FDB change :-(

Will check on this...

thanks

ajwfrost commented 1 year ago

Hi again

Is this just happening with Animate? I can't reproduce when launching from ADL in a command line, or when running from VS Code.

There is a request to terminate the runtime that's being received from the debugger i.e. Animate is receiving the notification that the worker has finished, and is responding by telling the application to exit. So it's not actually crashing, it's shutting down the application.

I'm not sure if there's any workaround we can do here: it's not happening within a call stack that also has the worker exiting, so there's no definitive way to confirm that this request for exit isn't a valid one. So I could perhaps just add this to the list of issues we're asking the Animate team to look at?

About the only change that may help is making the exit here a cancelable one, i.e. changing this for all situations so that the runtime fires an 'exiting' event on the NativeApplication object, which you can then cancel from the app side if you want. The debugger connection would drop though, so I'm not sure how useful this would be in the long run. I had thought we could just prevent the 'worker terminated' event from going to the debugger, but this didn't actually help, and I can't see anything that's going from the runtime to the debugger to tell it to exit.
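For illustration only, here is a minimal sketch of what the app side of such a cancelable exit might look like. `Event.EXITING` and `preventDefault()` are existing AIR APIs; that a debugger-driven exit would be routed through this event is exactly the hypothetical change discussed above, and the class name `ExitGuard`, the `allowExit` flag and `requestQuit()` are made up for the example.

```actionscript
package {
    import flash.desktop.NativeApplication;
    import flash.display.Sprite;
    import flash.events.Event;

    // Hypothetical example class, not part of the attached test project.
    public class ExitGuard extends Sprite {
        // Set to true only when the app itself decides to quit.
        private var allowExit:Boolean = false;

        public function ExitGuard() {
            // 'exiting' is dispatched when the exit sequence starts and can be canceled.
            NativeApplication.nativeApplication.addEventListener(Event.EXITING, onExiting);
        }

        private function onExiting(e:Event):void {
            if (!allowExit) {
                // Assumption: an exit requested by the debugger would also arrive here.
                e.preventDefault();
            }
        }

        // Call this from the app's own "quit" action.
        public function requestQuit():void {
            allowExit = true;
            // exit() called from code does not dispatch 'exiting', so it is not blocked.
            NativeApplication.nativeApplication.exit();
        }
    }
}
```

As noted above, the debugger connection would still drop, so this would only keep the application window alive.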

itlancer commented 1 year ago

@ajwfrost

Sorry, with Animate, VSCode and ADL (from the console) it happens only with a patched SDK file (with 4K camera support):

SDK/runtimes/air/win/Adobe AIR/Versions/1.0/Adobe AIR.dll

ADL console command:

adl.exe -profile extendedDesktop worker_terminate_crash-app.xml

With the release version of AIR 50.2.1.1 the issue indeed happens only with Animate. But with AIR 50.1.1.2 there is no such issue at all.

Also I can see in Animate traces with AIR 50.2.1.1:

[SWF] worker_terminate_crash.swf - 1059 bytes
[SWF] worker_terminate_crash.swf - 1059 bytes

But with AIR 50.1.1.2 only one:

[SWF] worker_terminate_crash.swf - 1059 bytes

ajwfrost commented 1 year ago

So this part:

> But with AIR 50.1.1.2 no such issue at all. Also I can see in Animate traces with AIR 50.2.1.1 (2 lines) But with AIR 50.1.1.2 only one:

That feels like it's the wrong way round..? Although maybe it's also dependent upon FDB.

We had an earlier issue (#399) where workers didn't start/run properly when launched from Animate, and we put in a fix that went into 50.0.1.2: https://github.com/airsdk/Adobe-Runtime-Support/issues/399#issuecomment-1287785568 But this caused side effects. The actual solution should have been to fix FDB (or, better, to fix Animate), which was then done; hence the FDB update in https://github.com/airsdk/Adobe-Runtime-Support/issues/399#issuecomment-1376381236 and the #399 change being reverted in 50.2.1.1.

So in 50.1.x it had the change in it to keep workers running when they are kicked off with a debugger connection, which should mean you see both traces (although perhaps there's a separate issue with this!). In 50.2 the change has gone which means you only see one line, unless you've got the updated FDB (per my initial response above where I couldn't reproduce this because I didn't have the updated FDB...)

When you're trying with ADL from the command line, are you also running FDB in a separate console? And if so, what does that output as the ADL process shuts down?

Just thinking, I should probably run this in a Release build here - currently I'm using a Debug build which means we also get debugger connections set up for all the built-in AS3 bootstrap code, that may be complicating things!

thanks

itlancer commented 1 year ago

@ajwfrost I'm using Animate with patched FDB from here https://github.com/airsdk/Adobe-Runtime-Support/issues/399#issuecomment-1376381236

> When you're trying with ADL from the command line, are you also running FDB in a separate console? And if so, what does that output as the ADL process shuts down?

With the released 50.2.1.1 (without the patched DLL) there is no issue with ADL. What do you mean by "FDB in a separate console"? I didn't launch FDB directly myself.

ajwfrost commented 1 year ago

> I'm using Animate with patched FDB from here

Perfect, and that then matches what I'd expect to see from 50.2...

> What do you mean "FDB in a separate console"? I didn't launch FDB directly by myself.

When we try this with our latest codebase using Animate, the request to shut down the player comes from the FDB that Animate has kicked off. So if we're then trying to test this from ADL on the command line (rather than from Animate) we need to manually launch FDB using its command-line interface. Animate starts FDB going (using the direct Java interfaces rather than the command line) and then launches the app, and its interactions with FDB are what is causing all these issues in the first place (at least, from what we've seen of them). But if you're able to have a computer without any debugger running on it at all, and just launch your test app with ADL from a command prompt window, and click on the stage and have it close -> that implies something different is going on for you!

I'm wondering whether that patched runtime with the updated camera sizes may have been built on an older codebase. What do the file properties say for it?

When I try the 50.2.1.1 release, with ADL from a command line and FDB run up separately, I get:

Adobe fdb (Flash Player Debugger) [build development]
Copyright (c) 2004-2007 Adobe, Inc. All rights reserved.
(fdb) run
Waiting for Player to connect
Player connected; session starting.
Set breakpoints and then type 'continue' to resume the session.
[SWF] worker_terminate_crash.swf - 1,090 bytes after decompression
(fdb) c
[trace] WIN 50,2,1,1
[WorkerCreate] 1
Additional ActionScript code has been loaded from a SWF or a frame.
To see all currently loaded files, type 'info files'.
Active worker has changed to worker 1

Set additional breakpoints as desired, and then type 'continue'.
(fdb) c
[SWF] worker_terminate_crash.swf - 1,090 bytes after decompression
[trace] WIN 50,2,1,1
[trace] click
[UnloadSWF] worker_terminate_crash.swf
[WorkerDestroy] 1

But the app stays running; then when I close it:

[UnloadSWF] worker_terminate_crash.swf
Player session terminated

So that "Active worker has changed to worker 1" and "Set additional breakpoints as desired, and then type 'continue'" is where the earlier problem came: Animate wasn't handling that breakpoint request so the worker thread was just hanging waiting for the 'continue'.

What I believe is happening currently is that Animate receives the first "UnloadSWF" message and then sends back a request to terminate the whole application. Actually, I don't see any "UnloadSWF" messages, so this is possibly the culprit -> whatever FDB is doing to detect this unload, it's passing it on to Animate but Animate interprets that as the final "unload" rather than realising it's related to a Worker.

We can then see whether this is something we can work around with a bit more hacking of FDB, but I'm also hoping to get some changes made in Animate here.

thanks

itlancer commented 1 year ago

@ajwfrost The patched DLL is marked as version 50.2.1.1 in its file properties. With the 50.2.1.1 release, launched via ADL with FDB, I observe the same logs as you described, and the application does not close.

With the patched DLL on top of the 50.2.1.1 release, launched via ADL with FDB, I observe the same logs as you described plus "Player session terminated", and the application closes. But I think it is a crash, because in Windows Event Viewer I can see:

Faulting application name: adl.exe, version: 50.2.1.1, time stamp: 0x63ee5db7
Faulting module name: Adobe AIR.dll, version: 50.2.1.1, time stamp: 0x63f70715
Exception code: 0xc0000005
Fault offset: 0x0016c442
Faulting process id: 0x8ed8
Faulting application start time: 0x01d9565ce83bee0b
Faulting application path: c:\SDK\AIR\50\bin\adl.exe
Faulting module path: c:\SDK\AIR\50\runtimes\air\win\Adobe AIR\Versions\1.0\Adobe AIR.dll
Report Id: 5ab1384d-59ec-4815-8ec7-d02d177d1d39
Faulting package full name: 
Faulting package-relative application ID: 

So:
1) The patched DLL seems to cause the application crash/exit in this case. I think it's not a problem at all if release versions don't have this issue.
2) With the 50.2.1.1 release, the application is closed in this case when using Animate. There is no such issue with other IDEs or an ADL launch.

ajwfrost commented 1 year ago

Great, thanks for the extra details there. Sounds like there are two things here then:
1) an actual crash, per that event log .. but not one that was happening in the original release, and something that I think may be fixed now (there was a crash-on-exit that we found we'd introduced with a recent feature change, which I suspect is the culprit there; of course you'd only notice the crash if you're not actually exiting the whole application, i.e. your use case here)
2) an issue with Animate not properly handling the worker swf closing -> we can see if we can work around this in FDB.

thanks

Adolio commented 3 months ago

I have a similar issue at my end too (on Windows but without Animate involved and it doesn't seem to only happen through ADL).

I cannot manually terminate workers due to this crash. Therefore workers accumulate in the background (I can see them running in VSCode). However, I don't know whether this issue has a significant negative impact in the long run.

Here is a sample project that reproduces the issue: https://github.com/Adolio/Adobe-AIR-Worker-Crash

Adolio commented 1 week ago

@ajwfrost - a small ping to attract your attention to my last comment. The provided sample can easily reproduce the issue. Thanks in advance.

ajwfrost commented 1 week ago

Hi @Adolio - thanks for the reminder. We can reproduce this using your test project, when it's run via Visual Studio Code. But when we directly run it from ADL, it works fine.

According to the event log in Windows, this isn't actually a crash .. it just looks like a "normal" (albeit code-driven) shut-down.

We added a delay to the creation of the worker - but that then stopped it from being reproducible.

But then we added a delay to the completion of the worker - and this still meant that the window disappears, just after a bit longer, which gives us time to check stuff (and attach a debugger).

Observations:
1) when we have a delay just to the worker creation, then the worker itself completes immediately it's started.. but the "call stack" pane in VS Code remains empty the whole time, despite the fact there's an AIR window running
2) when we have the worker creation immediately but add a timeout before it completes, we get two entries in the "call stack" pane (for primordial and worker); and they disappear as the app closes
3) when we do both steps after a timeout, then the "call stack" pane starts off empty; when the worker is started, then both entries appear at that point; and when the worker completes, they disappear.

Oddly - when we put a debug build of the runtime in here to try to see what's happening, it doesn't reproduce (i.e. with option 3 there, we still see the failure when using a release binary, but not with debug) - although the two entries disappear from "call stack", the window remains open.

So I think this is likely to be some interoperability thing between the debugger - or whatever component is trying to track and update the VSCode output panes - and the runtime. I'll try to get a bit more information about the debugger traffic before adding more details back into the above-referenced BowlerHat ticket..

thanks

ajwfrost commented 6 days ago

So .. interestingly, as mentioned above, we are seeing different behaviour when we use a 'release' build vs a 'debug' build of the runtime. But we did manage to debug using a release build and found that the debugger was sending an "exit" request telling the runtime to terminate.

I think the first thing we need to do next is to check why that was only received in a release build and not in a debug build .. so e.g. tracing out all the debugger socket traffic to see what's the difference and whether this is meant to be happening..

ajwfrost commented 4 days ago

Okay I can't see any difference in the relevant messages to/from the debugger, the only changes are that in our debug build of the player, we have a lot more scripts and debug info available. But the actual user-space execution appears to be the same. For some reason the debugger is requesting the player to exit...

Below are the data transfers when the worker has finished and sends the event. The "In..." operations are from the Flash Player in to the debugger, which is also where the data has >> at the start (and subsequent lines would also be 'in' until you see << which is where the debugger is sending data back to the Flash Player). Isolate 2 (the worker) is active at the start of this.

>>  04 00 00 00 3e 00 00 00 01 00 00 00 
InIsolate 1
    19 00 00 00 05 00 00 00 53 65 72 69 61 6c 69 7a 61 74 69 6f 6e 20 63 6f 6d 70 6c 65 74 65 64 2e 00 
InTrace: Serialization completed.
    01 00 00 00 05 00 00 00 00 
InTrace
    08 00 00 00 05 00 00 00 52 65 73 75 6c 74 3a 00 
InTrace: Result:
    01 00 00 00 05 00 00 00 00 
InTrace
    02 00 00 00 05 00 00 00 7b 00 
InTrace: {
    1d 00 00 00 05 00 00 00 20 20 20 20 22 66 69 65 6c 64 41 22 3a 20 22 48 65 6c 6c 6f 20 77 6f 72 6c 64 22 2c 00 
InTrace:     "fieldA": "Hello world",
    12 00 00 00 05 00 00 00 20 20 20 20 22 66 69 65 6c 64 42 22 3a 20 34 32 2c 00 
InTrace:     "fieldB": 42,
    40 00 00 00 05 00 00 00 20 20 20 20 22 66 69 65 6c 64 43 22 3a 20 22 41 20 70 72 65 74 74 79 20 6c 6f 6e 67 20 73 74 72 69 6e 67 20 74 68 61 74 20 73 68 6f 75 6c 64 20 74 61 6b 65 20 74 77 6f 20 6c 69 6e 65 73 22 00 
InTrace:     "fieldC": "A pretty long string that should take two lines"
    02 00 00 00 05 00 00 00 7d 00
InTrace: }
>>  04 00 00 00 3e 00 00 00 02 00 00 00 
InIsolate 2
    0a 00 00 00 2a 00 00 00 01 00 00 00 00 00 00 00 00 00
InSwfInfo
>>  04 00 00 00 3e 00 00 00 01 00 00 00 
InIsolate 1
    00 00 00 00 19 00 00 00
InProcessTags
<<  04 00 00 00 36 00 00 00 01 00 00 00 
OutSetActiveIsolate 1
    00 00 00 00 17 00 00 00
OutProcessedTags
>>  04 00 00 00 3e 00 00 00 02 00 00 00 
InIsolate 2
    04 00 00 00 3b 00 00 00 02 00 00 00
InIsolateExit 2
<<  0e 00 00 00 1b 00 00 00 63 61 6e 5f 74 65 72 6d 69 6e 61 74 65 00
OutGetOption: can_terminate
>>  04 00 00 00 3e 00 00 00 01 00 00 00 
InIsolate 1
    05 00 00 00 3d 00 00 00 01 00 00 00 01
InSetActiveIsolate 1 (changed)
>>  13 00 00 00 20 00 00 00 63 61 6e 5f 74 65 72 6d 69 6e 61 74 65 00 74 72 75 65 00
InOption: can_terminate: true
<<  01 00 00 00 0d 00 00 00 01
OutExit (terminate)
>>  00 00 00 00 19 00 00 00
InProcessTags
Socket has been closed

So this seems to be going okay: the "in" messages from the player give the final bits of trace from the main process (1), the debugger confirms that it had switched to the main process to display the traces, then the player says that isolate 2 (the worker) has quit... at which point the debugger is asking if we can terminate, the player sets the active isolate to 1 as that's the only one left now, and confirms that we can terminate (this is just an option i.e. capability) - and then the debugger sends an "exit" request, with that final 01 parameter which means the player should terminate rather than just disconnecting the debugger.
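To make the dump easier to follow, here is a small sketch that decodes a few of the frames above. It assumes the framing that the dump itself suggests: a 4-byte little-endian payload length, a 4-byte little-endian message type, then the payload. The labels are just the names printed next to each frame above; the direction (In/Out) is not encoded in the frame, so the same number could mean something else in the opposite direction. The class `DebugFrameDump` and its helpers are hypothetical and are not runtime or FDB code.

```actionscript
package {
    import flash.display.Sprite;
    import flash.utils.ByteArray;
    import flash.utils.Endian;

    // Hypothetical helper: decodes frames copied from the dump above.
    public class DebugFrameDump extends Sprite {
        public function DebugFrameDump() {
            var data:ByteArray = fromHex(
                "04 00 00 00 3b 00 00 00 02 00 00 00 " +   // InIsolateExit 2
                "0e 00 00 00 1b 00 00 00 63 61 6e 5f 74 65 72 6d 69 6e 61 74 65 00 " + // OutGetOption
                "01 00 00 00 0d 00 00 00 01");             // OutExit (terminate)

            // Each frame: length (uint32 LE), type (uint32 LE), payload of 'size' bytes.
            while (data.bytesAvailable >= 8) {
                var size:uint = data.readUnsignedInt();
                var type:uint = data.readUnsignedInt();
                var payload:ByteArray = new ByteArray();
                // readBytes() with length 0 would read everything that is left, so guard it.
                if (size > 0) {
                    data.readBytes(payload, 0, size);
                }
                trace(label(type) + " (0x" + type.toString(16) + "), payload bytes: " + payload.length);
            }
        }

        // Names as they appear next to the frames in the dump above.
        private static function label(type:uint):String {
            switch (type) {
                case 0x05: return "InTrace";
                case 0x0d: return "OutExit";
                case 0x17: return "OutProcessedTags";
                case 0x19: return "InProcessTags";
                case 0x1b: return "OutGetOption";
                case 0x20: return "InOption";
                case 0x2a: return "InSwfInfo";
                case 0x36: return "OutSetActiveIsolate";
                case 0x3b: return "InIsolateExit";
                case 0x3d: return "InSetActiveIsolate";
                case 0x3e: return "InIsolate";
                default:   return "unknown";
            }
        }

        // Turns a space-separated hex string into a little-endian ByteArray.
        private static function fromHex(hex:String):ByteArray {
            var out:ByteArray = new ByteArray();
            out.endian = Endian.LITTLE_ENDIAN;
            for each (var part:String in hex.split(" ")) {
                if (part.length > 0) {
                    out.writeByte(parseInt(part, 16));
                }
            }
            out.position = 0;
            return out;
        }
    }
}
```

The last frame is the interesting one: type 0x0d with a single payload byte of 01, which is the "exit" request with the terminate flag described above.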

@joshtynjala I'm wondering whether you have a custom implementation of the debugger going on within the VSCode extension, are you able to check the behaviour here? Looking at the flex-sdk FDB code, this to me looks like something has called PlayerSession.terminate(), so it does the "can you terminate" query and if that returns a true, sends the "exit" request. In that app, it seems this would only be called from the DebugCLI.java file i.e. either something throws an exception during the execute() function, or this would have to be requested by the user.

Alternatively @Adolio if you ever see this when using FDB, please let me know..

Plus, please let me know if you need more details!

joshtynjala commented 4 days ago

> @joshtynjala I'm wondering whether you have a custom implementation of the debugger going on within the VSCode extension, are you able to check the behaviour here? Looking at the flex-sdk FDB code, this to me looks like something has called PlayerSession.terminate(), so it does the "can you terminate" query and if that returns a true, sends the "exit" request. In that app, it seems this would only be called from the DebugCLI.java file i.e. either something throws an exception during the execute() function, or this would have to be requested by the user.

The vscode-swf-debug extension uses the SWF debugger from Apache Royale. I'll try to take a look at what it's doing when I have some time.