airsdk / Adobe-Runtime-Support

Report, track and discuss issues in Adobe AIR. Monitored by Adobe - and HARMAN - and maintained by the AIR community.
200 stars 11 forks source link

Adobe Scout Error with 33.1.1.929 #2139

Open 2jfw opened 2 years ago

2jfw commented 2 years ago

Adobe Scout is reporting an error when using AIR SDK 33.1.1.929.

Error states:

We encountered and error while processing this session:
Some of the data we present my not be correct.

Same build with 33.1.1.795 does not have this issue. Attached two log files from Scout, one from 33.1.1.929 and one from 33.1.1.795.

Memory usage reporting seems to be broken in a way.

grafik

33.1.1.929: grafik

33.1.1.795: grafik

Scout Error.zip

2jfw commented 1 year ago

I think that this issue was fixed a while ago but is back in 50.2.1 in the debug version

ajwfrost commented 1 year ago

@2jfw do you have any content that will trigger this message? It's all working fine for the machines we have here... Details of the Scout installation/version/OS would probably be useful just to check that we're looking at the same thing!

thanks

2jfw commented 1 year ago

The following setup is used: Windows 10: 22H2 19045.2604 Scout: 1.1.3.354121

@50.2.1: Running Scout again, it actually behaves correct for the first ~30 seconds, then some "heavy" processing in the app occurs and after that, Scout shows the Error "We have encountered an error while processing this session: Some of the data we present may not be correct.". This would explain why Scout shows like ~4.2GB of RAM usage for "ByteArrays".

@33.1.1.795: No issues found; after the heavy processing @~30s Scout still logs correct data

Attached a ZIP with the two saved Scout sessions for reference. Scout Session AIR 33 vs 50.zip

ajwfrost commented 1 year ago

Hi

There's a lot more going on in that v50 log .. if you look at the trace output in frame 1145, there's loads more output than you get in the v33 log file where it also calls the RestRequest_-_getServiceDiscovery stuff...

Comparing the logs, there's lots more on this one frame, starting with

10:01:42.723 [DEBUG] com.dallmeier.semsyNext.mqtt.SemsyCoreMQTT onDebugMQTT : initMqttSocket : useSSL=false

So I'm wondering whether Scout can only cope with a certain amount of data in one frame perhaps.

Is this something we could try to reproduce here?

I'd also be interested to see if it makes a difference if you turn off the Worker traces? (Edit -> Preferences) .. these aren't worker FLM files but there are lines like WorkerThread : callWorkerFunction : subscribe,[object SubscriptionVO] which may indicate something happening there (and this is (a) experimental in Scout, and (b) an area that we have changed...)

thanks

2jfw commented 1 year ago

The logs are generated from two debug builds (running in ADL) with the very same source code but only a different SDK; so if there is more going on in that frame, it must be related to the SDK I think. However, the release builds are running properly and Scout does not have any issue. Not sure what you exactly mean with "Edit -> Preferences" on "turn off the Worker traces?" - I can remove the traces manually or use compiler arg "omit-trace-statements=true" on the Worker (?) - let me try that ...

Update: Scout Session 50.2.1 (No Worker traces).flm.zip

Removing the worker traces does not make a difference (attached a log with the Scout session in the case you want to have a look)

Update 2: I did remove more traces ("FlexVersion.compatibilityVersion") and all in "SemsyCoreMQTT" class and "StompService" class.

[Sidenote: What makes me wonder is that I do see dozens of `Reference Errors" in the Scout Log - I do not see them anywhere in debug popping up so maybe a try-catch is preventing that. Need to check on this, too].

I will now check, if the SocketMonitor is involved as I can see that this is getting started at the point where the logs becomes "corrupted".

Update 3: Does not look like the SocketMonitor is involded (attached a new log with recuded output). Scout Session 50.2.1 (No Worker traces, reduced traces, no SocketMonitor).flm.zip

We will try to find the reason for the "Reference Errors" and see if we can prevent them

ajwfrost commented 1 year ago

That preferences thing, I meant in Scout .. presumably your app uses Workers, in Scout the information from these can also be captured but there were some known issues with it (or at least, it's advertised only under "beta features"). If you only see one Scout session for your app, then probably this isn't turned on anyway...

The differences between what happens on different frames is likely to come from Flex, there's a lot of asynchronicity added which means sometimes I guess a lot of things all happen on the same frame. But that wouldn't explain why it happens with 50.x and not with 33.x ... is it 100% reproducible with 50.x and 0% on 33.x, or does it sometimes work with 50.x and fail with 33.x?

I suspect the issue is in the data that Scout is receiving, so having the Scout log may not be enough to capture this, but let me see if there's a way we can get in the middle....

thanks

2jfw commented 1 year ago

That preferences thing, I meant in Scout .. presumably your app uses Workers, in Scout the information from these can also be captured but there were some known issues with it (or at least, it's advertised only under "beta features"). If you only see one Scout session for your app, then probably this isn't turned on anyway...

Okay, got you! I can confirm that the "Start sessions for ActionScript Workers" is disabled in Scout.

I think that embedded Workers do not occur as second Scout session anyways (Referring to https://stackoverflow.com/questions/21827477/adobe-scout-is-not-starting-a-new-session-for-loaded-swfs)

So separate scout sessions are started for Worker SWFs (separate runtime instance). If you simply use a Loader, those telemetry data are included in your main session.

2jfw commented 1 year ago

Scout Logs with separate Worker session.zip

So I enabled the "Worker session" in Scout and can see that a new session for the Worker is actually occurring in Scout. I attached a log for this scenario, too, just in case..

2jfw commented 1 year ago

is it 100% reproducible with 50.x and 0% on 33.x, or does it sometimes work with 50.x and fail with 33.x?

Yes, I can confirm that it is 100% reproducible with 50.x and 0% on 33.x. At the point where the logs starts to get "corrupted" in 50.x I can see the dozens of "Reference Errors" - none of these occur in 33.x so I would think that this may be the reason - However, I cannot see why 50.x would cause them as they are only visible in Scout

ajwfrost commented 1 year ago

So perhaps there's a corruption such that Scout thinks that there are reference errors, but there aren't? Or, something is happening in 50.x which leads to reference errors but in such a way that they're never thrown to the debugger or outside handler...

Just putting together a quick mechanism to capture/replay the Scout data directly; are you able to run the attached application (close down Scout first) -> open this up via ADL or similar, and click the "Listen" button which should then disappear. Then, run your 50.x based application for a bit, and then close it.

You should then have a file under "%AppData%\ScoutRecorder\Local Store" that contains the Scout socket data, can you please send that to us?

thanks ScoutRecorder.zip

2jfw commented 1 year ago

Okay, done - I'm attaching the data here Scout13008.zip

ajwfrost commented 1 year ago

Thanks .... that "we encountered an error" is shown very early on! Will have to do a bit more digging into this one I'm afraid... we'll re-review the changes that went into 50.x though, in case there's something there where we're doing something wrong with the data that then goes to Scout..

thanks

2jfw commented 1 year ago

Thanks Andrew! Quick reminder, that the release version does not have the issue

ajwfrost commented 1 year ago

Ah .. so I've just spotted that: "but is back in 50.2.1 in the debug version" -> do you mean, this issue happens when you're running a debug version of the app (i.e. via ADL, or with a launcher but with the debug capabilities in the SWF) but it's fine when you're running a release SWF (even when using Scout still)?

thanks

2jfw commented 1 year ago

Yes, exactly - the issue only happens when running with ADL

ajwfrost commented 1 year ago

Okay thanks. So it seems like the reason for the message appearing is because there's a check on the "total memory" figure, as reported by the runtime, against the child elements that make up the memory usage. And in this case, we've got total memory of 345kb but that includes 2x 4GB blocks for byte arrays and other players (probably, that's the same thing, and probably there's a signed/unsigned issue or similar that's going on...)

image

Will see if we can reproduce this with something similar to see what the runtime is actually sending over...

2jfw commented 1 year ago

345kb

[I think its actually 345MB 😅]

I swapped the adl.exes (renamed adl64.exe to adl.exe) and Scout is no longer showing the 2x 4GB blocks. This would explain why the release build, which is 64Bit, does not show the issue, so it is related to 32Bit

EDIT: I did check the old SDK 33.1.1.795 on my disk to make sure I have not swapped the adl.exe somewhen back then and forgot about that and it looks like I did not - the adl.exe from the old SDK is 32Bit

ajwfrost commented 1 year ago

I think its actually 345MB

Oh yes!

it is related to 32Bit

Good point... the 4GB figure could come from something that's meant to be a "-1" value. Really not sure why there's a difference between the 795 build and this one though. 795 to 929 ... we can check the changes again as well to see if that brings up an issue...

2jfw commented 1 year ago

Somehow I thought I did enable Advanced Telemetry in all these logs but I just found out I did pass the parameter as program/launch parameter rather than as a compiler arg so for the sake of completeness I'm attaching a Scout log from ADL 32Bit with "-advanced-telemetry=true". Like you already noted, at the point where the issue occurs, the application is doing some heavy processing

Link: https://cloud.dallmeier.com/s/bNiEfsE3wnYQfAW

Password: Tagw3fp4

ajwfrost commented 1 year ago

Thanks -> so that seems to indicate it's happening within the MessageChannel communication, there's a receive() happening there and 205KB of assignment of a ByteArray, which the memory tab seems to think equates to 4GB or so...

ajwfrost commented 1 year ago

Okay so I think we've found the culprit here:

[github-402] Sending ByteArray from Worker cause Adobe Scout show incorrect memory consumption

But it would be interesting to know what actually causes this.. looking at the code, I'm wondering if there's a change in size of the byte array within a message channel that we're missing, which would mean e.g. we signal the allocation of zero bytes and then the deallocation of e.g. 20-odd bytes, resulting in a negative number cast to an unsigned 32-bit value...

thanks

2jfw commented 1 year ago

But it would be interesting to know what actually causes this.. looking at the code, I'm wondering if there's a change in size of the byte array within a message channel that we're missing, which would mean e.g. we signal the allocation of zero bytes and then the deallocation of e.g. 20-odd bytes, resulting in a negative number cast to an unsigned 32-bit value...

Sounds good! Let me know, if I can provide you any help

ajwfrost commented 1 year ago

Let me know, if I can provide you any help

Actually yes .. we've been struggling to get a reproduction here. I believe the problem is caused by how the shared byte arrays are used, but trying different things (allocating in one worker, clearing in another, etc) .. the closest we've got is for the worker thread to tell us about the invalid data, so we'll see if that gives any hints as to the root cause, but it would be nice to be able to reproduce exactly what you see!

Do you know how you set up the worker message channels, where are you creating the byte arrays and then where are you changing their lengths?

thanks

ajwfrost commented 1 year ago

Ah .. actually then I've just gone back and seen a message that we have reproduced it, simply adding the "shareable" flag before populating the data in the Worker. So -> we should be able to try to fix this now, definitely related to the changes we'd made previously :-(

2jfw commented 1 year ago

Ah .. actually then I've just gone back and seen a message that we have reproduced it, simply adding the "shareable" flag before populating the data in the Worker. So -> we should be able to try to fix this now, definitely related to the changes we'd made previously :-(

This is great 👍 Thanks a lot!