NativeScript / android

NativeScript for Android using v8
https://docs.nativescript.org/guide/android-marshalling
Apache License 2.0
524 stars 135 forks source link

Android worker crash since tns-android 6.3.0 #1602

Closed felixkrautschuk closed 2 years ago

felixkrautschuk commented 4 years ago

Environment Provide version numbers for the following components (information can be retrieved by running tns info in your project folder or by inspecting the package.json of the project):

Describe the bug When starting a worker script for the second time on an Androd 9 / 10 physical device, the worker thread throws an exception and most times, the app crashes as well.

Please note

What we are doing in the demo app provided below On button tap we call this:

const WorkerScript = require("nativescript-worker-loader!./myworker");
const worker = new WorkerScript();

worker.onmessage = function(msg) {
        if(msg.data.isWorkerLoaded) {
            console.log("#### worker notify state 'loaded' -> start action...");

            worker.postMessage({
                action: "test"
            });
        } else if(msg.data.hasOwnProperty('success')) {
            console.log("#### worker success");

            worker.terminate();

            alert("Worker success");
        }
};

worker.onerror = function(e) {
        console.log("#### worker onerror: ", e);
        worker.terminate();

        alert("Worker onerror");
};

And that's the worker script:

//myworker.js

require("@nativescript/core/globals");

//notify main-thread about worker being loaded
global.postMessage({isWorkerLoaded: true});

global.onmessage = function(msg) {
    if(msg.data.action === "test") {
        setTimeout(function() {
            global.postMessage({ success: true });
        }, 3000);
    }
};

To Reproduce 1.) Start the demo app provided below (on Android 9 or 10 physical device) 2.) Tap the button, wait some short time and you see the alert "Worker success" 3.) Tapping the button for a second time should also alert "Worker success", but instead it crashes the app

4.)

tns platform remove android
tns platform add android@6.4.0
tns run android

5.) repeat steps 1-3 and you see that the app won't crash anymore.

Expected behavior You should be able to start and terminate this worker script as often as you want, like it was working until tns-android 6.3.1

Sample project ns-worker-issue-android10.zip

felixkrautschuk commented 4 years ago

In our real app, we do some file-system stuff in the worker script and instead of just silently crashing, the app also logs something like this in the console:

worker onerror: { JS: "message": "Uncaught TypeError: com.tns.NativeScriptApplication.getInstance is not a function", JS: "stackTrace": "TypeError: com.tns.NativeScriptApplication.getInstance is not a function\n at Object.getNativeApplication (file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/c9e203512527157969ea.worker.js:502:57)\n at getApplicationContext (file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/c9e203512527157969ea.worker.js:4832:44)\n at FileSystemAccess.../node_modules/@nativescript/core/file-system/file-system-access.js.FileSystemAccess.getDocumentsFolderPath (file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/c9e203512527157969ea.worker.js:4991:19)\n at Object.documents (file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/c9e203512527157969ea.worker.js:5778:42)\n at ../node_modules/nativescript-dev-webpack/hmr/hot-loader.js!./myworker.js.global.onmessage (file:///data/data/org.nativescript.nsworkerissueandro...

I updated the demo app to do some totally simplified file system stuff:

//myworker.js

global.onmessage = function(msg) {
    if(msg.data.action === "test") {
        const documents = fileSystemModule.knownFolders.documents();
        const folderCustom = documents.getFolder("custom");
        var pathToFile = fileSystemModule.path.join(folderCustom.path, "test" + ".json");

        const json = {
            "id": 1,
            "foo": "bar"
        };

        fileSystemModule.File.fromPath(pathToFile).writeTextSync(JSON.stringify(json));

        global.postMessage({ success: true });
    }
};

global.onerror = function(err) {
    console.log("global.onerror:");
    console.log(err);
};

ns-worker-issue-android10.zip

NathanaelA commented 4 years ago

@felixkrautschuk - Does your app use Flavors in your gradle configurations by any chance? I worked with #1597 and we spent a lot of time tracking it down to flavors messing something up, but your error I believe is identical, and it was with workers...

felixkrautschuk commented 4 years ago

@NathanaelA no it does not use flavors, neither the demo app from here nor our real app.

NathanaelA commented 4 years ago

@felixkrautschuk - Thanks for confirming that -- I was really hoping you would say yes, but guessed that it would be no since you said you could duplicate it with your demo. :-)

felixkrautschuk commented 4 years ago

Sorry for that :D

When I run into a new issue, it is always the first step for me to create a new app project and check if it also occurs there and if not, I know that is must be a mistake from myself in my own project. But that's not the case here ;-)

felixkrautschuk commented 4 years ago

Just few more information on this:

The issue is not related on Android 10 only and also occurs on Android 9 physical devices. On devices with older Android versions, we cannot reproduce this (I already updated the title and description of this thread).

If the worker only sends a message back to the main thread (or may be does a console log before that) and then terminates, everything works as expected on ALL devices.

But as soon as you but that into a setTimeout-call, it crashes with

JS: worker onerror: { JS: "message": "Uncaught TypeError: android.os.Looper.myLooper is not a function", JS: "stackTrace": "TypeError: android.os.Looper.myLooper is not a function\n at createHandlerAndGetId (file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/0f9e8cc21cc309f62b64.worker.js:10892:67)\n at setTimeout (file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/0f9e8cc21cc309f62b64.worker.js:10904:14)\n at ../node_modules/nativescript-dev-webpack/hmr/hot-loader.js!./myworker.js.global.onmessage (file: app/myworker.js:36:0)", JS: "filename": "file:///data/data/org.nativescript.nsworkerissueandroid10/files/app/0f9e8cc21cc309f62b64.worker.js", JS: "lineno": 10892 JS: }

NathanaelA commented 4 years ago

@felixkrautschuk - One minor question; you say it fails on 6.4.1 and 6.5.0 does this work fine on 6.4.0?

felixkrautschuk commented 4 years ago

@NathanaelA Indeed, 6.4.0 is the latest working android runtime, not 6.3.1.

In the release notes, they only mentioned 6.4.1, that's why I did not explicitly test 6.4.0.

saschaarthur commented 4 years ago

any idears/updates to this?

alexander-mai commented 4 years ago

I have the same problem in my app. If I don't terminate the Worker process, the problem is gone. Is it possible, that the application is trying to reuse the terminated worker instance?

felixkrautschuk commented 4 years ago

I just found out that it is not related only on tns-android 6.4.1+, it also happens on tns-android 6.4.0, but not that often like on 6.5.0 or 6.5.1.

It also happens on Emulators by the way... not sure why it was working on emulators for me when I created this issue.

lochstar commented 4 years ago

I'm having the same problem. Currently on tns-android 6.5.0

phdd commented 4 years ago

Confirmed on tns-android 6.5.3

Pkurto commented 4 years ago

Same problem in my app: tns android 6.5.3 and physical devices android 10

NathanaelA commented 4 years ago

Just a status update, I finally had a chance with the above demos I am able to cause a random crash on both the android.os.looper and it totally segfaulted on an emulator a couple of times. So I can duplicate this on Android 10 emulators with both samples.. THANK YOU @felixkrautschuk for the samples!


It does look like something in the worker is getting corrupted; not sure what yet.


After some more discussions with my partners; we have agreed that this type of bug should be fixed on a higher priority basis, as crashing is a bad thing. So I'll be looking into it shortly. Hopefully have some good news soon.

senner007 commented 4 years ago

I have the same issue. The app crashes if I call .terminate() or close()

DeclanKillilea commented 4 years ago

Hi - Is there a release version date / version chosen for this fix as yet? Thanks

NathanaelA commented 4 years ago

Hi, this is actually one of my higher priority bugs I'm looking into -- now that we finally pushed out NS7 yesterday; the availability to track down some of these bugs should be a lot more available. :) I will be converting the two sample app to NS7 today and verifying that the issue wasn't fixed by some of the other issues we fixed in NS7 and/or by the engine upgrade that also happened.

DeclanKillilea commented 4 years ago

Hi, this is actually one of my higher priority bugs I'm looking into -- now that we finally pushed out NS7 yesterday; the availability to track down some of these bugs should be a lot more available. :) I will be converting the two sample app to NS7 today and verifying that the issue wasn't fixed by some of the other issues we fixed in NS7 and/or by the engine upgrade that also happened.

Great thanks, I tried running npm view tns-android versions today but its not picked up 7.0.0 as I was hoping to test similar to see if that upgrade has fixed it

NathanaelA commented 4 years ago

It is no longer called tns-android; everything is now in the @nativescript namespace. So @nativescript/android is the runtime...

If you want to manually try this engine with a older CLI; you can use npm info to get the download link (or download it from github releases) of the latest engine, then use tns platform add android --frameworkPath <tgz_file_downloaded>

lostation commented 4 years ago

Exact same behavior here.... with any call to terminate inside or outside of worker -> crash... without any call to terminate()...its works... @NathanaelA Please if you fin a solution ( soon I hope ;-) )...Can you also make it available for NS 6.8.0... I'm reaaaally not ready to upgrade to NS 7 right now... -> too big angular 8 shared code project...will make it slow to angular 10..

lostation commented 4 years ago

Is it a so really bad practice to not call terminate() on worker ref ? Instead I'm currently set the worker ref to null... and recreate a new worker each time I need it... It doesn't seems to get any issue until there... Should be garbage collected at time... Am I right ?

NathanaelA commented 4 years ago

@lostation - We would make the patch for NS7 Android runtime, since that is what we are maintaining... However, before that statement worries you; the good news is you can easily use the NS7 runtimes with NS6. Their are no breaking changes and they should be 100% compatible. :-)

Now as to how you are causing a crash, is all you are doing is calling terminate() inside the worker? Does it crash every time?

lostation commented 4 years ago

You mean I can upgrade to NS7 while keeping typescript 3.5.3 and angular 8.21 ?

If yes, can you tell me how, I'm not yet an expert... ? I'm using big angular shared project...(Alex Ziskind has made an audit for this project some months ago, he can tell you about it).  Sometimes it's quite hard to manage shared Web and NS project.

About worker crashes, I've tried both calling context.terminate() from inside worker, only when error happened inside it, and also when workers job is done from an outside service, using the worker ref.

In fact, I have 2 workers instanciated but not at same time and I reuse the same variable while instanciating new Worker(). The first one is used to read files from filesystem and second one to write files back to filesystem... And as you mentioned, the first read worker works just fine even using terminate from outside service, but then after some time I create the second worker for writing cache stuffs and this is when it crashes at the time of calling terminate inside or outside this second worker, also if error inside or not.

Right now, I comment all calls to terminate and it seems OK.

Le 11 sept. 2020 à 20:43, à 20:43, Nathanael Anderson notifications@github.com a écrit:

@lostation - We would make the patch for NS7 Android runtime, since that is what we are maintaining... However, before that statement worries you; the good news is you can easily use the NS7 runtimes with NS6. Their are no breaking changes and they should be 100% compatible. :-)

Now as to how you are causing a crash, is all you are doing is calling terminate() inside the worker? Does it crash every time?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/NativeScript/android-runtime/issues/1602#issuecomment-691254228

NathanaelA commented 4 years ago

NS has several pieces;

The android runtime is where this issue would be occurring, if you are calling terminate and it is crashing.

The NS 7 android runtime should currently be backwards compatible with NS6, however to install it you have manually download it and use the --frameworkPath variable that I described a couple posts up in this issue. because the NS6 CLI tooling is unaware of the @nativescript/android name space.

benediktveith commented 3 years ago

A fix for this bug would be highly appreciated :)

con-cis commented 3 years ago

I think I've found the last working version. In my case it is:

tns platform add android@6.4.0-2019-12-06-124802-03

the sporadically error occurs since version:

tns platform add android@6.3.0-rc-2019-12-06-153530-02

Somehow the fix for issue #1550 is causing the trouble with Android 11 & 10 devices I guess..

it should be caused by the changes on commit

https://github.com/NativeScript/android-runtime/commit/be4be1fd8422cea87df9053af8ba873b268fd653

can anyone confirm?

alexander-mai commented 3 years ago

Is there any fix now? The problem still exists on Nativescript Core 6.5.24 with NativeScript Cli 7 and Android runtine 7.0.1.

I'm using the workers to create PDF files. After some debugging and testing, it seems like any standard stuff does not exists. When I create an instance of my own classes and call methods of them, there is no problem. When i simply call console.exception() i get the error TypeError: console.exception is not a function

Not to call worker.terminate is also not an option, because the application crashes on iOS after creating the 10th or 11th worker instace. Before the app is closing, I can can see a memory presure warning. On Android,, the App is getting slower and slower with every new worker instance. So it looks also like a memory problem.

felixkrautschuk commented 3 years ago

@NathanaelA I hope this issue this issue can get a bit more attention, actually it is the only reason for us not being able to upgrade to NS 7, because it is still occurring with the latest android runtimes @nativescript/android 7.0.1.

I was wondering why some users of our real app report crashes with the worker, because we are using tns-android 6.4.0 there as I was sure that this would be the latest stable version concerning workers. BUT it turned out, that it is not, because I noticed the same problem with the debug build of our app with some devices. I am not sure why I wa snot able to reproduce this issue using tns-android 6.4.0 some months ago...

That's why I tested my demo app (see first message here in this thread) again on 2 emulators and 3 real devices using every version of tns-android from 6.2.0 to 7.0.1.

The result: tns-android 6.2.0 is the LATEST STABLE VERSION concerning workers.

It was the only version that was working without a crash on ALL devices and emulators. I opened and terminated the worker 50 times on each device and each emulator and I got no crash and the app always went to the success callback.

Devices/Emulators:

My tests:

For each tns-android version, I did the test multiple times on each device/emulator so the number of attempts until the crash occurs varies a little bit but most of the time, it was nearly the same on each test. I am not sure why the test succeeded so many times on the Android 10 emulator on all tns-android versions (even the latest 7.0.1) and the Nokia 5.1 (device 2) was also working for more than 20 times even on tns-android 6.4.0... Maybe that's the reason why I originally said 6.4.0 would be the latest working version as I am quite sure I was using the Nokia 5.1 Android 9 device and the Android 10 Emulator for testing some months ago, when I was creating this issue here on Github.

But on the other 2 devices and the Android 11 emulator, the app was crashing very early. When I say "crashing", it means: sometimes the app was just crashing (closing) without any error message, and sometimes the app was still alive, but the onError-callback of the worker was called with a message like this:

JS: #### worker notify state 'loaded' -> start action... JS: #### worker success JS: #### worker notify state 'loaded' -> start action... JS: #### worker onerror: { JS: "message": "Uncaught TypeError: android.os.Looper.myLooper is not a function", JS: "stackTrace": "TypeError: android.os.Looper.myLooper is not a function\n at createHandlerAndGetId (file:///data/data/org.nativescript.nsworkerissueandroidns7/files/app/5d4685f1e68cfd1e47fe.worker.js:11229:67)\n at setTimeout (file:///data/data/org.nativescript.nsworkerissueandroidns7/files/app/5d4685f1e68cfd1e47fe.worker.js:11237:16)\n at ../node_modules/@nativescript/webpack/hmr/hot-loader.js!./myworker.js.global.onmessage (file: app/myworker.js:9:0)", JS: "filename": "file:///data/data/org.nativescript.nsworkerissueandroidns7/files/app/5d4685f1e68cfd1e47fe.worker.js", JS: "lineno": 11229 JS: }

(the error message was not always exactly the same as this one, but very similar)

6.2.0 was the ONLY one, where I had no single chance to crash the app. I tried the tns-android 6.2.0 on our real app again on all emulators and devices and it was the same result on multiple test runs: NO SINGLE CRASH until opening and terminating 50 workers.

Here is the demo app, using the latest stable 6.2.0 runtime: ns-worker-issue-ns6-2.zip

Here is the demo app migrated to NS 7 using the latest dependencies (@nativescript/core 7.1.3 and @nativescript/android 7.0.1): ns-worker-issue-ns7.zip

Maybe someone can take those two demos and confirm that?

con-cis commented 3 years ago

give it a try with

tns platform add android@6.4.0-2019-12-06-124802-03

see my comment: https://github.com/NativeScript/android-runtime/issues/1602#issuecomment-725535838

this is the version history from npm: https://www.npmjs.com/package/tns-android?activeTab=versions shows that the version numeration is not continuous..

tns-android Versions

...with knowing the exact commit, maybe someone could fix this error

felixkrautschuk commented 3 years ago

@con-cis I saw your comment and it also made me believe that you could be right, but as you can see in my test-list, the app is still crashing very often, even with 6.3.1 (and also 6.3.0)

con-cis commented 3 years ago

6.3.1 (and also 6.3.0) are later released than 6.4.0-2019-12-06-124802-03

felixkrautschuk commented 3 years ago

@NathanaelA are there any news on this?

I tried not calling worker.terminate() and instead setting the worker ref to null and indeed, it seems to work (the app does not crash). But when looking into the memory profiler of Android Studio, the app is just allocating more and more memory when creating new workers, even when running utilsModule.GC().

Please tell me if you need more input.

felixkrautschuk commented 3 years ago

Unfortunately it is still an issue with NS 8.

Updated sample: ns-worker-8.zip

alexander-mai commented 3 years ago

I am switched to having only 1 worker and different execution classes. With a type parameter I tell the worker which class to execute. Each request is packed into an Observable. An additional parameter is a unique ID, which is used to assign the worker’s response to the correct observable. This worker is never terminated and never recreated. With this approach, I have no problems with parallelism or memory consumption.

I hope this helps someone.

farfromrefug commented 3 years ago

@alexander-mai i use the same technique and never had a crash either. However that being said i think we should get this fixed.

felixkrautschuk commented 3 years ago

@farfromrefug @alexander-mai do you see any issues with battery usage of your app using that approach of running a worker thead all the time without temrinating it?

alexander-mai commented 3 years ago

Hello @felixkrautschuk , I can't see any battery issue. The only difference is the higher complexity of the worker to differentiate between the tasks and the assignment of the responses to the individual callers.

a92c commented 2 years ago

news? or as always everyone in silence?

triniwiz commented 2 years ago

We are currently doing work on the runtime, I’ll try my best to get to this :) 

Sent from Yahoo Mail for iPhone

On Thursday, December 16, 2021, 12:32 PM, A92C @.***> wrote:

news? or as always everyone in silence?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are on a team that was mentioned.Message ID: @.***>

triniwiz commented 2 years ago

A fix was implemented for this in the 8.2.0-alpha.5 so for those who want to give it a try please install the runtime using @alpha to test , keep in mind the 8.2 alpha uses gradle 7 which requires Java 11 and also removed compile dependency configuration (which was deprecated for a while now) in favour of implementation

triniwiz commented 2 years ago

fixed in 8.2