Swrve / swrve-android-sdk

Swrve Android SDK
http://www.swrve.com
Other
10 stars 18 forks source link

JobIntentService crashing when performing push notification #282

Closed HGyllensvard closed 5 years ago

HGyllensvard commented 5 years ago

Dear Swrve.

We are using 6.0.1 of your SDK and whenever we send out a push notification we get a small spike in crahes in the application which we are seeing through Fabric, we are up in the thousands of crashes now, and each push notification we perform adds a few hundred extra.

The stacktrace of the crash for us is (Only occurs for Android Oreo and up ):

  | android.os.AsyncTask$3.done (AsyncTask.java:353)
  | java.util.concurrent.FutureTask.finishCompletion (FutureTask.java:383)
  | java.util.concurrent.FutureTask.setException (FutureTask.java:252)
  | java.util.concurrent.FutureTask.run (FutureTask.java:271)
  | java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1162)
  | java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:636)
  | java.lang.Thread.run (Thread.java:764)

And

  | android.os.Parcel.readException (Parcel.java:2029)
  | android.os.Parcel.readException (Parcel.java:1975)
  | android.app.job.IJobCallback$Stub$Proxy.dequeueWork (IJobCallback.java:191)
  | android.app.job.JobParameters.dequeueWork (JobParameters.java:208)
  | android.support.v4.app.JobIntentService$JobServiceEngineImpl.dequeueWork (JobIntentService.java:315)
  | android.support.v4.app.JobIntentService.dequeueWork (JobIntentService.java:640)
  | android.support.v4.app.JobIntentService$CommandProcessor.doInBackground$10299ca (JobIntentService.java:2390)
  | android.os.AsyncTask$2.call (AsyncTask.java:333)
  | java.util.concurrent.FutureTask.run (FutureTask.java:266)
  | java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1162)
  | java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:636)
  | java.lang.Thread.run (Thread.java:764)

Based upon the strack trace we searched our project for usage of JobIntentService. Where for us, Swrve is the only library using it (at least that we can see), but supporting this is that we can clearly see spikes in crashes come the same minute that a push notification is sent out.

Looking around online for the error, the issue might not be in the SDK itself. The reports of this bug is very similar: https://issuetracker.google.com/issues/63622293

There is a suggested short-term solution suggested in there to catch this error. But as the usage is in the Swrve SDK, we cannot try this out for ourselves. And preferably Google would fix the issue first.. but the ticket has been open quite a while.

While: https://github.com/Swrve/swrve-android-sdk/issues/278 seems related, we never customise any of the Swrve behaviour, so I thought best keep it as a separate issue.

About our project and its setup:

We are on the firebase version (6.0.1) and no Amazon integration. For our setup of Swrve, if you read: https://docs.swrve.com/developer-documentation/integration/android/ -> Using only Swrve push notifications

Our code is basically a copy of your instructions there, but I notice one difference in the manifest:

<service android:name="com.swrve.sdk.SwrveFirebaseMessagingService" android:exported="false">
            <intent-filter>
                <action android:name="com.google.firebase.MESSAGING_EVENT" />
            </intent-filter>
        </service>

Where we have "exported=false", which I don't know if that would be an issue, but if it was I would have expected the push notification functionality to be broken completely.

We have min SDK 21 Target 28 Tools 28.0.3

Do you think it could be related to the bug ticket I linked? Another issue in the SDK or possibly in how we are using Swrve in the app.

Thank you!

dominicmarmionswrve commented 5 years ago

Hi @HGyllensvard, thanks for the excellent description.

You can ignore the "exported=false" investigation as the default is false.

Was there any more info in the Fabric stacktrace? In particular I'm looking for a mention of Swrve in the stacktrace. I'm not sure where we'd even try to catch the error if it is indeed related to the SwrveSDK. The fact that the time of the crash correlates to the time of a push doesn't mean its caused by the SwrveSDK. If the app is not in the background then the app could be started again by receiving the push which means other libraries and code may have executed.

We can take a look, but its going to be very difficult if we can't reproduce. Was there any particular device or version of android we can narrow the investigation down to?

HGyllensvard commented 5 years ago

Hi @dominicmarmionswrve

Thank you for your reply.

In the stack trace for the crashing code itself, sadly there is no more information in the stack trace itself.

We can see one other thread from Swrve running:


android.database.sqlite.SQLiteConnection.nativeExecute (SQLiteConnection.java)

android.database.sqlite.SQLiteConnection.execute (SQLiteConnection.java:751)

android.database.sqlite.SQLiteSession.endTransactionUnchecked (SQLiteSession.java:437)

android.database.sqlite.SQLiteSession.endTransaction (SQLiteSession.java:401)

android.database.sqlite.SQLiteDatabase.endTransaction (SQLiteDatabase.java:710)

com.swrve.sdk.localstorage.SQLiteLocalStorage.saveMultipleEventItems (SQLiteLocalStorage.java:218)

com.swrve.sdk.localstorage.SwrveMultiLayerLocalStorage.flushEvents (SwrveMultiLayerLocalStorage.java:183)

com.swrve.sdk.localstorage.SwrveMultiLayerLocalStorage.flush (SwrveMultiLayerLocalStorage.java:172)

com.swrve.sdk.SwrveBase$7.run (SwrveBase.java:513)

com.swrve.sdk.SwrveRunnables$1.run (SwrveRunnables.java:13)

java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1162)

java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:636)

java.lang.Thread.run (Thread.java:764)

The other threads which are not idling are Fabric / Crashlytics related.

Regards to your question: It happens in a wide variety of devices with Android 8 and up.

We did no see any of these issues before we added Swrve. (there were some other changes in the release containing Swrve, but nothing that is asynchronous using JobIntentService).

With that said I absolutely agree with you that it could be a cause from code elsewhere, which just happens to be executed when the push notification comes in and ends up in the wrong state. That it's in Swrve is just a suspicion on our side, we are definitely not sure.

Overall there is a low amount of users affected (but our most common crash), and sadly we have no steps of reproducing this, so I understand it's not easy to move forward on. Then it also feels a bit odd if this would be the first report of this to you as you would think it would affect all your users.

Regards to the bug ticket at Google's forum, one of the project's solution to the problem was this: https://github.com/optimizely/android-sdk/pull/193 Does your usage of JobIntentService look vaguely similar?

Any other ideas you could think of worth trying for us?

dominicmarmionswrve commented 5 years ago

Hi @HGyllensvard, is the above still an issue for you?

dominicmarmionswrve commented 5 years ago

Closing this as unable to reproduce and has been inactive for a long time.

GrinGraz commented 5 years ago

Hi there @dominicmarmionswrve , as @HGyllensvard mention this is a well known issue and we had faced exactly the same. In our project Swrve SDK is the only lib using JobIntentService that is the class that launch the exception.

Fatal Exception: java.lang.RuntimeException: An error occurred while executing doInBackground() at android.os.AsyncTask$3.done + 365(AsyncTask.java:365) at java.util.concurrent.FutureTask.finishCompletion + 383(FutureTask.java:383) at java.util.concurrent.FutureTask.setException + 252(FutureTask.java:252) at java.util.concurrent.FutureTask.run + 271(FutureTask.java:271) at java.util.concurrent.ThreadPoolExecutor.runWorker + 1162(ThreadPoolExecutor.java:1162) at java.util.concurrent.ThreadPoolExecutor$Worker.run + 636(ThreadPoolExecutor.java:636) at java.lang.Thread.run + 784(Thread.java:784)

and

Caused by java.lang.SecurityException: Caller no longer running, last stopped +114ms because: timed out while starting at android.os.Parcel.readException + 1954(Parcel.java:1954) at android.os.Parcel.readException + 1900(Parcel.java:1900) at android.app.job.IJobCallback$Stub$Proxy.dequeueWork + 191(IJobCallback.java:191) at android.app.job.JobParameters.dequeueWork + 196(JobParameters.java:196) at androidx.core.app.JobIntentService$JobServiceEngineImpl.dequeueWork + 315(JobIntentService.java:315) at androidx.core.app.JobIntentService.dequeueWork + 640(JobIntentService.java:640) at androidx.core.app.JobIntentService$CommandProcessor.doInBackground + 390(JobIntentService.java:390) at androidx.core.app.JobIntentService$CommandProcessor.doInBackground + 383(JobIntentService.java:383) at android.os.AsyncTask$2.call + 345(AsyncTask.java:345) at java.util.concurrent.FutureTask.run + 266(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker + 1162(ThreadPoolExecutor.java:1162) at java.util.concurrent.ThreadPoolExecutor$Worker.run + 636(ThreadPoolExecutor.java:636) at java.lang.Thread.run + 784(Thread.java:784)

imagen

This has a workaround that have been applied by evernote, optimizely, instabug and many others, here are the PR and issues: evernote issue evernote fix optimizely fix instabug fix

even a library have been made: Android-SafeJobIntentService

I don't know why this issue is not addressed by google, but they recommend use WorkManager instead. I would really appreciate if you could review this, thanks in advance.

dominicmarmionswrve commented 5 years ago

@HGyllensvard Thanks for the links and further info on this. We certainly will review this again and get back to you.

dominicmarmionswrve commented 5 years ago

@GrinGraz Sorry, I tagged HGyllensvard instead of you in previous post.

HGyllensvard commented 5 years ago

Hi!

Damn I see I was rude here and forgot to reply a while ago! Sincerest apologies for that.

As time went we ended up having bigger things to look into and the business put this in the "if we ever get there pile", guess what? We never got around to it! But we still don't have any other features in the app relying on JobIntentService.

I went back to our data today and can happily say we don't get these crashes in our later versions.

Now Fabric data only goes back 90 days and few people are on these older versions of the app, but looking in our code and comparing the Fabric data we have:

  1. It seems like the issue disappeared after release X, while not the only thing done, we did migrate to AndroidX libraries (released this version back in May this year). Hard to be sure with the lack of Fabric data today though. The crashing JobIntentService being in the support library, if our upgrading forced the usage of the newer version for libraries as well and thus resolving it?
  2. In July we updated Swrve to 6.1.1 and was part or release X + 1.
  3. We somehow unknowingly resolved the issue ourselves around the same time.

@GrinGraz Do you happen to still use the support libraries? And which Swrve version are you using?

dominicmarmionswrve commented 5 years ago

Interesting! Thanks for that! We're planning on migrating to android x also - this is even more incentive to do this sooner rather than later!

In the meantime, we have release 6.3 scheduled to be published very soon, in which we'll add an abstract superclass in the android.support.v4.app package which extends JobIntentService and wraps the dequeueWork method in a try/catch, and all our JobIntentService classes will extend this.

GrinGraz commented 5 years ago

Hi @HGyllensvard, we are already using androidx, updated it in july and in that moment the crash went from the same JobIntentService class to the Parcel class, now we are using the swrve-firebase SDK 6.0.2 version, we set the option to force the use of newer versions too, but nothing changed.

We are going to launch a new version of the app the next week, so i am going to update swrve to see if something happens waiting for the next release of the SDK.

Thank you very much for all your cooperation and brevity to solve this.

HGyllensvard commented 5 years ago

Ah, so I went back to check Fabric this morning and, similar to you, we now have crashes in Parcel class related to JobIntentService instead and these start appearing from the version we upgraded to AndroidX. The number of crashes are low for us which is why we haven't spent time investigating it before.

I don't know if it's just that we have too little data, but the Parcel crashes are exclusively on Android 9 for us.

And @dominicmarmionswrve The fix in the new version sounds great!

dominicmarmionswrve commented 5 years ago

@HGyllensvard @GrinGraz We've released a fix for this today (6.3). Thanks for bringing it to our attention and apologies it took so long!

GrinGraz commented 5 years ago

I am glad to read it!, We are on time to update to 6.3 before our release next week. For us @HGyllensvard the issue is happening on android 8 and above. Again thanks for the effort @dominicmarmionswrve and team.

GrinGraz commented 5 years ago

Hi there!, great news, we released the new version of our app last monday and with around 200.000 users already updated to it, we have not had any issue related to the JobIntentService class, thanks!

dominicmarmionswrve commented 5 years ago

Thats great to hear, thanks for letting us know!