Shopify / react-native-skia

High-performance React Native Graphics using Skia
https://shopify.github.io/react-native-skia
MIT License

SIGTRAP Android freeze #1982

Open LeviWilliams opened 11 months ago

LeviWilliams commented 11 months ago

Description

We recently upgraded react-native-skia from 0.1.197 to 0.1.214 and are now seeing many "SIGTRAP: Trace/breakpoint trap" crashes in production on Android devices across a variety of OS versions. We are currently on react-native 0.72.3, if that helps. [Screenshot 2023-11-13 at 3 23 19 PM]

We reproduced it a couple of times on a Pixel 6, and the app just freezes. I understand this error is vague. We currently have no leads as to why this happens after updating the library, but if we are able to provide a repro we will.

Let us know if there are any ideas on what we can try, thanks for the help as always.

Version

0.1.214

Steps to reproduce

-

Snack, code example, screenshot, or link to a repository

-

wcandillon commented 11 months ago

We would definitely need a reproducible example and also a sense of the APIs which are used (Reanimated version, Skia animations, etc)

laurens-lamberts commented 10 months ago

Hi @wcandillon, @LeviWilliams,

We are also experiencing this crash in production, unfortunately so often that we received a warning from Google regarding 'Android vitals bad behavior'. Our app will become less discoverable and will show a warning on its store page if this crash is not resolved soon.

We currently have no clue when or where specifically this crash occurs, so we cannot provide a reproducible example at the moment. For users who experience the crash, we notice that it happens only once per app update. Subsequent sessions are usually not affected.

For our app, in the last 7 days 1.5k users experienced 1.7k crashes of the SIGTRAP type, originating from librnskia.so.

Two crashes occur, both reported as SIGTRAP, each accounting for about 50% of the total occurrences:

[split_config.arm64_v8a.apk!librnskia.so] SkTDPQueue<GrGpuResource*, &GrResourceCache::CompareTimestamp(GrGpuResource* const&, GrGpuResource* const&), &GrResourceCache::AccessResourceIndex(GrGpuResource* const&)>::remove(GrGpuResource*)

and

[split_config.arm64_v8a.apk!librnskia.so] GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef)
SIGTRAP

Some more details

100% foreground crashes, spread across Android versions and devices.

Versions used in the context of the above:
Skia: 0.1.221
Reanimated: 3.5.4
React-native: 0.72.7

With Skia 0.1.210: no significant crashes.
With Skia 0.1.214: many SIGSEGV crashes, and also some SIGTRAP crashes.
With Skia 0.1.221: many SIGTRAP crashes, no more SIGSEGV crashes.

Hope this helps in tracking down the issue.

wcandillon commented 10 months ago

@laurens-lamberts Thank you for this precious data. I hope that we can get this sorted out as soon as possible. I need to review things more carefully on my side, but from 0.1.210 to 0.1.214 the only update that I am seeing in the native code is the Skia version upgrade. I could do another upgrade to see if this helps (alternatively, we could downgrade to see if that helps).

The error seems to be Skia specific, I will investigate this a bit deeper and let you know if I find anything.

wcandillon commented 10 months ago

Strangely enough, I cannot find any relevant change from 0.1.214 to 0.1.220.

Going forward, we will set up some kind of release program to check whether we introduce such regressions in releases.

We have a RN Skia client of approximately the same scale as you running 0.1.213 (has no crash reports). I will contact them about the issue and see if there is a way to maybe just try to upgrade to m119 in an isolated manner. This client is not using any of the recently deprecated APIs. The holidays may make things a bit slower there but I will report back.

@laurens-lamberts I suggest we do the following:

laurens-lamberts commented 10 months ago

Thanks a lot @wcandillon, we're really happy with your support proposal on this issue. It means a lot to us, and we are motivated to help track down the issue.

Because issues that arise during deployment in the Christmas / New Year period have a high impact, we will postpone our next releases to January. For the upcoming release of our project we upgraded to the following library versions (all latest):

"react-native": "0.73.1",
"react-native-reanimated": "3.6.1",
"@shopify/react-native-skia": "0.1.230",

If any new versions of the above packages appear before our release, we will update to ensure having the latest of all.

We always roll out our releases in phases, so as soon as we have insight into crash rates we will share them with you. This will likely be the end of January / beginning of February.

For my information, where in the react-native-skia library do I find the reference to the internal Skia version number (like m121)?

wcandillon commented 10 months ago

This is where you can find the Skia version used: https://github.com/Shopify/react-native-skia/blob/main/.gitmodules In the built package, I don't believe this information is available; that is something we could potentially add if you would find it useful.

I will continue to investigate this a bit and also do the upgrade to m121 and we can tackle this more aggressively after the holidays. I think that we are lucky to have a Skia client that uses 113 at scale but with only non-deprecated API, that will give us a lot of information once/when they deploy 114 and above.
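For anyone who wants to read the Skia version programmatically rather than by eye, here is a minimal sketch that parses `.gitmodules`-style text (git's INI-like config format) and prints the `branch` setting of the Skia submodule. The sample content, the submodule name `externals/skia`, the placeholder URL, and the `chrome/m121` branch naming are assumptions for illustration, not copied from the repository; `parseGitmodules` is a hypothetical helper, not part of react-native-skia.

```typescript
// Sketch: extract submodule entries from .gitmodules-style text.
// All sample values below are hypothetical, not taken from the real repository.

interface SubmoduleEntry {
  name: string;
  settings: Record<string, string>;
}

function parseGitmodules(text: string): SubmoduleEntry[] {
  const entries: SubmoduleEntry[] = [];
  let current: SubmoduleEntry | undefined;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;

    // Section header, e.g. [submodule "externals/skia"]
    const header = /^\[submodule\s+"([^"]+)"\]$/.exec(line);
    if (header) {
      current = { name: header[1], settings: {} };
      entries.push(current);
      continue;
    }

    // Setting line, e.g. branch = chrome/m121
    const setting = /^([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*)$/.exec(line);
    if (setting && current) {
      current.settings[setting[1]] = setting[2];
    }
  }
  return entries;
}

// Hypothetical file content used only to exercise the parser.
const sampleGitmodules = `
[submodule "externals/skia"]
	path = externals/skia
	url = https://example.invalid/skia.git
	branch = chrome/m121
`;

const skiaEntry = parseGitmodules(sampleGitmodules).find((e) =>
  e.name.includes("skia")
);
console.log(skiaEntry?.settings.branch);
```

If the real file does not carry a `branch` setting, the same parser still returns the `path` and `url`, and the pinned commit would have to be looked up in the repository itself.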

laurens-lamberts commented 10 months ago

Yes, that's great. Looking forward to hearing their experiences with later versions. Thanks for showing me where to find the Skia version used. Maybe we can use that in combination with the release notes of skia to troubleshoot some issues in the future.

espenjanson commented 9 months ago

Any updates on this or ways to resolve it @laurens-lamberts @wcandillon @LeviWilliams? Anything we can do to help? We just had to downgrade Skia to 196 because of thousands of crashes in production due to this error. It would be awesome to be able to upgrade, since we want to move on to RN 0.73 (which according to the release notes is not fully supported until 213) 🙏

wcandillon commented 9 months ago

@espenjanson Yes anything that would help to reproduce the issue or more details on the conditions of the crash would be extremely useful. I'm surprised you are on 196 because 197 notoriously fixes a crash related to animations.

We have been coordinating with @laurens-lamberts to find the root cause of the issue, but without success yet. We have a large client that is running the latest version of Skia without any crashes (this same client had a large number of crashes in production with 196). This means that the issue is likely related to a particular API, but we haven't been able to identify it yet.

wcandillon commented 9 months ago

@espenjanson could you send me a list of Skia APIs and components you are using? You can do it privately as well by email.

espenjanson commented 9 months ago

@wcandillon thanks for quick response. We'd love to help in any way we can. Any chance you could provide the package.json (or at least parts of it, such as react-native and reanimated version and perhaps other libraries that could affect skia)?

If you want to, we can send you a minimal functioning app project with our crashing package.json and all the components we have that use Skia. Can put together a zip or a repo, whatever works better for you. If needed, we can also provide more detailed stack traces from Google Play and Sentry.

Will put the team on this immediately. Thanks a million for paying attention to this!

Nodonisko commented 5 months ago

@wcandillon We have exactly the same issue, with quite a large number of crashes showing the exact same error messages. They started to appear after we updated Skia in January 2024.

[split_config.arm64_v8a.apk!librnskia.so] GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef)
SIGTRAP
[split_config.arm64_v8a.apk!librnskia.so] SkTDPQueue<GrGpuResource*, &GrResourceCache::CompareTimestamp(GrGpuResource* const&, GrGpuResource* const&), &GrResourceCache::AccessResourceIndex(GrGpuResource* const&)>::remove(GrGpuResource*)
SIGTRAP

My guess is that it's not possible to create a 100% reproducible example because the crash is quite random, but it's probably related to Skia + animations. I will try to build an app that uses the same features as our production app and runs those animations in an endless loop, hoping it will crash after some time. I will also try mounting and unmounting our components very quickly in an endless loop.

I will let you know if I find something.

wcandillon commented 5 months ago

@Nodonisko is this on the latest version? It looks like this may have been fixed after the latest Skia version upgrade.

laurens-lamberts commented 5 months ago

Hi @wcandillon,

We are live on 1.1.0 of react-native-skia and still experience the crash. Is this already using the latest Skia version? 80% of our Android crashes are the SIGTRAP one from librnskia.so, and it drops our crash-free rate to 98.23% at the moment. iOS is very stable: 99.91% crash-free for us.

wcandillon commented 5 months ago

I'm slowly formulating a plan to tackle this issue. As long as we cannot reproduce the issue, this would require us to deploy an unreleased version of RN Skia to a segment of users to see whether it solves the issue or not. Would this be reasonable?

Is there a sense of which screen the crash is happening on? This would allow us to narrow down the API/code that might be faulty.

Nodonisko commented 5 months ago

@wcandillon We are not on the latest version; ours is about two months old. We will update, but it will take us another month to test it in production.

In the meantime we were quite lucky and one of our testers managed to catch the crash on video. It's not very helpful, but at least we know which screen it's happening on. Sadly, it's a screen full of Skia components and animations :D

I will try to prepare a standalone app from that screen so we can try to reproduce it in a more isolated environment. I hope to have this done today or tomorrow.

https://github.com/Shopify/react-native-skia/assets/5837757/6ff2b12f-9c2a-4af3-a1ff-75cd84957560

wcandillon commented 5 months ago

Nice, is this happening in debug mode? Are there any more details about when the crash happens?


Nodonisko commented 5 months ago

Sadly it's production version.

Nodonisko commented 5 months ago

So I spent most of yesterday trying to reproduce the issue. I created a special version of our home screen that runs animations in a loop, mounts and unmounts components, etc., and let it run on two different Android devices for 30 minutes a few times. I also tried both production and debug builds. So far I did not get a single crash...

I also noticed one single crash in our Google Play console with a nearly identical signature, [libhwui.so] GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef). It is not SIGTRAP but SIGSEGV, and it actually has a stack trace:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 0, tid: 16686 >>> io.trezor.suite <<<

backtrace:
  #00  pc 0x0000000000558aac  /system/lib64/libhwui.so (GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef)+364)
  #01  pc 0x0000000000562074  /system/lib64/libhwui.so (GrTextureProxy::~GrTextureProxy()+96)
  #02  pc 0x000000000056220c  /system/lib64/libhwui.so (virtual thunk to GrTextureProxy::~GrTextureProxy()+40)
  #03  pc 0x0000000000686fdc  /system/lib64/libhwui.so (SkImage_Gpu::~SkImage_Gpu()+24)
  #04  pc 0x0000000000299924  /system/lib64/libhwui.so (android::uirenderer::AutoBackendTextureRelease::unref(bool)+108)
  #05  pc 0x000000000029a110  /system/lib64/libhwui.so (android::uirenderer::DeferredLayerUpdater::destroyLayer()+188)
  #06  pc 0x000000000029ac14  /system/lib64/libhwui.so (android::uirenderer::DeferredLayerUpdater::detachSurfaceTexture()+28)
  #07  pc 0x0000000000284014  /system/lib64/libhwui.so (std::__1::__function::__func<decltype(fp()) android::uirenderer::WorkQueue::runSync<android::uirenderer::renderthread::RenderProxy::dumpProfileInfo(int, int)::$_28>(android::uirenderer::renderthread::RenderProxy::dumpProfileInfo(int, int)::$_28&&)::'lambda'(), std::__1::allocator<decltype(fp()) android::uirenderer::WorkQueue::runSync<android::uirenderer::renderthread::RenderProxy::dumpProfileInfo(int, int)::$_28>(android::uirenderer::renderthread::RenderProxy::dumpProfileInfo(int, int)::$_28&&)::'lambda'()>, void ()>::operator()() (.2a0230ca9784b3ed733f337e97c21a2e)+92)
  #08  pc 0x0000000000274bac  /system/lib64/libhwui.so (android::uirenderer::WorkQueue::process()+588)
  #09  pc 0x00000000002951ac  /system/lib64/libhwui.so (android::uirenderer::renderthread::RenderThread::threadLoop()+416)
  #10  pc 0x0000000000013414  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+424)
  #11  pc 0x00000000000ba598  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #12  pc 0x0000000000053f3c  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

I am not sure if it's related or it's some completely unrelated random crash with similar signature, but it's suspicious.
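To keep the librnskia.so reports apart from look-alikes such as the libhwui.so one when going through crash exports, a small helper can split each top-frame line into library and symbol. This is only a sketch: `parseTopFrame` and `countByLibrary` are hypothetical names, and the `[container!library] symbol` line shape is assumed from the signatures pasted in this thread, not from any documented Play Console or Sentry format.

```typescript
// Sketch: split a "[container!library] symbol" crash line into its parts.
// The line shape is inferred from the signatures quoted in this thread.

interface TopFrame {
  container?: string; // e.g. the split APK holding the library, when present
  library: string;
  symbol: string;
}

function parseTopFrame(line: string): TopFrame | undefined {
  const match = /^\[(?:([^\]]+)!)?([^\]!]+)\]\s+(.+)$/.exec(line.trim());
  if (!match) return undefined;
  const container: string | undefined = match[1];
  return { container, library: match[2], symbol: match[3] };
}

// Count how many parsed lines point at each library; unparsable lines are skipped.
function countByLibrary(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    const frame = parseTopFrame(line);
    if (!frame) continue;
    counts.set(frame.library, (counts.get(frame.library) ?? 0) + 1);
  }
  return counts;
}

// Lines copied from the signatures quoted in this thread.
const reportedLines = [
  "[split_config.arm64_v8a.apk!librnskia.so] GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef)",
  "[split_config.arm64_v8a.apk!librnskia.so] SkTDPQueue<GrGpuResource*, &GrResourceCache::CompareTimestamp(GrGpuResource* const&, GrGpuResource* const&), &GrResourceCache::AccessResourceIndex(GrGpuResource* const&)>::remove(GrGpuResource*)",
  "[libhwui.so] GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef)",
];

for (const [library, count] of countByLibrary(reportedLines)) {
  console.log(`${library}: ${count}`);
}
```

Grouping by library rather than by symbol matters here because the same Skia symbol can appear both in the app's bundled librnskia.so and in the system's libhwui.so, and only the former is shipped by react-native-skia.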

Any ideas what we can try next? We can try to deploy a special version of Skia from @wcandillon in our next release if that could help, but it will take another month to roll this out to our users.

I will also try to narrow down in which version the crash first occurred.

Nodonisko commented 5 months ago

According to our Sentry data, the crash first appeared after we updated from 0.1.188 to 0.1.216.

Also found this issue in Skia issue trackers https://issues.skia.org/issues/333423686
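Narrowing a range like 0.1.188 to 0.1.216 down to the first affected release is essentially a bisection. Below is a minimal sketch of that idea, assuming the regression was introduced once and every later release is also affected (which may not hold here, given the separate crash reported on 196). `firstBadVersion` is a hypothetical helper, and the `crashesInProduction` predicate is a stand-in: in practice each check would be a phased rollout observed in Sentry or the Play Console, not a function call. The version list only contains releases mentioned in this thread.

```typescript
// Sketch: binary search for the first release that shows the crash.
// Assumes `versions` is sorted oldest to newest and that once a release is
// bad, all later releases are bad too.

function firstBadVersion(
  versions: string[],
  isBad: (version: string) => boolean
): string | undefined {
  let lo = 0;
  let hi = versions.length - 1;
  let result: string | undefined;

  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(versions[mid])) {
      result = versions[mid]; // candidate; keep looking for an earlier one
      hi = mid - 1;
    } else {
      lo = mid + 1;
    }
  }
  return result;
}

// Releases mentioned in this thread, oldest first.
const candidateVersions = [
  "0.1.188",
  "0.1.196",
  "0.1.197",
  "0.1.210",
  "0.1.213",
  "0.1.214",
  "0.1.216",
];

// Hypothetical stand-in for "did this release crash in production?".
// It marks 0.1.214 and later as bad purely for illustration.
const crashesInProduction = (version: string): boolean => {
  const patch = Number(version.split(".")[2]);
  return patch >= 214;
};

console.log(firstBadVersion(candidateVersions, crashesInProduction));
```

With seven candidates this needs at most three checks, which matters when each check is a production rollout that takes weeks.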

wcandillon commented 4 months ago

@Nodonisko Does https://issues.skia.org/issues/333423686 look like it could be related? It wasn't clear to me from looking at the bug report.

@Nodonisko @laurens-lamberts @LeviWilliams I would like to find some way to reproduce the issue (even in release mode) and/or pin down the scenario in which the error happens. I wrote this example that stresses the Skia APIs, and I also tested it in release mode: https://github.com/user-attachments/files/15758252/StressTest.zip Please let me know how I could update the test scenario to better match your circumstances.

In #2396, we are experiencing a clear race condition which we are currently investigating and that might shed some light on what is happening.

Nodonisko commented 4 months ago

@wcandillon The error in https://issues.skia.org/issues/333423686 looks very similar to ours, but I'm not sure it's related.

As for the stress test, I ran it on my two devices in debug mode, with no crash so far.

I also went through our Sentry data, and it seems the crash very often (though not exclusively) happens when the user navigates to a new screen, both when the previous screen is unmounted (like a tab change) and when a new screen is pushed onto the stack. This leads me to think it probably happens when some Skia component is mounted.

Nodonisko commented 3 months ago

We just released a new version of our app with Skia 1.3.7 and this issue still persists. My colleague just hit this crash while doing a hover gesture over our graph (Revolut-style graph animation).

alexnaiman commented 1 week ago

Hello @wcandillon @Nodonisko ! Any updates here?

We're still experiencing both issues on react-native-skia version 1.3.9.

[split_config.arm64_v8a.apk!librnskia.so] SkTDPQueue<GrGpuResource*, &GrResourceCache::CompareTimestamp(GrGpuResource* const&, GrGpuResource* const&), &GrResourceCache::AccessResourceIndex(GrGpuResource* const&)>::remove(GrGpuResource*)
[split_config.arm64_v8a.apk!librnskia.so] GrResourceCache::notifyARefCntReachedZero(GrGpuResource*, GrIORef<GrGpuResource>::LastRemovedRef)

Any workaround or solution to help us mitigate these would be greatly appreciated, as we’re currently seeing hundreds of crashes related to both problems.

Could we consider downgrading to version 196/210? @espenjanson /@laurens-lamberts mentioned in this thread that it helped them, though I don’t see this as a sustainable long-term fix, even if it does work. (Also, @espenjanson / @laurens-lamberts , did it actually resolve the issues? Have you encountered any more problems since?)

other relevant information:

"react-native": "0.74.2",
"react-native-reanimated": "3.12.1",

Skia API used: These APIs were used even before we started seeing the crashes in the Play Console

These APIs were used on the latest release, the one where we started seeing the crashes. We created a new component that is used on several lists (FlashList, FlatList, SectionList). Each list has somewhere around 10-50 items