bkaradzic / bgfx

Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library.
https://bkaradzic.github.io/bgfx/overview.html
BSD 2-Clause "Simplified" License

Crash on Android #1553

Open jimon opened 5 years ago

jimon commented 5 years ago

On bgfx commit da2720cf936c5ae037a5a9893c446f1379a404b3 I have quite a few crash reports from different Android devices. The call stack looks as follows:

# Platform: android
# OS Version: 5.1.1
# Device: OPPO R7sm
# RAM Free: 26.2%
# Disk Free: 13.3%

#0. Crashed: Thread
0  libstoker.so                   0x7eef491a bgfx::RenderDraw::clear() (bgfx_p.h:1513)
1  libstoker.so                   0x7ef078e7 ~AllocatorStub (bgfx.cpp:153)
2  libstoker.so                   0x7ef078e7 ~AllocatorStub (bgfx.cpp:153)
3  libstoker.so                   0x7eef64b5 bgfx::Context::init(bgfx::Init const&) (bgfx.cpp:1497)
4  libc.so                        0x401688a0 (Missing)
5  libc.so                        0x4014074b (Missing)
6  libutils.so                    0x401cf085 (Missing)
7  (Missing)                      0x64fba7ca (Missing)
8  (Missing)                      0x6ff46b76 (Missing)
9  libart.so                      0x41daeeeb (Missing)
10 libart.so                      0x41f79789 (Missing)
11 (Missing)                      0x6ff46b76 (Missing)
12 (Missing)                      0x6ff46b76 (Missing)
13 (Missing)                      0x130f873e (Missing)
14 libc.so                        0x40157531 (Missing)
15 (Missing)                      0x6ff46b76 (Missing)
16 libart.so                      0x41f79fad (Missing)
17 libart.so                      0x41daf1b5 (Missing)
18 (Missing)                      0x6f81eb8e (Missing)
19 (Missing)                      0x27694b7e (Missing)
20 (Missing)                      0x1317369e (Missing)
21 (Missing)                      0x6f9df0ee (Missing)
22 (Missing)                      0x1317369e (Missing)
23 (Missing)                      0x6ff461ae (Missing)
24 libart.so                      0x41fb4256 (Missing)
25 libart.so                      0x41fb389a (Missing)
26 libart.so                      0x41fb4256 (Missing)
27 libart.so                      0x41fb4256 (Missing)
28 libart.so                      0x41fb38c2 (Missing)
29 libart.so                      0x41fb389a (Missing)
30 libart.so                      0x41fb4256 (Missing)
31 libart.so                      0x41fb37aa (Missing)
32 libart.so                      0x41fb37be (Missing)
33 (Missing)                      0x12c4a03e (Missing)
34 libutils.so                    0x401cf2af (Missing)
35 (Missing)                      0x12dd5a3e (Missing)
36 (Missing)                      0x6f9ce70e (Missing)
37 libandroid_runtime.so          0x4009cdc3 (Missing)
38 (Missing)                      0x12c4a03e (Missing)
39 system@framework@boot.oat      0x71fc1cdb (Missing)
40 system@framework@boot.oat      0x71f14c77 (Missing)
41 (Missing)                      0x700ec206 (Missing)
42 (Missing)                      0x7006b9d6 (Missing)
43 (Missing)                      0x12c4a03e (Missing)
44 (Missing)                      0x12dd5a3e (Missing)
45 (Missing)                      0x6f9ce70e (Missing)
46 (Missing)                      0x12c4a03e (Missing)
47 system@framework@boot.oat      0x72b967a7 (Missing)
48 (Missing)                      0x6f9ce70e (Missing)
49 (Missing)                      0x131736de (Missing)
50 libart.so                      0x41d75793 (Missing)
51 (Missing)                      0x6f91281e (Missing)
52 (Missing)                      0x131736de (Missing)
53 system@framework@boot.oat      0x72b92197 (Missing)

Is there anything I can do to help troubleshoot this? It seems I can't reproduce it locally just yet.

bkaradzic commented 5 years ago

That callstack doesn't make sense. If it's during bgfx::init, my guess would be something about memory, but have no other ideas.

attilaz commented 5 years ago

Does the callstack point to these lines? https://github.com/bkaradzic/bgfx/blob/da2720cf936c5ae037a5a9893c446f1379a404b3/src/bgfx_p.h#L1513 https://github.com/bkaradzic/bgfx/blob/da2720cf936c5ae037a5a9893c446f1379a404b3/src/bgfx.cpp#L1497

It is a 64-bit assignment, so it could be an alignment issue. I see no extra alignment when allocating m_encoder: https://github.com/bkaradzic/bgfx/blob/da2720cf936c5ae037a5a9893c446f1379a404b3/src/bgfx.cpp#L1493 If it is not 8-byte aligned, I think the assignment can fail. But maybe I am missing something.

jimon commented 5 years ago

It reports SIGBUS at 0x00000000b81bb168, and it seems like only an unaligned access can raise SIGBUS in this case. But it does seem weird, for sure.

attilaz commented 5 years ago

well. that pointer seems to be 8 byte aligned.
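
A quick way to sanity-check an address from a crash report against a given alignment is to look at its low bits. The sketch below is only an illustration; `isAligned` and `maxAlignment` are hypothetical helper names, not part of bgfx or bx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if addr is a multiple of align.
// Assumption: align is a non-zero power of two.
inline bool isAligned(uint64_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return 0 == (addr & (uint64_t(align) - 1));
}

// Returns the largest power-of-two alignment that addr satisfies.
// Assumption: addr is non-zero.
inline uint64_t maxAlignment(uint64_t addr)
{
    assert(addr != 0);
    return addr & (~addr + 1); // isolates the lowest set bit
}
```

For the address reported above, `isAligned(0xb81bb168, 8)` holds, while the same check with 16 or 64 does not, so the pointer is 8-byte aligned but no more than that.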

bkaradzic commented 5 years ago

Try adding 16-byte alignment for encoder just as test.

jimon commented 5 years ago

So BX_ALLOC calls bx::alloc with alignment 0, and then we have const size_t align = max(_align, sizeof(uint32_t) );. Does that mean all allocations are only 4-byte aligned? If I'm reading this right, that will never fly on Android, as it needs 8-byte alignment for instructions that operate on 64-bit chunks.


Update: I could be wrong. The ARM docs say that LDRD benefits from 8-byte alignment but only requires 4-byte alignment, though stack pointers should be 8-byte aligned. I'm digging.

The only mention I can find is here http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/Chdggchb.html
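
To make the concern above concrete, here is a minimal sketch of what the quoted `max(_align, sizeof(uint32_t))` line implies when the requested alignment is 0. `effectiveAlign`, `Has64BitField`, and `explicitAlignCovers` are hypothetical names used only for illustration; the real bx allocator path may differ, and the underlying malloc typically returns memory aligned more strictly than this explicit minimum.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the quoted line:
//   const size_t align = max(_align, sizeof(uint32_t) );
inline size_t effectiveAlign(size_t requested)
{
    return std::max(requested, sizeof(uint32_t));
}

// Example type with a 64-bit member (illustrative only).
struct Has64BitField
{
    uint64_t m_value;
    uint32_t m_flags;
};

// True when the explicitly requested minimum covers what type T needs.
template<typename T>
inline bool explicitAlignCovers(size_t requested)
{
    return effectiveAlign(requested) >= alignof(T);
}
```

With a requested alignment of 0, the explicit minimum is `sizeof(uint32_t)`, so any type whose `alignof` is larger relies on whatever alignment the underlying allocator happens to provide.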

jimon commented 5 years ago

I changed the global alignment to 16 instead of 4 and put the build into pre-release testing on Google Play. Here is what came back:

11-18 07:34:50.557: I/art(22930): --------- beginning of crash
11-18 07:34:50.601: A/libc(22930): Fatal signal 7 (SIGBUS), code 1, fault addr 0xb7c37ee8 in tid 23080 (Thread-1994)
11-18 07:34:50.662: I/Robo(22930): copied 8830 bytes
11-18 07:34:50.664: I/Robo(22868): Results ready.
11-18 07:34:50.665: I/Robo(22868): read 413 bytes
11-18 07:34:50.665: I/Robo(22930): copied 687 bytes
11-18 07:34:50.670: D/libc(22930): skt_base:0, kt_base:0, mptcp_enabled:0, socks_enabled:0, wifi_connected:1
11-18 07:34:50.671: D/libc(22930): skt_base:0, kt_base:0, mptcp_enabled:0, socks_enabled:0, wifi_connected:1
11-18 07:34:50.671: D/BandwidthController(305): [LG DATA] No such appUid: 15330
11-18 07:34:50.672: D/DnsProxyListener(305): App 15330 tries DNS query. Accept family:0 protocol:0
11-18 07:34:50.672: I/Robo(22868): read 4 bytes
11-18 07:34:50.673: I/Robo(22868): read 8830 bytes
11-18 07:34:50.698: I/Robo(22868): read 687 bytes
11-18 07:34:50.703: D/libc(22930): skt_base:0, kt_base:0, mptcp_enabled:0, socks_enabled:0, wifi_connected:1
11-18 07:34:50.708: A/DEBUG(307): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
11-18 07:34:50.708: A/DEBUG(307): Build fingerprint: 'lge/lv0_trf_us/lv0:6.0.1/MXB48T/1705410450712:user/release-keys'
11-18 07:34:50.708: A/DEBUG(307): Revision: '0'
11-18 07:34:50.708: A/DEBUG(307): ABI: 'arm'
11-18 07:34:50.709: A/DEBUG(307): pid: 22930, tid: 23080, name: Thread-1994  >>> com.beardsvibe.stoker <<<
11-18 07:34:50.709: A/DEBUG(307): signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0xb7c37ee8
11-18 07:34:50.715: I/Robo(22868): newScreenState.hasOpaqueElements() = false
11-18 07:34:50.715: I/Robo(22868): New Screen: Optional.of(ScreenNode {Id=1, PackageName=com.beardsvibe.stoker, ActivityName=Optional.of(com.beardsvibe.EntrypointNativeActivity)})
11-18 07:34:50.725: I/Robo(22868): Experiment excludeWebView: false
11-18 07:34:50.741: A/DEBUG(307):     r0 b7c37e88  r1 00000000  r2 b7c37ee8  r3 ffff0000
11-18 07:34:50.742: A/DEBUG(307):     r4 b7c37e88  r5 b7c38040  r6 b7ab5bb0  r7 9c9d0809
11-18 07:34:50.742: A/DEBUG(307):     r8 9e756b20  r9 9bed6878  sl 9e756b20  fp 9bba7144
11-18 07:34:50.742: A/DEBUG(307):     ip ff010000  sp 9bed66f0  lr 9c9bfdfb  pc 9c9bd83a  cpsr 000e0030
11-18 07:34:50.744: A/DEBUG(307): backtrace:
11-18 07:34:50.744: A/DEBUG(307):     #00 pc 001a183a  /data/app/com.beardsvibe.stoker-1/lib/arm/libstoker.so (_ZN4bgfx11EncoderImpl7discardEv+89)
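
The backtrace frame above shows a mangled symbol. As a small aid for reading it, the sketch below demangles such a name with `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang toolchains; the `demangle` wrapper itself is a hypothetical helper, not something from bgfx.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium C++ ABI symbol name.
// Returns the input unchanged if demangling fails.
inline std::string demangle(const char* mangled)
{
    assert(mangled != nullptr);
    int status = 0;
    char* result = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || result == nullptr)
    {
        return mangled;
    }
    std::string out(result);
    std::free(result); // the buffer is allocated with malloc
    return out;
}
```

Calling `demangle("_ZN4bgfx11EncoderImpl7discardEv")` yields the readable form `bgfx::EncoderImpl::discard()`.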

and another instance

11-18 07:35:03.971: I/DEBUG(1925): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
11-18 07:35:03.971: I/DEBUG(1925): Build fingerprint: 'samsung/m0xx/m0:4.3/JSS15J/I9300XXUGMK6:user/release-keys'
11-18 07:35:03.971: I/DEBUG(1925): Revision: '12'
11-18 07:35:03.971: I/DEBUG(1925): pid: 13414, tid: 13457, name: Main Thread  >>> com.beardsvibe.stoker <<<
11-18 07:35:03.971: I/DEBUG(1925): signal 7 (SIGBUS), code 128 (SI_KERNEL), fault addr 00000000
11-18 07:35:03.991: W/LicenseLogService(2343): log() is called by non admin
11-18 07:35:04.011: D/dalvikvm(6777): WAIT_FOR_CONCURRENT_GC blocked 4ms
11-18 07:35:04.081: D/dalvikvm(6777): GC_EXPLICIT freed 157K, 24% free 11981K/15656K, paused 10ms+6ms, total 62ms
11-18 07:35:04.171: D/dalvikvm(6811): JIT code cache reset delayed (1048492 bytes 2/14)
11-18 07:35:04.176: D/dalvikvm(6811): GC_CONCURRENT freed 1837K, 25% free 15147K/20180K, paused 4ms+6ms, total 62ms
11-18 07:35:04.221: W/LicenseLogService(2343): log() is called by non admin
11-18 07:35:04.231: W/ModuleInitIntentOp(6941): Dropping unexpected action com.google.android.gms.phenotype.COMMITTED
11-18 07:35:04.246: I/PTCommittedOperation(6811): Receive new configuration for com.google.android.gms.fitness
11-18 07:35:04.266: W/LicenseLogService(2343): log() is called by non admin
11-18 07:35:04.341: I/DEBUG(1925):     r0 598cbf98  r1 00000000  r2 598cbff8  r3 ffff0000
11-18 07:35:04.341: I/DEBUG(1925):     r4 598cbf98  r5 598cc150  r6 598c4d40  r7 623c1809
11-18 07:35:04.341: I/DEBUG(1925):     r8 64147b20  r9 6499ce48  sl 64147b20  fp 673a1144
11-18 07:35:04.341: I/DEBUG(1925):     ip ff010000  sp 6499ccc0  lr 623b0dfb  pc 623ae83a  cpsr 00000030
11-18 07:35:04.341: I/DEBUG(1925):     d0  3f80000000000000  d1  7149f2ca3e000000
11-18 07:35:04.341: I/DEBUG(1925):     d2  000000003f800000  d3  0037003600350034
11-18 07:35:04.341: I/DEBUG(1925):     d4  002b002a00290028  d5  002f002e002d002c
11-18 07:35:04.341: I/DEBUG(1925):     d6  0000000000210020  d7  000000003f800000
11-18 07:35:04.341: I/DEBUG(1925):     d8  000b000a00090008  d9  000f000e000d000c
11-18 07:35:04.341: I/DEBUG(1925):     d10 00cb00ca00c900c8  d11 00cf00ce00cd00cc
11-18 07:35:04.341: I/DEBUG(1925):     d12 00c300c200c100c0  d13 00c700c600c500c4
11-18 07:35:04.341: I/DEBUG(1925):     d14 00bb00ba00b900b8  d15 00bf00be00bd00bc
11-18 07:35:04.341: I/DEBUG(1925):     d16 ffffffff00000000  d17 00000000ffffffff
11-18 07:35:04.341: I/DEBUG(1925):     d18 010000500000001f  d19 0000000000000000
11-18 07:35:04.341: I/DEBUG(1925):     d20 0000000000000000  d21 0000000000000000
11-18 07:35:04.341: I/DEBUG(1925):     d22 0000000000000000  d23 3f80000000000000
11-18 07:35:04.341: I/DEBUG(1925):     d24 005b005a00590058  d25 005f005e005d005c
11-18 07:35:04.341: I/DEBUG(1925):     d26 0053005200510050  d27 0057005600550054
11-18 07:35:04.341: I/DEBUG(1925):     d28 004b004a00490048  d29 004f004e004d004c
11-18 07:35:04.341: I/DEBUG(1925):     d30 0043004200410040  d31 0047004600450044
11-18 07:35:04.341: I/DEBUG(1925):     scr 20000010
11-18 07:35:04.341: D/dalvikvm(13393): GC_FOR_ALLOC freed 657K, 15% free 10495K/12224K, paused 22ms, total 22ms
11-18 07:35:04.341: I/dalvikvm-heap(13393): Grow heap (frag case) to 14.807MB for 3686416-byte allocation
11-18 07:35:04.341: I/DEBUG(1925): backtrace:
11-18 07:35:04.341: I/DEBUG(1925):     #00  pc 001a183a  /data/app-lib/com.beardsvibe.stoker-1/libstoker.so (bgfx::EncoderImpl::discard()+89)
11-18 07:35:04.341: I/DEBUG(1925):     #01  pc 001a3df7  /data/app-lib/com.beardsvibe.stoker-1/libstoker.so (bgfx::EncoderImpl::EncoderImpl()+370)
11-18 07:35:04.341: I/DEBUG(1925): stack:
11-18 07:35:04.341: I/DEBUG(1925):          6499cc80  00000001  
11-18 07:35:04.341: I/DEBUG(1925):          6499cc84  40106ae9  /system/lib/libc.so (realloc+12)
11-18 07:35:04.341: I/DEBUG(1925):          6499cc88  00000080  
11-18 07:35:04.341: I/DEBUG(1925):          6499cc8c  90110484  
11-18 07:35:04.341: I/DEBUG(1925):          6499cc90  0000008b  
11-18 07:35:04.341: I/DEBUG(1925):          6499cc94  598cbf98  
11-18 07:35:04.341: I/DEBUG(1925):          6499cc98  598cc150  
11-18 07:35:04.341: I/DEBUG(1925):          6499cc9c  598c4d18  
11-18 07:35:04.341: I/DEBUG(1925):          6499cca0  623c1809  /data/app-lib/com.beardsvibe.stoker-1/libstoker.so (bgfx::AllocatorStub::realloc(void*, unsigned int, unsigned int, char const*, unsigned int))
11-18 07:35:04.341: I/DEBUG(1925):          6499cca4  64147b20  
11-18 07:35:04.341: I/DEBUG(1925):          6499cca8  598c4d40  
11-18 07:35:04.341: I/DEBUG(1925):          6499ccac  598cbf98  
11-18 07:35:04.341: I/DEBUG(1925):          6499ccb0  598cc150  
11-18 07:35:04.341: I/DEBUG(1925):          6499ccb4  598c4d40  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccb8  df0027ad  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccbc  00000000  
11-18 07:35:04.346: I/DEBUG(1925):     #00  6499ccc0  00000000  
11-18 07:35:04.346: I/DEBUG(1925):          ........  ........
11-18 07:35:04.346: I/DEBUG(1925):     #01  6499ccc0  00000000  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccc4  00000000  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccc8  00000008  
11-18 07:35:04.346: I/DEBUG(1925):          6499cccc  00000000  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccd0  623c1809  /data/app-lib/com.beardsvibe.stoker-1/libstoker.so (bgfx::AllocatorStub::realloc(void*, unsigned int, unsigned int, char const*, unsigned int))
11-18 07:35:04.346: I/DEBUG(1925):          6499ccd4  623c1809  /data/app-lib/com.beardsvibe.stoker-1/libstoker.so (bgfx::AllocatorStub::realloc(void*, unsigned int, unsigned int, char const*, unsigned int))
11-18 07:35:04.346: I/DEBUG(1925):          6499ccd8  6499d040  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccdc  623b03d7  /data/app-lib/com.beardsvibe.stoker-1/libstoker.so (bgfx::Context::init(bgfx::Init const&)+1878)
11-18 07:35:04.346: I/DEBUG(1925):          6499cce0  00000000  
11-18 07:35:04.346: I/DEBUG(1925):          6499cce4  00000000  
11-18 07:35:04.346: I/DEBUG(1925):          6499cce8  008d008c  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccec  008f008e  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccf0  00990098  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccf4  009b009a  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccf8  673a1140  
11-18 07:35:04.346: I/DEBUG(1925):          6499ccfc  64147c80  

I'm still trying to figure out what is going on :(

jimon commented 5 years ago

Trying now to disable all struct alignment completely by turning off BX_ALIGN_DECL :/

jimon commented 5 years ago

[image]

Oh, now instead of crashing, it reverts colors on Galaxy S3 ... 💯

attilaz commented 5 years ago

Removing BX_ALIGN_DECL fixes the crash? Well, that is strange. Does codegen assume 64-byte (BX_CACHE_LINE_SIZE) alignment, so the code fails if the memory is not aligned that way?

New crash looks different though. It crashes in realloc with fault addr 00000000. Which compiler/ndk/libc are you using?

Could the reverted colors be an RGBA vs BGRA mismatch?

jimon commented 5 years ago

I have a feeling that telling compiler to align struct to cache line (64) and then allocate memory with alignment to 4 bytes (I even made it 16 bytes) is the problem. But I have no concrete proof as I was not able to reproduce it locally.

I've shipped a version to live with just the BX_CACHE_LINE_SIZE alignment disabled (the others are still enabled), so let's see if the crash reports stop. Right now ~25% of all gameplay sessions are crashing.

I'm building with SDK 27 (minSdkVersion is 16), and the NDK is r18b. They removed GCC altogether, so now it's only clang and c++_static.

As for reverted colors, I think this is just a bug in google play testing, as all other devices so far have valid colors.
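
The hypothesis in this comment (a struct declared with 64-byte alignment but placed in memory that is only 4-, 8-, or 16-byte aligned) can be sketched as follows. This is not bgfx code: `CacheLineAligned`, `alignedAllocSketch`, and `alignedFreeSketch` are hypothetical names, and the over-allocate-and-round-up scheme is just one common way to get memory that matches a type's declared alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Stand-in for a type declared with cache-line alignment.
struct alignas(64) CacheLineAligned
{
    uint64_t m_value[8];
};

inline bool isAligned(const void* ptr, size_t align)
{
    return 0 == (reinterpret_cast<uintptr_t>(ptr) & (align - 1));
}

// Over-allocates, rounds the pointer up to align, and stores the raw
// malloc pointer just before the aligned block so it can be freed later.
// Assumption: align is a power of two and at least sizeof(void*).
inline void* alignedAllocSketch(size_t size, size_t align)
{
    assert(align >= sizeof(void*) && (align & (align - 1)) == 0);
    const size_t total = size + align + sizeof(void*);
    uint8_t* raw = static_cast<uint8_t*>(std::malloc(total));
    if (raw == nullptr)
    {
        return nullptr;
    }
    const uintptr_t start = reinterpret_cast<uintptr_t>(raw) + sizeof(void*);
    const uintptr_t aligned = (start + align - 1) & ~(uintptr_t(align) - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

inline void alignedFreeSketch(void* ptr)
{
    if (ptr != nullptr)
    {
        std::free(reinterpret_cast<void**>(ptr)[-1]);
    }
}
```

An object would then be constructed with placement new, for example `new (alignedAllocSketch(sizeof(CacheLineAligned), alignof(CacheLineAligned))) CacheLineAligned()`. Placing the same type into a block that is only 4- or 16-byte aligned is undefined behavior, because the compiler is allowed to emit instructions that assume the declared 64-byte alignment.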

jimon commented 5 years ago

OK, it's safe to say that the live test confirms that removing BX_ALIGN_DECL_CACHE_LINE fixed the issue, as the crash disappeared from Crashlytics. It would be interesting to figure out exactly why that was the case @bkaradzic

neslib commented 5 years ago

I also get this error, and disabling BX_ALIGN_DECL_CACHE_LINE seems to work for me as well. Maybe it is time to look at this a bit closer and add a fix to BX?