GoogleCloudPlatform / recaptcha-enterprise-mobile-sdk

Random crashes on iOS after calling `execute` method #38

Closed: ihnatmoisieiev closed this issue 1 year ago

ihnatmoisieiev commented 1 year ago

Describe the bug

Random crashes occur X seconds after calling `recaptchaClient.execute(RecaptchaAction(action: action))`.

Integration Method

Select the method used to integrate with reCAPTCHA Mobile.

SDK Version (e.g. 18.0.1): 18.0.1, 18.0.3, 18.1.0

To Reproduce

Steps to reproduce the behavior:

  1. Call recaptchaClient.execute(RecaptchaAction(action: action))
  2. Interact with UITextField in some way: open keyboard, write some text, close keyboard, open keyboard and write again, etc.
  3. Optional suspend and return to the app after some short time
  4. Optional call recaptchaClient.execute(RecaptchaAction(action: action)) again
  5. Optional interact with UITextField again
  6. Optional Move to another screen in the app
  7. Wait X seconds
  8. Observe the crash

Steps 3-6 can be performed in any order.

Expected behavior

Eliminate all crashes produced by the framework

Screenshots

malloc: Heap corruption detected, free list is damaged at 0x282a948f0
*** Incorrect guard value: 134186500114432
 EXC_BAD_ACCESS: hash >
Attempted to dereference garbage pointer 0x1de9bf0.
 EXC_BAD_ACCESS
isEqual: >
Attempted to dereference garbage pointer 0x400000020.

Xcode version for iOS (please complete the following information):

Device (please complete the following information):

Additional context

The crash reproduces at a random point, X seconds after calling the execute() method. We assume there is a bug somewhere in the Objective-C code of the framework, e.g. in pointer handling. It could be connected to: https://github.com/GoogleCloudPlatform/recaptcha-enterprise-mobile-sdk/issues/22

@mcorner please take a look and advise us on how to eliminate these crashes, or tell us when we can expect a new version of the framework that fixes them.

mcorner commented 1 year ago

The SDK does not interact with the UITextField or the keyboard. Those things may create enough memory churn to somehow help trigger the issue, but as of now we have not been able to reproduce it in Instruments.

It sounds like you have replicated it locally. Can I ask you: a) how often does it occur? Or how quickly can you make it happen? and b) if you have a sample project that seems to have the right combination of things, would you be willing to share it? Then we can put Instruments on it again and see if we can repro it here.

mcorner commented 1 year ago

Also when you say "Quit the app" do you mean force quit? Or suspend the application in the background?

ihnatmoisieiev commented 1 year ago

> The SDK does not interact with the UITextField or the keyboard.

We can't be sure of that; it is only what we observe when the crashes appear. UITextField and the keyboard may have no influence at all, but we are sure that the crash appears at a random point, X seconds after calling the execute method.

> It sounds like you have replicated it locally.

We reproduced it on TestFlight (TF) builds, but I assume we would see something similar to what @SurglogsGithubUser reported here after changing the build configuration to release.

> a) how often does it occur?

In every app session after the execute method has been called

> Or how quickly can you make it happen?

Anywhere from ~15 seconds to 5 minutes after the method above is called.

> b) if you have a sample project that seems to have the right combination of things, would you be willing to share it?

Unfortunately, we don't have a sample project. I can share a screen recording of the crash with you privately if it may help.

> Also when you say "Quit the app" do you mean force quit?

Nope, I mean just suspend and return to the app after some short time, ~ max 3 mins.

mcorner commented 1 year ago

So far we have had no luck reproducing this on a real device, even with a release build. We haven't seen the crash, and AddressSanitizer isn't yielding any clues.

Have you run AddressSanitizer?

Obviously, without a clear repro, we will have trouble finding it here.

ihnatmoisieiev commented 1 year ago

@mcorner

No, we haven't run AddressSanitizer yet.

Yesterday we discovered something that might be useful for you: we were not able to reproduce this behavior on version 18.0.2 of the framework installed via SPM. Unfortunately, the issue still occurs on 18.0.1, 18.0.3, and 18.1.0. Additional info: we were not able to install 18.0.2 via the direct-download xcframework due to compilation errors (screenshot attached below):

[Screenshot 2023-03-07 at 08 31 37]

mcorner commented 1 year ago

That means you are missing protobuf. Starting with v18.0.3 we embed protobuf in the SDK. In v18.0.2 it was a separate direct download dependency that needed to be included. SPM and pods did it for you.
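
The version boundary described above can be sketched as a small check. This is a hypothetical helper, not part of the SDK: `versionComponents` and `embedsProtobuf` are names invented here, and the only fact encoded is the one stated above, that protobuf is embedded starting with v18.0.3. It matters only for the direct-download xcframework, since SPM and pods pull in protobuf on their own.

```swift
/// Parses "18.0.3" into [18, 0, 3]; returns nil if any component is not a number.
/// (Hypothetical helper for illustration only.)
func versionComponents(_ version: String) -> [Int]? {
    let parts = version.split(separator: ".", omittingEmptySubsequences: false)
    let numbers = parts.compactMap { Int($0) }
    return numbers.count == parts.count ? numbers : nil
}

/// Returns true when the given SDK version embeds protobuf, assuming the
/// boundary stated in this thread (embedded starting with v18.0.3).
/// Returns nil when the version string cannot be parsed.
func embedsProtobuf(_ version: String) -> Bool? {
    guard let components = versionComponents(version) else { return nil }
    let firstEmbeddedVersion = [18, 0, 3]
    // A version embeds protobuf unless it sorts strictly before 18.0.3.
    return !components.lexicographicallyPrecedes(firstEmbeddedVersion)
}
```

With this check, a direct-download integration of 18.0.2 would be flagged as needing protobuf added separately, while 18.0.3 and 18.1.0 would not.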

This might give us a clue that it is something in a particular part of our code. We will keep digging.

ihnatmoisieiev commented 1 year ago

@mcorner we followed the integration guide from here. Nothing was mentioned about "protobuf". We installed via SPM and via the direct-download xcframework. v18.0.1 also doesn't work for us.

Could you write more about "protobuf"? What is it and what should we do? Thank you!

mcorner commented 1 year ago

This compilation problem is a separate issue. Please open a new issue for it so we can keep this one focused on the original crash.

jacobocl commented 1 year ago

@mcorner sure, I will be glad to try out that private build! I will send you an email right now.

mcorner commented 1 year ago

Many thanks to @jacobocl, who has tested a beta build of the fix. No more crashes, so I am cautiously optimistic. That said, heap corruption problems can be slippery and move around based on minor changes. We will include this in the next release (should be within a week or two).
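
To illustrate why a heap corruption crash can surface seconds or minutes after the faulty write, and in unrelated code, here is a minimal Swift sketch. It is a toy model under stated assumptions, not the SDK's code and not the system allocator: `ToyHeap`, `HeapError`, and `demo` are hypothetical names introduced for illustration. The free list is threaded through the free slots themselves, so a write through a stale handle silently damages the list, and the damage is only noticed at a later allocation.

```swift
enum HeapError: Error, Equatable {
    case outOfMemory
    case freeListDamaged(slot: Int)
}

/// Toy fixed-size heap. Each free slot stores the index of the next free
/// slot (-1 marks the end of the free list).
struct ToyHeap {
    private var storage: [Int]
    private var freeHead: Int

    init(slots: Int) {
        precondition(slots > 0)
        // Initially every slot is free and links to the one after it.
        storage = (0..<slots).map { $0 + 1 < slots ? $0 + 1 : -1 }
        freeHead = 0
    }

    /// Pops the head of the free list. Corruption is detected only here,
    /// when the stored link turns out not to be a valid slot index.
    mutating func alloc() throws -> Int {
        guard freeHead != -1 else { throw HeapError.outOfMemory }
        let slot = freeHead
        let next = storage[slot]
        guard next == -1 || storage.indices.contains(next) else {
            throw HeapError.freeListDamaged(slot: slot)
        }
        freeHead = next
        storage[slot] = 0
        return slot
    }

    /// Pushes the slot back onto the free list, reusing it to store the link.
    mutating func free(_ slot: Int) {
        storage[slot] = freeHead
        freeHead = slot
    }

    /// Unchecked write, like writing through a raw pointer: it does not
    /// verify that the slot is still allocated.
    mutating func write(_ value: Int, to slot: Int) {
        storage[slot] = value
    }
}

/// Runs the use-after-free scenario and returns a log of what happened.
func demo() -> [String] {
    var heap = ToyHeap(slots: 4)
    var log: [String] = []
    do {
        let a = try heap.alloc()
        heap.free(a)
        heap.write(9_999, to: a)   // stale write: overwrites the free-list link
        log.append("stale write completed silently")
        _ = try heap.alloc()       // the damage is noticed only now
        log.append("allocation succeeded")
    } catch {
        log.append("later allocation failed: \(error)")
    }
    return log
}
```

In this model the stale write itself never fails; the error is reported by whichever allocation next touches the damaged link. That is one reason the point of failure can shift when unrelated code changes the allocation pattern.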

ihnatmoisieiev commented 1 year ago

Great news, @mcorner! Can't wait for the next release. Thanks for the updates!

mcorner commented 1 year ago

OK... finally! iOS SDK 18.1.2 just went out, which hopefully fixes this issue. Official release notes will be up soon, but in the meantime:

There will be an Android release real soon.

Keep in mind that the SDK is only part of the equation. There are a large number of enhancements that are rolling out on the server side over the next month. The vast majority of issues we see are related to poor networking conditions (much more typical on Android phones that are used more globally). We have a long series of improvements to help address those situations and are committed to having reCAPTCHA work for every customer globally.