firebase / quickstart-unity

Firebase Quickstart Samples for Unity
https://firebase.google.com/games
Apache License 2.0
828 stars 428 forks source link

Intermittent crashes when saving data to Firestore #1303

Closed khambadkone closed 1 year ago

khambadkone commented 1 year ago

[REQUIRED] Please fill in the following fields:

[REQUIRED] Please describe the question here:

Using a combination of Firebase Authenticate and Firestore, we save player information as Documents to a collection on Firestore. This data is a set of key-value data pairs, indicating player progress and the resulting document hierarchy is flat.

The game uses Firebase 8.8.0 with Unity 2020.3.32f1. The Unity code that saves this data is :-

public override void Save<T>(T saveData, OnSaveProgress onSaveComplete)
        {
            Logger.Log($"FireStoreCloudSave saving... {id}");
            GetDocumentReference().SetAsync(saveData).ContinueWith(t =>
            {
                if (t.Exception != null)
                {
                    Logger.Log(t.Exception.ToString());
                }
                else
                {
                    Logger.Log($"save complete {id}");
                }
            });
            onSaveComplete?.Invoke(Type.FIRESTORE);
        }

saveData is a class

[System.Serializable]
public class GameSaveData : SerializableDictionary { }

The code works for 90% of users, but we see the following two crashes in Crashyltics to the remaining 8-10% users :-

Crashlog 1:

(multiple libbase.so Missing UUID xxxxxxxx)
8 libil2cpp.so FirestoreCpp.cs ConvertMapToFieldValue
9 libil2cpp.so ConverterBase.cs ConvertToProxyMap
10 libil2cpp.so DocumentReference.cs SetAsync
11 libil2cpp.so GenericMethods5.cpp - Line 40493 FireStoreCloudSave_Save_TisRuntimeObject_m3EFAD2C4096136226A4FF17AE9A907926CB9744E_gshared + 40493
12 libil2cpp.so GenericMethods6.cpp - Line 89 SaveGameManager_Save_TisRuntimeObject_mD14C102D68CE7A760FC8F9AAE2A83E1C545137A8_gshared + 89

Crashlog 2 :

(multiple libbase.so Missing UUID xxxxxxxx)
8 libil2cpp.so FirestoreCpp.cs DocumentReferenceSetAsync
9 libil2cpp.so GenericMethods5.cpp - Line 40493 FireStoreCloudSave_Save_TisRuntimeObject_m3EFAD2C4096136226A4FF17AE9A907926CB9744E_gshared + 40493
10 libil2cpp.so GenericMethods6.cpp - Line 89 SaveGameManager_Save_TisRuntimeObject_mD14C102D68CE7A760FC8F9AAE2A83E1C545137A8_gshared + 89

On checking our logs and Firestore, we see data available for these users.

Am looking to understand what causes these errors in Crashlytics and the behaviour of Firestore when they occur ?

Crashlog 1 : Is this due to an issue converting a non-compliant data type in Firestore ? FieldValue documentation mentions unsupported types can cause a crash. If so, how can we identify the problem elements in Unity ? Why do we see data in FireStore despite these crashes ?

Crashlog 2 : What can cause a crash on SetAsync ? Is this due to the 1 second limit on saving documents ?

google-oss-bot commented 1 year ago

This issue does not seem to follow the issue template. Make sure you provide all the required information.

khambadkone commented 1 year ago

This issue does not seem to follow the issue template. Make sure you provide all the required information.

Updated.

paulinon commented 1 year ago

Hi @khambadkone,

Thanks for reporting this issue. I can bring this up to the team so that we can identify the causes of these crashes. For now, could you share the details (such as model and OS version) of the devices encountering the crashes?

Additionally, could you confirm if version 9.6.0 reduces that crashes you're facing?

dconeybe commented 1 year ago

This looks like a known issue that just came to our attention last week. If it's the issue I think it is, then it is specific to Android. Googlers can see b/251869890 for more details.

Basically, Firestore's C++ JNI layer creates too many global references and eventually exhausts the 51200 global ref limit, which causes the app to crash. It is a long-standing bug but only now is starting to get noticed. I'm working on a fix, but the fix is non-trivial and ETA is probably mid-to-late November.

Could you try running this code and see if it looks like the same crash you're getting in Crashlytics:

var numbers = new List<object>();
for (int i=0; i<60000; i++) {
  numbers.Add(i);
}
doc.SetAsync(new Dictionary<string, object>{{"numbers", numbers}});

You should see an error like this in the logcat:

JNI ERROR (app bug): global reference table overflow (max=51200)

which is the telltale sign that you've experienced this issue.

This crash is triggered by having documents with a large number of values (e.g. arrays of thousands of elements) or a large number of documents each with a large number of values (e.g. 60 documents each with an array of 100 elements). Given this information, does it sound like your use case could trigger this crash?

khambadkone commented 1 year ago

Thank you for the reply. Appreciate it.

Each Document is flat, but can have a large number of values. They are all key-value pairs only, however, which is a dump of some legacy JSON. e.g. lots of :

isXXX54Part1Verb:1
rvABCCount:0

Will try mimicking a large dummy document to confirm it breaks at 60000 dummy values.

dconeybe commented 1 year ago

Will try mimicking a large dummy document to confirm it breaks at 60000 dummy values.

Yes. Each key/value pair in a document uses 1 global ref. For map and array values, there is 1 global ref per entry. So calling DocumentReference.Set() with 51200 key/value pairs or with a single key/value pair whose value is a List of 51200 elements would trigger this crash.

To be clear, this is NOT a good thing; the number of global refs should not be growing linearly with the size of the documents. That is the bug I am going to fix.

khambadkone commented 1 year ago

Hi @dconeybe We did some tests ourselves, and find that the crash does not seem to depend only on the number of elements in the Document, but also on the periodicity of saves.

We have the following snippet we use to keep adding more keys to our Document, and find that it triggers a crash even at 12000 unique keys, provided we add 1000 keys at a time. AddKeys is a method that accepts a number of keys to add.

This differs from the behaviour you mentioned, so curious if this was a known issue fixed in a more recent version of Firestore, or if this is a new bug ?

public void AddKeys(InputField keysText)
    {
        int keys = Int32.Parse(keysText.text);
        const string s = "abcdhf_a";
        Debug.Log($"Adding {keys} keys");
        for (var i = 0; i < keys; i++)
        {
            int r = UnityEngine.Random.Range(0, 3);
            switch (r)
            {
                case 0:
                    GameManager.progressionData.SetInt(System.Guid.NewGuid().ToString(), 1);
                    break;
                case 1:
                    GameManager.progressionData.SetString(System.Guid.NewGuid().ToString(), s);
                    break;
                case 2:
                    GameManager.progressionData.SetFloat(System.Guid.NewGuid().ToString(), 5.5f);
                    break;
            }
        }
        keyCount.text = GameManager.progressionData.gameSaveData.Count.ToString();
        string data = JsonUtility.ToJson(GameManager.progressionData.gameSaveData);
        float size = System.Text.ASCIIEncoding.ASCII.GetByteCount(data)/ 1024f / 1024f;
        stringSize.text = size.ToString("n2") + "MB";
        SaveGameManager.Instance.Save(GameManager.progressionData.gameSaveData, true);
    }
dconeybe commented 1 year ago

@khambadkone Thank you for this extra info. Although 12,000 is significantly less than 51,200, maybe there ends up being some copies in memory that multiply the 12,000 to 24,000, and then you have two of them and, bam, you're at 48,000, very close to the 51,200 limit. So this could still be the global refs exhausted issue.

If you're able to reproduce the crash, could you capture and post a logcat? If you're concerned about posting the logcat publicly, you could email to to me at my GitHub username at google.com.

Would you be able to write a minimal app that reproduces the crash? If so, could you publish it to GitHub so I can reproduce the crash for myself?

There are no other known related issues that have been fixed recently. This still sounds like the global refs exhausted issue, but it's impossible to tell without a logcat.

Finally, you've reported this bug against SDK version 8.8.0, which is 9 months old. Back then we even used a different build system using Google internal build tools (now we use GitHub Actions runners to build). Could you try upgrading to the latest version (9.6.0 at the time of writing)? If we release a fix, you will need to upgrade anyways.

Hope this helps!

khambadkone commented 1 year ago
  1. We get the same crash on 9.6. So it's the global refs issue.
  2. My colleague will send a crashlog to you on your email address.
  3. The sample project would be one in Unity. Is that fine with you?

Regarding the problem per-se, can you suggest a workaround ? We keep setting the Document regularly, depending on the player's progress. So, if there is a large document, this will happen for players who engage.

The obvious solution is to prune the Document, but would like to know if you can suggest alternatives?

On Thu, Oct 13, 2022 at 8:20 PM Denver Coneybeare @.***> wrote:

@khambadkone https://github.com/khambadkone Thank you for this extra info. Although 12,000 is significantly less than 51,200, maybe there ends up being some copies in memory that multiply the 12,000 to 24,000, and then you have two of them and, bam, you're at 48,000, very close to the 51,200 limit. So this could still be the global refs exhausted issue.

If you're able to reproduce the crash, could you capture and post a logcat? If you're concerned about posting the logcat publicly, you could email to to me at my GitHub username at google.com.

Would you be able to write a minimal app that reproduces the crash? If so, could you publish it to GitHub so I can reproduce the crash for myself?

There are no other known related issues that have been fixed recently. This still sounds like the global refs exhausted issue, but it's impossible to tell without a logcat.

Finally, you've reported this bug against SDK version 8.8.0, which is 9 months old. Back then we even used a different build system using Google internal build tools (now we use GitHub Actions runners to build). Could you try upgrading to the latest version (9.6.0 at the time of writing)? If we release a fix, you will need to upgrade anyways.

Hope this helps!

— Reply to this email directly, view it on GitHub https://github.com/firebase/quickstart-unity/issues/1303#issuecomment-1277739646, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAP4WUT4T4AB5JU2ZSW6B33WDAOT5ANCNFSM6AAAAAARAZFRIU . You are receiving this because you were mentioned.Message ID: @.***>

dconeybe commented 1 year ago

Thank you for the logcat. That proves 100% that this is the global refs issue. There is no need to provide us any additional information. I've pasted two relevant excerpts from the logcat below:

JNI ERROR (app bug): global reference table overflow (max=51200)
global reference table dump:
Last 10 entries (of 51200):
  51199: 0x14483148 java.lang.Double
  51198: 0x143a93e0 java.lang.Double
  51197: 0x143b3b58 java.lang.Long
  51196: 0x143b3b68 java.lang.Long
  51195: 0x143b3b78 java.lang.String "abcdhf_a"
  51194: 0x143aa8d0 java.lang.String "abcdhf_a"
  51193: 0x143aa8e8 java.lang.Double
  51192: 0x143aa8f8 java.lang.Double
  51191: 0x143aa908 java.lang.String "abcdhf_a"
  51190: 0x143aa920 java.lang.String "abcdhf_a"
Summary:
  17080 of java.lang.Long (9019 unique instances)
  16733 of java.lang.Double (8849 unique instances)
  16272 of java.lang.String (8645 unique instances)
    749 of java.lang.Class (577 unique instances)

Here is the (cleaned up) stack trace:

FieldValue::FieldValue(FieldValue const&)
DocumentReferenceInternal::Set(const MapFieldValue&, const SetOptions&)
csharp::DocumentReferenceSet(DocumentReference&, FieldValue const&, const SetOptions&)
Firebase_Firestore_CSharp_DocumentReferenceSet()

We are working on a fix for this. The only workaround is to reduce the number of values in your documents, either by removing key/value pairs or removing elements from arrays and maps in the document(s). Each key/value pair consumes a global ref, each element of an array consumes a global ref, and each key/value pair in a nested map consumes a global ref.

khambadkone commented 1 year ago

@dconeybe Thank you for the help debugging this.

  1. Would using docRef.UpdateAsync with specific keys prevent this issue ?
  2. Basis the previous comment , how long are these references in memory? Are they maintained for the entire session, from the time Firebase is inited?
  3. When this crash happens, what happens to the existing Document on Firestore ?
dconeybe commented 1 year ago

@khambadkone

Calling DocumentReference.UpdateAsync() with only the keys you want to update would definitely be a workaround. That is especially true if the number of fields you would be updating is in the 10's or 100's.

The global references are in memory only for a short time (i.e. during the duration of the call to DocumentReference.SetAsync()).

When the crash happens the document will not get created/updated in Firestore. The crash happens before it sends the RPC to the server to create/update the document.

dconeybe commented 1 year ago

Update: The next Unity SDK release (scheduled for mid-November 2022) will pick up https://github.com/firebase/firebase-cpp-sdk/pull/1111, which should buy you some breathing room. It's not a full fix, but avoids doubling the number of global refs used by DocumentReference.SetAsync(). It could be good enough to fix your issue while we work on a full, proper fix.

dconeybe commented 1 year ago

@khambadkone I created a custom build of the Unity SDK that contains https://github.com/firebase/firebase-cpp-sdk/pull/1111. Could you try it out and see if it fixes your issue? Just download firebase_unity_sdk.zip from https://github.com/firebase/firebase-unity-sdk/actions/runs/3267710887. Note that this link will expire very soon, so grab it ASAP.

google-oss-bot commented 1 year ago

Hey @khambadkone. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

google-oss-bot commented 1 year ago

Since there haven't been any recent updates here, I am going to close this issue.

@khambadkone if you're still experiencing this problem and want to continue the discussion just leave a comment here and we are happy to re-open this.

dconeybe commented 1 year ago

FYI the Firebase Unity SDK v10.1.0 was released on November 7, 2022 and includes https://github.com/firebase/firebase-cpp-sdk/pull/1111, which reduces the number of global references consumed by Firestore by about 50%. This will hopefully address many cases of the "global reference table overflow" crash. A full and proper fix is in development, but it is complex and is taking time to implement.