couchbase / couchbase-lite-android-ce

The community edition of couchbase lite for android
Apache License 2.0
9 stars 1 forks source link

Crash at at com.couchbase.lite.AbstractDatabaseConfiguration.setTempDir + 104 #31

Closed benjaminglatzeder closed 4 years ago

benjaminglatzeder commented 4 years ago

Setup

CBL Android CE 2.6

Stacktrace

Caused by java.lang.NullPointerException: Attempt to invoke virtual method 'boolean java.lang.String.equals(java.lang.Object)' on a null object reference
       at com.couchbase.lite.AbstractDatabaseConfiguration.setTempDir + 104(AbstractDatabaseConfiguration.java:104)
       at com.couchbase.lite.AbstractDatabase.<init> + 271(AbstractDatabase.java:271)
       at com.couchbase.lite.Database.<init> + 60(Database.java:60)

Code executed before crash

DatabaseConfiguration configuration = new DatabaseConfiguration();
database = new Database(DATABASE_NAME, configuration); // <-- last line called in app code

How many users are affected

Less than 0.1%. Devices affected are WIKO, Acer, Sony, HUAWEI, LGE, motorola. Android versions affected are from Android 5 to Android 9.

Reproducible?

No.

bmeike commented 4 years ago

You need to call CouchbaseLite.init(Context) before you create a Database instance. This is an API change.

benjaminglatzeder commented 4 years ago

In my app's case this is called in the Application class and thus is called before any database operations take place. Not calling init would be immediately apparent in testing. It would crash as soon as the first database operation is run. So in my app's case in the first second of opening the app. Also as stated this issue only appears for a small group of users who are all on the latest app version.

There are other issues popping up in my issue tracker, e.g.

Caused by java.lang.IllegalArgumentException: database cannot be null at com.couchbase.lite.internal.utils.Preconditions.checkArgNotNull + 25(Preconditions.java:25) ...

and

Fatal Exception: java.lang.IllegalArgumentException: Failed to obtain content from BlobStore. digest=sha1-/8L4ET2P2c8BhyA9MTYTmbr8NIo= (through reference chain: java.util.HashMap["_attachments"]->java.util.HashMap["1561299038112"]->com.couchbase.lite.Blob["content"]) at com.fasterxml.jackson.databind.ObjectMapper._convert + 3750(ObjectMapper.java:3750) ...

But I don't know if it makes sense to report them as they only hit a small part of the userbase and I'm unable to reproduce them.

bmeike commented 4 years ago

The last one (Blob error) is probably due to a sync problem. Blobs are stored a files in the local file system and the file name is stored in the DB. When you reference a blob, CBL gets the file name from the DB, and then attempts to open the file. I believe that that error occurs when the file does not exist.

Do you have a ContentProvider in your app? A content provider can run without starting the Application context. Also, if you are initializing a static reference somewhere, the initialization code may be run before the Application starts. Both of those errors are indications that CouchbaseLite.init has not been called, when the code is executed.

I will check the tags in the CE repo. There was a bug like this, in this area, but I thought it had been fixed quite a while ago. I will verify and update

bmeike commented 4 years ago

After examining the code, the only way that I can see that the exception you are seeing, could happen, with a recent version of CE, is if Context.getExternalFilesDir() returns null. I suppose this is possible on some devices...

benjaminglatzeder commented 4 years ago

About the blob crashes: it appeared that the blobs existed on Couchbase server 6.0 but not locally on the device. Deleting the app and resyncing solved the issue. This may have happened after upgrading from CBL 1.x to 2.x. Or how you described it.

No content provider. Not sure about static reference, so I suppose I don't use it. I believe the next CBL release (CBL 2.7?) will be available soon. Then I'll release an update with it and keep an eye on the crash reports. Thanks!

benjaminglatzeder commented 4 years ago

Following up with having CBL 2.7 in production for the last couple of weeks. The crash in setTempDir is still appearing frequently.

Full stacktrace with all threads can be found in this gist

Affected Android versions are 6, 7, 8, 9. No issues were reported from Android 5 or 10 in the last 30 days. The reports come from a wide range of manufacturers (e.g. Samsung, Huawei, Sony, etc.) I counted about 20 manufacturers. That does not include phone models, e.g. Samsung Galaxy S10, Samsung Galaxy S10e, etc.

I'm clueless if the issue disappears if the user reopens the app after the crash occured of if the app keeps crashing.

bmeike commented 4 years ago

After a quick inspection, I don't see how that happens, either. The failure is here:

    void setTempDir() {
        synchronized (AbstractDatabaseConfiguration.class) {
            final String tempDir = getTempDir();
            if (!tempDir.equals(AbstractDatabaseConfiguration.tempDir)) {
                AbstractDatabaseConfiguration.tempDir = tempDir;
                C4Base.setTempDir(AbstractDatabaseConfiguration.tempDir);
            }
        }
    }

getTempDir() is returning null. I don't see how that can happen.

Looking at the trace, it appears that you are trying to initialize the database very very early in the application lifecycle. Is it possible that you either have not called CouchbaseLite.init() yet, or that you called it with an incompletely initialized Android Context?

benjaminglatzeder commented 4 years ago

Here's the onCreate() method from the application class:

@Override
public void onCreate() {
    super.onCreate();
    singleton = this;
    Fabric.with(this, new Crashlytics());
    CouchbaseUtil.getInstance(this);

/* -------------------- */
/* CouchbaseUtil class */
private static CouchbaseUtil instance;
/* ... */ 

private CouchbaseUtil() {
}

public static CouchbaseUtil getInstance(Context context) {
    /* database().getPath() returns null if database is closed. */
    if (CouchbaseUtil.instance == null || instance.getDatabase() == null || instance.getDatabase().getPath() == null) {
        CouchbaseUtil.instance = new CouchbaseUtil();
        CouchbaseLite.init(context);
        // Database.log.setCustom(new LogTestLogger(LogLevel.VERBOSE));
        openDatabase();
        createHelperInstance();
    }
    return CouchbaseUtil.instance;
}

I do initialize Couchbase very early. Just after the bug tracker Crashlytics. I must assume that once onCreate is entered the context object is initiazlized. I never heard that the context object can either be:

I'm hoping that my init Couchbase code is wrong. I can also push the call to init Couchbase at the end of the onCreate method.

bmeike commented 4 years ago

I don't see anything that should fail, there.

It's a bit odd to call init() just because there is no db path. init() is idempotent, though, so it shouldn't be any kind of problem.

... it also seems kind of odd to go to all the trouble of lazily initializing something, when you explicitly create it in Application.onCreate.

I have a guess: Try making CouchbaseUtil.instance volatile. It seems quite likely that you are using the CouchbaseUtil singleton from several threads. It is not anything like thread safe. Making the static variable volatile will at least guarantee that if someone gets a non-null ref, that that ref has been completely initialized.

I'm not sure that will do it, but it might.

If that doesn't work, you are probably going to have to put most of that initialization code into the CouchbaseUtil object and either make it all final, or synchronize access to it.

benjaminglatzeder commented 4 years ago

I will make CouchbaseUtil.instance volatile and see how it goes.

I also had a look at the getTempDir() method and the methods called by it. Below is the relevant code:

/**
* Returns the temp directory. The default temp directory is APP_CACHE_DIR/Couchbase/tmp.
* If a custom database directory is set, the temp directory will be
* CUSTOM_DATABASE_DIR/Couchbase/tmp.
*/
private String getTempDir() {
    return (!customDir)
            ? CouchbaseLite.getTmpDirectory(TEMP_DIR_NAME)
            : CouchbaseLite.getTmpDirectory(directory, TEMP_DIR_NAME);
    }

/* ---------------------------------------- */
/* --- com.couchbase.lite.CouchbaseLite --- */

static String getTmpDirectory(@NonNull String name) {
    requireInit("Temp directory not initialized");
    return verifyDir(getContext().getExternalFilesDir(name));
}

static void requireInit(String message) {
    if (!INITIALIZED.get()) {
        throw new IllegalStateException(message + ".  Did you forget to call CouchbaseLite.init()?");
    }
}

@Nullable
private static String verifyDir(@Nullable File dir) {
    if (dir == null) { return null; }
    final String path = dir.getAbsolutePath();
    if ((dir.exists() && dir.isDirectory()) || dir.mkdirs()) { return path; }
    throw new IllegalStateException("Cannot create or access directory at " + path);
}

I understand it like so:

Since I see a NullPointerException in my stacktrace above I must assume that the File dir was null and the returned String was null and the app will fall over.

final String tempDir = getTempDir();
if (!tempDir.equals(AbstractDatabaseConfiguration.tempDir))

The Android SDK method may also return null:

/* --- android.content.Context --- */
@Nullable
public abstract File getExternalFilesDir(@Nullable String type);

It might be possible that the app is intalled on the device's internal storage but the database is stored on the SD-card and the SD-card is not in the device at that time. As far as I know I don't specify a directory and then Couchbase Lite should use the default path to create the database. And this path should be in the internal storage, correct? I checked the issue reports again and no one rooted their device so I must believe no one was able to move the database to external storage. Also to be of note is that 90% of crashes occured when the app was in the background. I did add a Worker (Android) to create an automatic backup. The worker uses the Couchbase database. Following the code I don't see that CouchbaseUtil.getInstance() is called once. I must assume that the Application class and its onCreate() method are run and Couchbase is initialized. If not I'd expect a different exception. In any case I'll test this shortly!

benjaminglatzeder commented 4 years ago

I'll test this shortly!

Yes, if WorkManager/Worker (Android) is run (and the app was killed before, e.g. it is not in previously used apps) the Application class is initialized, too.

bmeike commented 4 years ago

I know that code pretty well. I wrote it.

The Android SDK method may also return null: @Nullable public abstract File getExternalFilesDir(@Nullable String type);

You are correct. It could return null. If it does, there is no external files dir and your app will fail no matter what you do. At this time, CouchbaseLite requires temporary storage. That is why there is no null check there. This subject is under discussion.

It might be possible that the app is installed on the device's internal storage but the database is stored on the SD-card and the SD-card is not in the device at that time. As far as I know I don't specify a directory and then Couchbase Lite should use the default path to create the database. And this path should be in the internal storage, correct?

Yes.

This has nothing to do with the db itself. CouchbaseLite Core requires temporary storage for various operations. That storage is initialized on DB creation. As soon as we have the chance to change the API (v 3.x) I intend to change that so that it is initialized in CouchbaseLite.init

I checked the issue reports again and no one rooted their device so I must believe no one was able to move the database to external storage.

Again, nothing to do with the DB. The problem is the initialization of the temp directory that happens when the Database constructor tries to initialize all of the directories that Core will need, to manage a database.

Also to be of note is that 90% of crashes occured when the app was in the background. I did add a Worker (Android) to create an automatic backup. The worker uses the Couchbase database. Following the code I don't see that CouchbaseUtil.getInstance() is called once. I must assume that the Application class and its onCreate() method are run and Couchbase is initialized. If not I'd expect a different exception. In any case I'll test this shortly!

By "background" you mean when the no app screen is visible? You are doing a whole bunch of "assuming" there! The Worker does have access to a Context, but it may not be the context that you stored into singleton.

I think I see your problem...

benjaminglatzeder commented 4 years ago

By "background" you mean when the no app screen is visible?

Yes!

I added a bunch of logs and checked in what order the application class and worker class are called. The Application class is initialized before the Worker class code is run. Since I initialize Couchbase in onCreate method of the Application class there should not be a problem.

The Worker does have access to a Context, but it may not be the context that you stored into singleton

I don't quite understand. Here's the worker class:

public class WorkerWeeklyBackupUploader extends Worker {

    private final Context context;

    public WorkerWeeklyBackupUploader(@NonNull Context context, @NonNull WorkerParameters params) {
        super(context, params);
        this.context = context;
    }

The context should not be null as it's @NonNull in the constructor.

Let's not forget that the stacktraces only show the Application class. Sorry about the tangent about the Worker. Another clue why it might not be the Worker is that the issues appeared at least one month before I added this Worker.

If this is a dead-end I'm happy to wait for version 3.

bmeike commented 4 years ago

The method getInstance lazily initializes the static variable instance with an instance of CouchbaseUtil that refers to a Context. A new instance of Application will see instance already initialized and will not re-initialize it. instance now points to an instance of CouchbaseUtil that refers to a dead Context.

This could easily happen when a Worker is run. I am less certain about what happens when an app comes back from "background".

benjaminglatzeder commented 4 years ago

This could be true. I thought that once the Application class was destroyed any other objects are gone, too. Since the onCreate method of the Application is only run once I can put the init code there and initialize the CouchbaseUtil class without this line of code.

@Override
public void onCreate() {
    super.onCreate();
    singleton = this;
    Fabric.with(this, new Crashlytics());
    /* this line was added and removed from CouchbaseUtil */
    CouchbaseLite.init(this);
    CouchbaseUtil.getInstance(this);

Further I could check if the instance is not null and destroy it beforehand like so

    CouchbaseUtil.destroyIfNotNull();
    /* this line was added and removed from CouchbaseUtil */
    CouchbaseLite.init(this);
    CouchbaseUtil.getInstance(this);
bmeike commented 4 years ago

instance Application != instance of Application.class statics have the life-cycle of the latter.

bmeike commented 4 years ago

Whoops. I closed that by mistake. I'm going to leave it closed, but I'm perfectly happy to re-open it, if there is a reason to do so.

I do suspect that we now have the root of your issue...

benjaminglatzeder commented 4 years ago

Finally a user got in touch via email and it looks like they are unable to reopen the app. A device reboot did not help. The issue persists. As much as I tried to write the correct code and initialize CBL the right way the issue has been popping up daily in my crash reporter for months now.

Are there any news on this issue?

bmeike commented 4 years ago

Hey @benjaminglatzeder ...
Can you get me a new stack trace from the user's failure? I will try to suggest some ways to approach debugging this. You also might want to give the latest 2.8 code a try. There's been a lot of reworking of that code. CE 2.8 is here: https://github.com/couchbase/couchbase-lite-java-ce-root

benjaminglatzeder commented 4 years ago

Setup: implementation 'com.couchbase.lite:couchbase-lite-android:2.7.1' Here's a new stacktrace:

Caused by java.lang.NullPointerException: Attempt to invoke virtual method 'boolean java.lang.String.equals(java.lang.Object)' on a null object reference
       at com.couchbase.lite.AbstractDatabaseConfiguration.setTempDir(AbstractDatabaseConfiguration.java:104)
       at com.couchbase.lite.AbstractDatabase.<init>(AbstractDatabase.java:279)
       at com.couchbase.lite.Database.<init>(Database.java:73)

I was able to clone the repo but failed at building it. Maybe it's because I'm on Windows. Last time I got hold of a Mac developer and they were able to build it. The build output is here: https://gist.github.com/benjaminglatzeder/a5d6059b88d357e36848e082c092daa3 I'm happy to open another issue for building the CBL lib if you'd like.

bmeike commented 4 years ago

@benjaminglatzeder Yeah... that's probably a good idea. Open new issue for the Windows build. We don't get the opportunity to test our builds on Windows, much. Sorry about that.

For the base issue, please try this: Create a DatabaseConfiguration for the Database. Call getContext().getExternalFilesDir() and verify that the result is not null. Call DatabaseConfiguration.setDirectory with the non-null result of the above.

I will be interested to hear how this works. I will be most interested in hearing if the call to getExternalFilesDir returns null.

benjaminglatzeder commented 4 years ago

User sent an email stating that the external SD-card was broken. The device is running Android 6 with a minimum of 8GB internal storage. So an SD-card makes sense in 2020. In the Android manifest install location is not set and thus it should not be possible to move the app to external storage. I'm afraid I won't be able to provide further logs from the user. They got their data back via the sync feature. As I said this issue pops up daily in my issue tracker. I wouldn't think so many people use SD-cards as their apps' storage. I might try-catch the exception and display a notification and ask users to get in touch. Or maybe CBL 2.8 will make the issue disappear.

bmeike commented 4 years ago

I don't think that 2.8 will make this particular problem disappear. At this time, CBL-Android depends on a scratch directory. It uses Context.getExternalFilesDir() to create one. It's going to do that as long as it needs temp files. Perhaps I can improve the error message...

In order to work around this, you might call this method yourself. If it returns null, you could try to use some other directory and pass its path to the constructor of DatabaseConfiguration or, at least, give some reasonable failure message to the end user. Be aware though, FWIW, that if you specify a directory to the config, it will be used for both the db files as well as as the root of the temp tree.

bmeike commented 4 years ago

I have added an error message, to the 2.8 release, that will clearly mark a failure caused by the absence of a scratch directory.

I'll leave this ticket open for a little in case there is more that I can do. The tl;dr, though, is that: 1) there must be a temp directory 2) CBL code creates the temp directory using Context.getExternalFilesDir(). If that fails, so will CBL 3) You can replace the default temp directory code by passing a directory path to DatabaseConfiguration. That directory can be the result of calling Context.getExternalFilesDir(). The path passed to the config will be the root for all CBL files: both the scratch directory and the database directory

bmeike commented 4 years ago

BTW... the path length thing is, again, for-real and not something we can do much about. The source tree has been reorged for 2.8 and may provide some relief. @borrrden , our Windows expert, suggests this: https://www.howtogeek.com/266621/how-to-make-windows-10-accept-file-paths-over-260-characters/

benjaminglatzeder commented 4 years ago

Thank you for all the info and guidance. I'll try this code

if (getExternalFilesDir("CouchbaseLiteTemp") == null) {
    // TODO: display error message
}

in the next release. I also set the android:installLocation="internalOnly" in AndroidManifest.xml. It'll be some time until the update is fully released and I have been able to gather enough data. I'll keep you posted!

bmeike commented 4 years ago

Ok @benjaminglatzeder . I'm going to close this issue, then. Please open new ones, for new problems.