cocos / cocos-engine

Cocos simplifies game creation and distribution with Cocos Creator, a free, open-source, cross-platform game engine. Empowering millions of developers to create high-performance, engaging 2D/3D games and instant web entertainment.
https://www.cocos.com/en/creator

AGDK/GameActivity ANR Bug #17841

Open haroel opened 6 days ago

haroel commented 6 days ago

Cocos Creator version

3.6.x+

System information

Android

Issue description

Since version 3.6.x, Cocos Creator has used Google's AGDK/GameActivity, which is claimed to significantly reduce the likelihood of application ANRs. Our company has built numerous products on the 3.7.x engine version, but even after release on Google Play their ANR rate remains high. I found that many ANRs originate from GameActivity.onPauseNative (clearly related to the game switching between foreground and background). Further analysis showed that the issue is related to the pthread_cond_timedwait function, which is used in android_native_app_glue.c.

static void android_app_set_activity_state(struct android_app* android_app,
                                           int8_t cmd) {
    pthread_mutex_lock(&android_app->mutex);
    android_app_write_cmd(android_app, cmd);
    while (android_app->activityState != cmd) {
        pthread_cond_wait(&android_app->cond, &android_app->mutex);
    }
    pthread_mutex_unlock(&android_app->mutex);
}
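To make the blocking behaviour concrete, here is a minimal, self-contained sketch of the same handshake outside of Android. Everything in it (`demo_app`, `demo_worker`, `demo_blocked_ms`, `DEMO_CMD_PAUSE`) is a hypothetical stand-in rather than AGDK code, and the "busy" worker is simulated with a sleep. It only illustrates that the caller cannot return until the other thread acknowledges the command.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and nanosleep */
#endif
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define DEMO_CMD_PAUSE 1  /* placeholder value, not the real APP_CMD_PAUSE */

/* Hypothetical stand-in for struct android_app: handshake fields only. */
struct demo_app {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int requested;      /* command posted by the caller */
    int activityState;  /* last command the worker acknowledged */
    int busy_ms;        /* how long the worker stays busy first */
};

/* Worker: stays busy for busy_ms, then acknowledges the request. */
static void* demo_worker(void* arg) {
    struct demo_app* app = (struct demo_app*)arg;
    struct timespec busy = { app->busy_ms / 1000,
                             (app->busy_ms % 1000) * 1000000L };
    nanosleep(&busy, NULL);
    pthread_mutex_lock(&app->mutex);
    app->activityState = app->requested;
    pthread_cond_broadcast(&app->cond);
    pthread_mutex_unlock(&app->mutex);
    return NULL;
}

/* Caller: posts a command, waits for the acknowledgement,
   and returns how many milliseconds it spent blocked. */
static long demo_blocked_ms(int busy_ms) {
    struct demo_app app;
    struct timespec t0, t1;
    pthread_t tid;
    int rc;

    pthread_mutex_init(&app.mutex, NULL);
    pthread_cond_init(&app.cond, NULL);
    app.requested = 0;
    app.activityState = 0;
    app.busy_ms = busy_ms;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&app.mutex);
    app.requested = DEMO_CMD_PAUSE;
    rc = pthread_create(&tid, NULL, demo_worker, &app);
    assert(rc == 0);
    (void)rc;
    /* Same shape as the loop above: no timeout, so the caller
       stays here for as long as the worker is busy. */
    while (app.activityState != DEMO_CMD_PAUSE) {
        pthread_cond_wait(&app.cond, &app.mutex);
    }
    pthread_mutex_unlock(&app.mutex);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&app.cond);
    pthread_mutex_destroy(&app.mutex);
    return (long)(((long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
                   (t1.tv_nsec - t0.tv_nsec)) / 1000000LL);
}
```

Calling `demo_blocked_ms(50)` blocks the calling thread for at least the 50 ms the worker is busy. On Android the caller is the UI thread, so a native thread that stays busy for too long turns this wait into an ANR.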


I have also noticed that many game developers are encountering the same issue.

Relevant error log output

None

Steps to reproduce

To address this ANR, I spent some time understanding the internal workings and driving mechanisms of AGDK and eventually found a method to fix this ANR.

Here is my modified code.

android_native_app_glue.h

/** 【ADD】 Fix onPauseNative ANR (haroel/oeaheh@gmail.com) */
#define HH_STACK_SIZE 32
typedef struct HH_Stack {
    int top;
    int data[HH_STACK_SIZE];
    pthread_mutex_t lock;  // thread lock state
} HH_Stack;

struct android_app {
    ...

    /**【ADD】  Fix onPauseNative ANR (haroel/oeaheh@gmail.com) */
    struct HH_Stack *stateStack;
};

android_native_app_glue.c

/**【ADD】  Fix onPauseNative ANR (haroel/oeaheh@gmail.com) */
void HH_initializeStack(HH_Stack *stack) {
    stack->top = -1;
    pthread_mutex_init(&stack->lock, NULL);
}
bool HH_isFull(HH_Stack *stack) {
    return stack->top == HH_STACK_SIZE - 1;
}
bool HH_isEmpty(HH_Stack *stack) {
    return stack->top == -1;
}
void HH_push(HH_Stack *stack, int item) {
    pthread_mutex_lock(&stack->lock);
    if (HH_isFull(stack)) {
        LOGV("Stack is full. Unable to push the item.\n");
    } else {
        stack->data[++stack->top] = item;
    }
    pthread_mutex_unlock(&stack->lock);
}
int HH_pop(HH_Stack *stack) {
    pthread_mutex_lock(&stack->lock);
    int item;
    if (HH_isEmpty(stack)) {
        LOGV("Stack is empty. Unable to pop any item.\n");
        item = -1; // -1 represents an error value, as the stack is empty.
    } else {
        item = stack->data[stack->top--];
    }
    pthread_mutex_unlock(&stack->lock);
    return item;
}
void HH_destroyStack(HH_Stack *stack) {
    pthread_mutex_destroy(&stack->lock);
}

static struct android_app* android_app_create(GameActivity* activity,
                                              void* savedState,
                                              size_t savedStateSize) {
    struct android_app* android_app = (struct android_app*)malloc(sizeof(struct android_app));
    memset(android_app, 0, sizeof(struct android_app));
    android_app->activity = activity;
    {
        /** 【ADD】 Fix onPauseNative ANR (haroel/oeaheh@gmail.com) */
        android_app->stateStack = (struct HH_Stack*)malloc(sizeof(struct HH_Stack));
        memset(android_app->stateStack, 0, sizeof(struct HH_Stack));
        HH_initializeStack(android_app->stateStack);
    }
    ...
}

static void android_app_destroy(struct android_app* android_app) {
    ...
    /**【ADD】  Fix onPauseNative ANR (haroel/oeaheh@gmail.com) */
    HH_destroyStack(android_app->stateStack);
    free(android_app->stateStack);  // release the memory allocated in android_app_create
}

// Note here:
static void process_cmd(struct android_app* app,
                        struct android_poll_source* source) {
    /** 【ADD】 Fix onPauseNative ANR (haroel/oeaheh@gmail.com)        
        If there are unprocessed states in the stack, prioritize handling them. */
    int val = HH_pop(app->stateStack);
    if (val >= 0){
        if (app->onAppCmd != NULL) {
            app->onAppCmd(app, val);
        }
        return;
    }
    ...
}
// Note here:
static void onPause(GameActivity* activity) {
    /**【ADD】  Fix onPauseNative ANR (haroel/oeaheh@gmail.com) */
    struct HH_Stack *stateStack = ToApp(activity)->stateStack;
    HH_push(stateStack, APP_CMD_PAUSE);
    LOGV("Pause: %p", activity);
//    android_app_set_activity_state(ToApp(activity), APP_CMD_PAUSE);  
}
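For readers who want to try the idea in isolation, the following is a small standalone sketch of the hand-off, not the patch itself. The names (`demo_stack`, `demo_push`, `demo_pop`, `demo_on_pause`, `demo_drain`) and the command values are hypothetical. `demo_on_pause` stands in for `onPause` and returns as soon as the command is recorded. `demo_drain` stands in for the check at the top of `process_cmd`, except that it empties the stack in one call, whereas the patch handles one pending state per call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_STACK_SIZE 32
#define DEMO_CMD_PAUSE  1  /* placeholder values, not the real APP_CMD_* */
#define DEMO_CMD_RESUME 2

/* Mutex-protected stack of pending commands, modelled on HH_Stack. */
typedef struct demo_stack {
    int top;
    int data[DEMO_STACK_SIZE];
    pthread_mutex_t lock;
} demo_stack;

static void demo_stack_init(demo_stack* s) {
    s->top = -1;
    pthread_mutex_init(&s->lock, NULL);
}

/* Returns false (and drops the command) when the stack is full. */
static bool demo_push(demo_stack* s, int cmd) {
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (s->top < DEMO_STACK_SIZE - 1) {
        s->data[++s->top] = cmd;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Returns the most recently pushed command, or -1 when empty. */
static int demo_pop(demo_stack* s) {
    int cmd = -1;
    pthread_mutex_lock(&s->lock);
    if (s->top >= 0) {
        cmd = s->data[s->top--];
    }
    pthread_mutex_unlock(&s->lock);
    return cmd;
}

/* Stand-in for onPause: record the command and return immediately,
   without waiting for the other thread to acknowledge it. */
static void demo_on_pause(demo_stack* s) {
    (void)demo_push(s, DEMO_CMD_PAUSE);
}

/* Stand-in for the consumer side: copy pending commands into out[]
   in the order they would be delivered, and return the count. */
static int demo_drain(demo_stack* s, int* out, int max) {
    int n = 0;
    int cmd;
    while (n < max && (cmd = demo_pop(s)) >= 0) {
        out[n++] = cmd;
    }
    return n;
}
```

One property of this design to keep in mind: because the container is a stack, when more than one command is pending they come out newest first. In the sketch, a pause followed by a resume is drained as resume, then pause.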

This solution has been validated on our products, and the ANR rate for most of them has dropped to one-third of its original level.


I also reported this bug to Google, and they have responded. Here is the IssueTracker link, feel free to check it out if you are interested. https://issuetracker.google.com/issues/377940980

I hope Cocos can merge this code into the 3.8.5 release version. Thank you very much.

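(I previously submitted a PR to the engine and found the process rather cumbersome, so I am submitting this change as an issue instead. The change is very simple and touches only two C files; other developers who are interested can also merge it into their own versions.)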

Minimal reproduction project

None

zhitaocai commented 21 hours ago

It looks great.