android / performance-samples

Samples to show APIs and best practices in Performance on Android
https://d.android.com/topic/performance/overview
Apache License 2.0

Questions related to multiple baseline profile generator classes #286

Closed hellosagar closed 1 week ago

hellosagar commented 1 month ago
  1. Many apps are designed for single-use scenarios, where users download the app, complete their intended task, and then delete it. In such cases, every critical part of the user journey becomes even more important. For example, consider an app where the user signs up and then sees a home screen with a list of content. If I have a single baseline profile generator class that iterates through this journey 15 times, the first iteration will go through the signup process and scroll through the home screen. Since the user is already logged in, in subsequent iterations the baseline profile generator will start directly from the home screen, bypassing the onboarding flow. My question is: are the baseline profile rules generated for the onboarding flow considered stable, given that they are only iterated once? Or would it be more effective to isolate the journeys into consistent, repeatable segments to create more stable profiles for each part of the journey?

  2. If I have multiple profile generator classes, in which one should I set the includeInStartupProfile parameter? Should it be the one where the user opens the app for the first time and reaches the login screen, or the scenario where the user is already logged in and opens the home screen directly? What factors should I consider when deciding which generator should have includeInStartupProfile set to true? Should this decision be based on the most frequently visited startup flow according to analytics events?

  3. I’ve created two generators—HomeProfileGenerator and SignupProfileGenerator. When the HomeProfileGenerator runs, it performs a Firebase login, checks if the user is logged in, and then jumps directly to the home screen. However, when the SignupProfileGenerator runs, it starts the app from the home screen instead of the onboarding screen, causing the test to fail. Individually, both generators work fine. Is there a way to reset the app state before running each generator? I tried clearing the app state with ADB using the command pm clear $packageName, but this caused the app to close and didn’t work as expected. I also checked the issue tracker and found a similar issue but couldn’t find a solution. Here’s the link to the issue tracker: Google Issue Tracker. Do you have any suggestions for resetting the app state between different profile generators?

hellosagar commented 1 month ago

Additional question:

hellosagar commented 3 weeks ago

This question is going in another direction, but in this repo I ran the FullyDrawnStartupBenchmark and changed the code to test against CompilationMode.Partial with the baseline profile required. The results showed it was slower than CompilationMode.None.

I changed the source build variant to be benchmarkRelease for both modules

[Screenshot: benchmark results (2024-08-22)]
/*
 * Copyright 2022 The Android Open Source Project
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.example.macrobenchmark.benchmark.startup

import android.content.Intent
import androidx.benchmark.macro.BaselineProfileMode
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.filters.LargeTest
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until
import com.example.macrobenchmark.benchmark.util.DEFAULT_ITERATIONS
import com.example.macrobenchmark.benchmark.util.TARGET_PACKAGE
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@LargeTest
@RunWith(AndroidJUnit4::class)
class FullyDrawnStartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupCompilationNone() =
        benchmark(CompilationMode.None())

    @Test
    fun startupCompilationBaselineProfiles() =
        benchmark(CompilationMode.Partial(BaselineProfileMode.Require))

    private fun benchmark(compilationMode: CompilationMode) =
        benchmarkRule.measureRepeated(
            packageName = TARGET_PACKAGE,
            metrics = listOf(StartupTimingMetric()),
            compilationMode = compilationMode,
            startupMode = StartupMode.COLD,
            iterations = DEFAULT_ITERATIONS,
        ) {
            val intent = Intent("$TARGET_PACKAGE.FULLY_DRAWN_STARTUP_ACTIVITY")

            // Waits for first rendered frame
            startActivityAndWait(intent)

            // Waits for an element that corresponds to fully drawn state
            device.wait(Until.hasObject(By.text("Fully Drawn")), 10_000)
        }
}
[Screenshot: benchmark results (2024-08-22)]

Although, comparing the JIT thread pools, it looks ideal: there is less JIT thread pool work in the partial compilation mode with the baseline profile.

CompilationMode.Partial with baseline profile 👇🏻

[Screenshot: JIT thread pool trace, CompilationMode.Partial (2024-08-22)]

CompilationMode.None 👇🏻

[Screenshot: JIT thread pool trace, CompilationMode.None (2024-08-22)]

Do you mind helping out with this inconsistency in results?

keyboardsurfer commented 2 weeks ago

Many apps are designed for single-use scenarios, where users download the app, complete their intended task, and then delete it. In such cases, every critical part of the user journey becomes even more important. For example, consider an app where the user signs up and then sees a home screen with a list of content. If I have a single baseline profile generator class that iterates through this journey 15 times, the first iteration will go through the signup process and scroll through the home screen. Since the user is already logged in, in subsequent iterations the baseline profile generator will start directly from the home screen, bypassing the onboarding flow. My question is: are the baseline profile rules generated for the onboarding flow considered stable, given that they are only iterated once? Or would it be more effective to isolate the journeys into consistent, repeatable segments to create more stable profiles for each part of the journey?

Are the baseline profile rules generated for the onboarding flow considered stable, given that they are only iterated once?

No, these are not considered stable.

Or would it be more effective to isolate the journeys into consistent, repeatable segments to create more stable profiles for each part of the journey?

Yes! You can run the login journey as part of the setupBlock for your post login flow. This will reproduce the behavior described above.

To get a baseline profile for the login flow itself you'll need to either not persist the login data, or clear it before the profile is collected. This will generate a stable baseline profile for the login flow itself.
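As a sketch of that second option: a dedicated generator whose journey starts from the logged-out state, so every stable iteration exercises the onboarding code paths. The class name, package name, and journey steps below are assumptions for illustration, not part of this sample.

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class SignupProfileGenerator {
    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generate() = baselineProfileRule.collect(
        packageName = "com.example.app", // hypothetical package name
    ) {
        // The journey must start from a logged-out state on every iteration;
        // how the login data is cleared beforehand is app-specific.
        pressHome()
        startActivityAndWait()
        // Drive the signup/onboarding flow here with UiAutomator selectors.
    }
}
```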

If I have multiple profile generator classes, in which one should I set the includeInStartupProfile parameter? Should it be the one where the user opens the app for the first time and reaches the login screen, or the scenario where the user is already logged in and opens the home screen directly? What factors should I consider when deciding which generator should have includeInStartupProfile set to true? Should this decision be based on the most frequently visited startup flow according to analytics events?

The resulting output of includeInStartupProfile is limited to a single dex file. Work your way down from the journey that has the most users. For most apps that's opening the app from the launcher. If there's more space in the first dex file, go for the next app launching journey. This could be an FCM-triggered notification or a settings activity.
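Following that guidance, only the highest-traffic entry point would opt in. A minimal sketch (class and package names are hypothetical):

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class LauncherStartupProfileGenerator {
    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generate() = baselineProfileRule.collect(
        packageName = "com.example.app", // hypothetical package name
        // Only this generator, covering the launcher journey most users hit,
        // contributes to the startup profile's single dex file.
        includeInStartupProfile = true,
    ) {
        pressHome()
        startActivityAndWait() // launches the default activity
    }
}
```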

I’ve created two generators—HomeProfileGenerator and SignupProfileGenerator. When the HomeProfileGenerator runs, it performs a Firebase login, checks if the user is logged in, and then jumps directly to the home screen. However, when the SignupProfileGenerator runs, it starts the app from the home screen instead of the onboarding screen, causing the test to fail. Individually, both generators work fine. Is there a way to reset the app state before running each generator? I tried clearing the app state with ADB using the command pm clear $packageName, but this caused the app to close and didn’t work as expected. I also checked the issue tracker and found a similar issue but couldn’t find a solution. Here’s the link to the issue tracker: Google Issue Tracker. Do you have any suggestions for resetting the app state between different profile generators?

I recommend running this on a rooted device or emulator with root access (userdebug build). Regular user builds might not clear caches as expected.
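On such a build, one way to reset state is to clear the app's data from the host shell before launching the generator, rather than from inside the profileBlock. A sketch (the package name is hypothetical):

```shell
# Restart adbd with root privileges (userdebug / rooted builds only).
adb root

# Clear the target app's data and caches; prints "Success" on completion.
adb shell pm clear com.example.app
```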

keyboardsurfer commented 2 weeks ago

If an AAB or APK is distributed to app stores other than the Play Store, such as Amazon, Huawei, Samsung, and Xiaomi app stores, will the baseline profile still be adequately delivered with the application?

Yes, that's what the profileinstaller library is for.

hellosagar commented 2 weeks ago

Thanks for answering the above questions ❤️

I recommend running this on a rooted device or emulator with root access (userdebug build). Regular user builds might not clear caches as expected.

I ran the baseline profile generator on an API 31 emulator, which has root access. Still, when running the function below from the profileBlock, the app closes after executing the clear-preferences ADB command.

import androidx.benchmark.macro.MacrobenchmarkScope
import org.junit.Assert

fun MacrobenchmarkScope.deletePreferencesViaADB() {
    val command = "pm clear $packageName"
    // executeShellCommand returns the command's stdout; "pm clear" prints
    // "Success" (plus a trailing newline) when the app's data is cleared.
    val output = device.executeShellCommand(command).trim()
    Assert.assertEquals("Success", output)
}

Video:

https://github.com/user-attachments/assets/7b4108b5-9c04-40b1-a762-f6eb87d9b5ec

@keyboardsurfer Do you mind taking a look at this benchmarking results discrepancy issue as well - https://github.com/android/performance-samples/issues/286#issuecomment-2303332758

keyboardsurfer commented 2 weeks ago

The inconsistencies in https://github.com/android/performance-samples/issues/286#issuecomment-2303332758 can originate from a good handful of factors.

Given that the JIT threadpool is less active in the benchmark with baseline profile, it's working as intended.

The differences in TTID are below 50 ms for the median of the benchmarks. This fluctuation is very likely due to the device doing some unrelated work in the background.

The difference for reportFullyDrawn is most likely due to the sample data class, which delays when the app reports that it is ready.
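For context, the fully-drawn timestamp is reported by the app itself, so the metric shifts with whenever the app decides its content is ready. A minimal sketch of the pattern (the loadContent callback is a hypothetical stand-in for real data loading):

```kotlin
import android.os.Bundle
import androidx.activity.ComponentActivity

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // reportFullyDrawn() should fire only once the content the user came
        // for is on screen; any artificial delay in the data layer shows up
        // directly in the benchmark's fully-drawn metric.
        loadContent { reportFullyDrawn() }
    }

    // Hypothetical stand-in for the app's real data loading.
    private fun loadContent(onReady: () -> Unit) = onReady()
}
```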

keyboardsurfer commented 2 weeks ago

Can you share where you're calling deletePreferencesViaADB? It looks like it should work when called during the setupBlock

hellosagar commented 2 weeks ago

Can you share where you're calling deletePreferencesViaADB? It looks like it should work when called during the setupBlock

But there is no setupBlock when generating the baseline profile; it's only available during the benchmarking process. That's why it is being run as part of the journey itself.

To generate

public fun collect(
    packageName: String,
    maxIterations: Int = 15,
    stableIterations: Int = 3,
    outputFilePrefix: String? = null,
    includeInStartupProfile: Boolean = false,
    strictStability: Boolean = false,
    filterPredicate: ((String) -> Boolean) = { true },
    profileBlock: MacrobenchmarkScope.() -> Unit
)

To Benchmark

@JvmOverloads
fun measureRepeated(
    packageName: String,
    metrics: List<Metric>,
    compilationMode: CompilationMode = CompilationMode.DEFAULT,
    startupMode: StartupMode? = null,
    @IntRange(from = 1) iterations: Int,
    setupBlock: MacrobenchmarkScope.() -> Unit = {},
    measureBlock: MacrobenchmarkScope.() -> Unit
) {
keyboardsurfer commented 2 weeks ago

Thanks for clarifying. You can't call this during the measure or profile blocks and expect reliable results. This needs to be done before the benchmark / profile generator kicks in. I don't have a solution off the top of my head right now.

hellosagar commented 2 weeks ago

Thanks for clarifying. You can't call this during the measure or profile blocks and expect reliable results. This needs to be done before the benchmark / profile generator kicks in. I don't have a solution off the top of my head right now.

Thanks for the response! Then, the only solution I can think of is to create a special variant of the app where preferences are not saved and use the profile to build the release APK.
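Such a variant could be sketched as a build-type flag the app checks before persisting login data. The flag name below is hypothetical, and recent AGP versions additionally require buildFeatures.buildConfig = true for buildConfigField to work:

```kotlin
// app/build.gradle.kts (sketch)
android {
    buildTypes {
        create("benchmarkRelease") {
            initWith(getByName("release"))
            // The app skips writing login preferences when this flag is false,
            // so every generator run starts from a logged-out state.
            buildConfigField("boolean", "PERSIST_LOGIN", "false")
        }
    }
}
```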

keyboardsurfer commented 2 weeks ago

Found the time to verify an approach that works for clearing data between runs. Check out https://github.com/android/performance-samples/commit/09770d845b9cd63f92224f86b24e396973d7b8f1.