
Inconsistent behavior of E2E testing on same macOS image #11041

Open KangxuanYe opened 4 hours ago

KangxuanYe commented 4 hours ago

Description

Hi, our team is implementing an iOS SDK and we need to run E2E tests that launch iOS simulators. We run the E2E tests in parallel on two simulators at a time; see the sketch below.
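For context, here is a minimal sketch of this kind of setup using xcodebuild's built-in parallel testing; the scheme, simulator, and result-bundle names are placeholders, not our actual configuration:

```yaml
# Workflow step: run the E2E scheme with at most two parallel
# simulator clones and keep the result bundle for inspection.
- name: Run E2E tests
  run: |
    xcodebuild test \
      -scheme SDKE2ETests \
      -destination 'platform=iOS Simulator,name=iPhone 15' \
      -parallel-testing-enabled YES \
      -parallel-testing-worker-count 2 \
      -resultBundlePath TestResults.xcresult
```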

Recently, after we migrated our VM OS image from internal-macos12 to macOS-latest-internal/macOS-14-arm64, we started facing an issue where the same piece of code behaves differently from run to run. Sometimes every test in a run completes successfully, but other times many E2E test cases fail at random.

The same setup was stable on internal-macos12; we could tell the E2E tests ran reliably on that image. However, we have no idea why they have become so flaky now, and the only difference we can observe is the OS image version: on some versions we have a stable experience, and on some versions we don't.

Could you please explain, or help investigate, why the same code randomly behaves differently on the CI pipeline/VM?

Platforms affected

Runner images affected

Image version and build link

So far, based on our testing, we can say:

- image version 20241108.422: good
- image version 20241022.361: terrible
- image version 20241119.505: good
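For anyone trying to correlate runs with image releases: GitHub-hosted runners expose the image release in the ImageVersion environment variable, so a step like the following (the step name is arbitrary) records which image each run used:

```yaml
# Workflow step: log the runner image release so flaky runs can be
# matched against specific image versions.
- name: Print runner image version
  run: echo "Image: $ImageOS $ImageVersion"
```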

Is it a regression?

No

Expected behavior

The same piece of code should behave the same way every time: a PR change should either fail consistently or pass consistently, never sometimes pass and sometimes fail.

Actual behavior

On some OS image versions of macOS-14-arm64, any code change, and even the main branch, fails E2E testing frequently and at random.

Repro steps

For some reasons, I may not be able to share our CI pipeline with you.

erik-bershel commented 2 hours ago

Hey @KangxuanYe!

I'm very sorry that you have to deal with such instability. 😞 But the information you've provided is not enough to conduct an investigation, so I can only guess. 🤷‍♂️ If you can't share links to specific runs or the code of the pipelines themselves, then let's start from the error output; that information should not be confidential, I believe.
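For example, if your tests write an xcresult bundle, uploading it when the job fails would at least surface the raw error output; the artifact name and result-bundle path below are placeholders, not something your pipeline necessarily produces:

```yaml
# Workflow steps: preserve the test results whenever the E2E job fails,
# so logs and crash reports can be inspected after the run.
- name: Upload E2E results on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: e2e-test-results
    path: TestResults.xcresult
```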

If the problems are with the simulators themselves, then researching the release changelogs may help or give you some ideas: 20241108.422, 20241022.361, 20241119.505